
[amibroker] Database data quality






How do you determine how "clean" your data is? Besides using the Database Purify command it seems to me an essential tool is the ability to compare one AmiBroker database to another and see what the differences are.

For example, it would be nice to know what the difference is between say PremiumData.com and Yahoo's free data. What is the difference between how they handle symbol changes and splits? What happens if you get PremiumData historical data but update it with Yahoo data?

And what's the difference between a database filled every day for a month with the AmiQuote Yahoo Current source (or the Historical source going back only a day) and a database that is filled all at once with the last 30 days of data? The second way would presumably pick up any fixes to the Yahoo / Commodity Systems data that had occurred in the meantime, whereas the first wouldn't.

How significant is the difference, and is it worth worrying about? Should Yahoo free data users periodically just re-download their entire databases? Instead of relying on the quick Yahoo Current source, maybe they should use the Historical source and get a week's worth of data every day to pick up any recent fixes.

But how can you tell what is worth doing without the ability to compare databases?
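
One way to approach this outside AmiBroker is to export both databases to text and diff them bar by bar. The following is only a minimal sketch: it assumes each database has been exported to CSV with `Ticker`, `Date` and `Close` columns (the column names, the helper functions and the sample rows are all made up for illustration and are not anything AmiBroker or AmiQuote produces by default).

```python
import csv
import io


def load_closes(csv_text):
    """Map (ticker, date) -> close from CSV text with Ticker, Date, Close columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["Ticker"], row["Date"]): float(row["Close"]) for row in reader}


def compare_closes(a, b, rel_tol=1e-4):
    """Return (bars only in a, bars only in b, bars whose closes differ)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(
        (key, a[key], b[key])
        for key in a.keys() & b.keys()
        if abs(a[key] - b[key]) > rel_tol * max(abs(a[key]), abs(b[key]))
    )
    return only_a, only_b, changed


# Made-up sample exports, for illustration only.
DAILY = """Ticker,Date,Close
AAA,2009-07-23,10.00
AAA,2009-07-24,10.50
BBB,2009-07-24,20.00
"""
BULK = """Ticker,Date,Close
AAA,2009-07-23,10.00
AAA,2009-07-24,10.40
AAA,2009-07-27,10.60
"""

only_daily, only_bulk, changed = compare_closes(load_closes(DAILY), load_closes(BULK))
print("only in daily-filled db:", only_daily)
print("only in bulk-filled db: ", only_bulk)
print("closes that differ:     ", changed)
```

With real exports, a split or symbol change that the two sources handle differently would probably show up as a long run of "differ" entries for one ticker, while a late correction would show up as a single changed bar.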

I know from working on my Yahoo Stock Ticker downloader (see http://finance.groups.yahoo.com/group/amibroker/message/140652 for more details on this work in progress) that an Industry Browser download as of July 24, 2009 had this breakdown on Sunday:
Yahoo Stock List Exchange Summary (8):
  538  AMEX
  453  NASDAQ CM
 1026  NASDAQ GM
 1289  NASDAQ NM
 2312  NYSE
 1898  OTC BB
    7  Other OTC
    4  PCX
 ----
 7527

Yahoo Stock List Sector Summary (9):
  815  Basic Materials
   29  Conglomerates
  524  Consumer Goods
 2166  Financial
  802  Healthcare
  443  Industrial Goods
 1309  Services
 1292  Technology
  147  Utilities
 ----
 7527
But the same download now (July 28, 2009) gives this:
Yahoo Stock List Exchange Summary (8):
  538  AMEX
  368  NASDAQ CM
  971  NASDAQ GM
 1427  NASDAQ NM
 2312  NYSE
 1894  OTC BB
    7  Other OTC
    4  PCX
 ----
 7521

Yahoo Stock List Sector Summary (9):
  815  Basic Materials
   29  Conglomerates
  524  Consumer Goods
 2165  Financial
  801  Healthcare
  441  Industrial Goods
 1308  Services
 1291  Technology
  147  Utilities
 ----
 7521
And the following symbols have died:
Bad symbols (7):
 Borland Software Corp. (N/A: BORL)
 China Networks International H (: CNWHF)
 Endocare Inc. (N/A: ENDO)
 Maven Media Holdings, Inc. (N/A: MVMH.OB)
 Neah Power Systems, Inc. (N/A: NPWS.OB)
 TransTech Services Partners In (N/A: TTSP.OB)
 Wataire International, Inc. (N/A: WTARE.OB)


So even a few days result in quite a few differences just in which stocks are listed. (Borland Software has disappeared??! Oh, it merged with Micro Focus the other day.)
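
To see where the two exchange summaries above actually differ, the counts can be subtracted line by line. This is just a small sketch using the numbers copied from this post; the function name `count_changes` is my own.

```python
# Exchange counts copied from the two Yahoo Stock List summaries in this post.
JULY_24 = {"AMEX": 538, "NASDAQ CM": 453, "NASDAQ GM": 1026, "NASDAQ NM": 1289,
           "NYSE": 2312, "OTC BB": 1898, "Other OTC": 7, "PCX": 4}
JULY_28 = {"AMEX": 538, "NASDAQ CM": 368, "NASDAQ GM": 971, "NASDAQ NM": 1427,
           "NYSE": 2312, "OTC BB": 1894, "Other OTC": 7, "PCX": 4}


def count_changes(before, after):
    """Return {name: after - before} for every name whose count changed."""
    names = sorted(before.keys() | after.keys())
    deltas = {n: after.get(n, 0) - before.get(n, 0) for n in names}
    return {n: d for n, d in deltas.items() if d != 0}


changes = count_changes(JULY_24, JULY_28)
for name, delta in changes.items():
    print(f"{delta:+5d}  {name}")
print(f"{sum(changes.values()):+5d}  net")
```

This prints -85 for NASDAQ CM, -55 for NASDAQ GM, +138 for NASDAQ NM and -4 for OTC BB, for a net of -6. The three NASDAQ tiers together net out to only -2, which is what you would expect if symbols were mostly being moved between tiers rather than added or dropped, though the totals alone can't confirm that.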

The massive changes in the NASDAQ markets seem suspicious and maybe my Python script isn't quite working yet, but the exchange lookup is pretty hard to mess up.

Back to the original question: I could always implement something in COM to extract the symbol info back out to Python and do the comparisons there but I was wondering if there is another solution.
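
For what it's worth, the COM route might look roughly like the sketch below. It assumes Windows, an installed AmiBroker and the pywin32 package, and it relies on the `Broker.Application` OLE object with its `LoadDatabase` method and `Stocks` collection (`Count`, indexed access, `Ticker`) as I understand them from the AmiBroker automation docs; I have not tested it against a live database. The helper names and the sample symbol sets are made up for illustration.

```python
def read_tickers(database_path=None):
    """Return the set of tickers in an AmiBroker database via OLE automation.

    Assumption: Windows + AmiBroker + pywin32. Untested sketch.
    """
    import win32com.client  # imported here so the rest runs without pywin32

    app = win32com.client.Dispatch("Broker.Application")
    if database_path is not None:
        app.LoadDatabase(database_path)  # otherwise use the currently open database
    stocks = app.Stocks
    # stocks(i) goes through the collection's default Item property (zero-based).
    return {stocks(i).Ticker for i in range(stocks.Count)}


def diff_tickers(old, new):
    """Return (symbols that disappeared, symbols that appeared), both sorted."""
    return sorted(old - new), sorted(new - old)


# Illustrative sets only; with AmiBroker available these would instead come from
# read_tickers(path_a) and read_tickers(path_b), where the paths are hypothetical.
old_symbols = {"AAA", "BORL", "ENDO"}
new_symbols = {"AAA", "BBB"}

gone, added = diff_tickers(old_symbols, new_symbols)
print("disappeared:", gone)
print("appeared:   ", added)
```

Keeping the COM access in one function and the comparison in another means the comparison can be run on any two symbol lists, whether they come from AmiBroker, from a Yahoo download, or from a text file.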

