I'm currently working on a Python 2.5 script to download all the stocks listed in the Yahoo Industry Browser by sector then industry.
I basically do the same thing that is done by the Excel workbook found at http://icc-az.com/amibroker_files%5CStocks_XLS.zip. However, that page says "Since this using plain VBA for all extraction, it is very slow. Expect 12 hours to do an extract...".
For comparison, my Python script currently takes about 8 minutes. The main reason is that I can get the ticker, company name, sector, and industry without having to download the individual company profile pages. And, unlike the Excel solution, which downloads entire webpages (including images), I only have to grab the basic HTML page.
Using the third-party Python module BeautifulSoup, it turns out to be pretty easy to extract the required information from the raw HTML (rather than making Excel convert webpages to spreadsheets).
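To give a feel for the extraction step, here is a rough sketch. It is not the actual script: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs, it targets a current Python 3 rather than 2.5, and both the sample HTML and the /q?s=TICKER link pattern are assumptions made up for illustration, not the real Industry Browser markup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class TickerLinkParser(HTMLParser):
    """Collect (ticker, link text) pairs from <a href="/q?s=TICKER"> links.

    The /q?s= link pattern is a hypothetical stand-in for whatever the
    real pages use.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (ticker, company name) pairs
        self._ticker = None   # ticker of the link currently open, if any
        self._text = []       # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        symbols = parse_qs(parsed.query).get("s")
        if parsed.path == "/q" and symbols:
            self._ticker = symbols[0].upper()
            self._text = []

    def handle_data(self, data):
        if self._ticker is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._ticker is not None:
            # Collapse runs of whitespace in the company name.
            name = " ".join("".join(self._text).split())
            self.rows.append((self._ticker, name))
            self._ticker = None


def extract_tickers(html):
    """Return a list of (ticker, company name) tuples found in html."""
    parser = TickerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up fragment, only here to exercise the parser.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/q?s=AAA">Alpha Apparel Inc.</a></td></tr>
  <tr><td><a href="/q?s=bbb">Beta  Footwear Corp.</a></td></tr>
  <tr><td><a href="/help">Help</a></td></tr>
</table>
"""

if __name__ == "__main__":
    for ticker, name in extract_tickers(SAMPLE_HTML):
        print(ticker, name)
```

The same idea carries over to BeautifulSoup: find the links that carry a ticker, then read the company name from the link text.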
Finally, to get the exchange information, I don't have to read each company's profile page. Instead I use the http://finance.yahoo.com/d/quotes.csv?s=TICKERS&f=x URL, with TICKERS replaced by a +-separated list of ticker symbols, to get the exchanges for 200 companies at once.
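The batching described above can be sketched roughly like this (Python 3, not the original 2.5 code). The function names are hypothetical; only the URL pattern and the batch size of 200 come from the description. The sketch only builds the URLs and does not download anything.

```python
QUOTES_URL = "http://finance.yahoo.com/d/quotes.csv?s=%s&f=x"
BATCH_SIZE = 200  # tickers per request, as described above


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def exchange_urls(tickers, size=BATCH_SIZE):
    """Return one quotes.csv URL per batch of tickers, joined with '+'."""
    return [QUOTES_URL % "+".join(batch) for batch in batches(tickers, size)]


if __name__ == "__main__":
    for url in exchange_urls(["AAA", "BBB", "CCC"], size=2):
        print(url)
```

With 7500+ symbols this works out to about 38 requests instead of one request per company. Tickers containing characters that are special in URLs are not escaped here.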
A caveat is that getting info from the Industry Browser pages alone surprisingly yields some ticker symbols that are incorrect. (This seems to happen for any stock whose exchange is listed as "n/a".) My impression is that the newer Yahoo Industry Center page is more accurate but slightly harder to parse.
Therefore, to be absolutely sure that the tickers are valid, you end up having to make sure you can download each company's profile or quotes page. The only time I've tried doing that, it took about 3 hours. As a side benefit of this process, you can scrape additional information on each company (such as number of employees). Only about 10 of the 7500+ symbols were listed incorrectly on the main Industry Browser pages (all of them OTC BB traded stocks).
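If the "n/a" pattern holds in general (it is only an impression so far), one way to shorten the validation pass would be to re-check just the suspicious tickers. A minimal sketch in Python 3, assuming the exchange lookups have already been collected into a dict; the function name and the dict layout are hypothetical:

```python
def split_by_exchange(exchanges):
    """Split tickers into (trusted, suspect) based on the exchange string.

    exchanges maps ticker -> exchange text as returned by the quotes
    lookup. A ticker is treated as suspect when its exchange is empty
    or "n/a" (case-insensitive, with or without surrounding quotes).
    """
    trusted, suspect = [], []
    for ticker, exchange in sorted(exchanges.items()):
        cleaned = exchange.strip().strip('"').strip().lower()
        if cleaned in ("", "n/a"):
            suspect.append(ticker)
        else:
            trusted.append(ticker)
    return trusted, suspect


if __name__ == "__main__":
    sample = {"AAA": "NYSE", "BBB": "N/A", "CCC": '"n/a"'}
    print(split_by_exchange(sample))
```

Only the suspect list would then need the slow per-company page download; the full pass is still the only way to be completely sure.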
I'm thinking about using multiple threads to download, say, 10 pages at once to speed up this last step. Unfortunately, I didn't design the original code to be thread-safe, so this will take some work.
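One possible shape for the threaded version, as a sketch only: it uses concurrent.futures, which exists in Python 3 but not in 2.5, and the fetch argument is a hypothetical function that downloads one company's page. Only the main thread touches the results dict, so fetch itself just has to avoid shared state.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(tickers, fetch, workers=10):
    """Call fetch(ticker) for every ticker using a pool of worker threads.

    Returns a dict mapping each ticker to fetch's return value, or to
    the exception it raised, so one bad page does not stop the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit everything first, then collect results in the main thread.
        futures = [(ticker, pool.submit(fetch, ticker)) for ticker in tickers]
        for ticker, future in futures:
            try:
                results[ticker] = future.result()
            except Exception as exc:
                results[ticker] = exc
    return results


if __name__ == "__main__":
    def fake_fetch(ticker):
        # Stand-in for a real download; no network access here.
        return "<html>%s</html>" % ticker

    print(fetch_all(["AAA", "BBB"], fake_fetch, workers=2))
```

Tickers whose value comes back as an exception are the ones worth a second look, since a failed download is exactly the signal used above to spot invalid symbols.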
Once I have the basic stock information, I spit out a .csv list (readable by Excel), plus broker.sectors and broker.industries files. I also use a separate small Python script to initialize a new AmiBroker database. You have to update the Markets manually since there is apparently no way to do this from COM (but there are only 8 of them).
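For the .csv part, a sketch along these lines would do (Python 3). The column names and their order are assumptions for illustration, not the layout of the actual output, and the broker.sectors and broker.industries files are not covered here.

```python
import csv
import io

# Hypothetical column layout for the stock list.
FIELDS = ["ticker", "name", "sector", "industry", "exchange"]


def write_stock_csv(rows, out):
    """Write rows (dicts keyed by FIELDS) to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_stock_csv(
        [{"ticker": "AAA", "name": "Alpha, Inc.", "sector": "Consumer Goods",
          "industry": "Textile - Apparel Clothing", "exchange": "NYSE"}],
        buf,
    )
    print(buf.getvalue())
```

Using the csv module instead of joining strings by hand means company names containing commas get quoted, so Excel keeps them in one column.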
One thing I noticed is that the broker.industries file used to initialize new databases seems to have an undocumented limit of about 38 or 39 characters for the Industry Name. The "Textile - Apparel Footwear & Accessories" industry gets truncated and a bogus industry gets added unless I first limit the industry name length.
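The workaround amounts to clipping each name before writing it out. A minimal sketch in Python 3; the limit of 38 is an assumption taken from the low end of the "38 or 39" observed above, and the function name is hypothetical:

```python
MAX_INDUSTRY_NAME = 38  # assumed safe limit; the real limit is undocumented


def shorten_industry(name, limit=MAX_INDUSTRY_NAME):
    """Return name clipped to at most limit characters.

    Whitespace is normalized first, and any trailing space left by the
    cut is removed.
    """
    name = " ".join(name.split())
    if len(name) <= limit:
        return name
    return name[:limit].rstrip()


if __name__ == "__main__":
    print(shorten_industry("Textile - Apparel Footwear & Accessories"))
```

Two different industries could end up with the same clipped name, so it is worth checking the shortened list for duplicates before writing the file.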
Also, Industries don't appear to be sorted correctly under their Sectors (I saw another post here that mentions the same thing).
Anyway, this is all somewhat of a work in progress. It is also a command-line-only script; there is no GUI associated with it. You'll have to be comfortable with installing ActiveState's free Python 2.5 for Windows distribution, installing the BeautifulSoup and mechanize modules, and running scripts from a Command Prompt.