
[amibroker] Re: Python 2.5 based Yahoo ticker downloader




Hey, this is great! 

Since you got it up and running, I'm going to quit trying to maintain my code. It breaks too often because Yahoo changes their HTML every few months. And you are right, you have to double-check a lot of entries because much of their data is incorrect, which is why I opted to grab the company profile page. But when I cross-checked it against other open sources (like Google Finance), I came up with still other errors.

Anyway, good luck and keep up the good work. Glad to see someone else picked up the task!

Thanks!
Jim


--- In amibroker@xxxxxxxxxxxxxxx, "tpowers2010" <wingusr@xxx> wrote:
>
> No. When I tried his stuff, I noticed that the US Stocks database seemed to be out of date. Since I knew this kind of thing is "simple" in Python, I decided to write my own downloader from scratch.
> 
> Of course the devil is in the details and I still have a few issues to sort out.
> 
> --- In amibroker@xxxxxxxxxxxxxxx, "areehoi" <areehoi@> wrote:
> >
> > This is great news. Have you been in contact with Jim Swindle? I know he has been looking for someone to take over the updating of the US-Stocks database. He has provided a great service, so hopefully you can do the same now that you've solved the main problem. Let us hear from you on the progress of the project. Thanks for your interest and help.
> > 
> > Dick H
> > 
> > --- In amibroker@xxxxxxxxxxxxxxx, "tpowers2010" <wingusr@> wrote:
> > >
> > > I'm currently working on a Python 2.5 script to download all the stocks
> > > listed in the Yahoo Industry Browser <http://biz.yahoo.com/p/> by
> > > sector, then industry.
> > > 
> > > I basically do the same thing that is done by the Excel workbook found
> > > at <http://icc-az.com/amibroker_files%5CStocks_XLS.zip>. However, that
> > > page says "Since this using plain VBA for all extraction, it is very
> > > slow. Expect 12 hours to do an extract...".
> > > 
> > > For comparison, my Python script currently takes about 8 minutes or so.
> > > The main reason is that I can get ticker, company name, sector, and
> > > industry without having to download the individual company profile
> > > pages. And, unlike the Excel solution which downloads entire webpages
> > > (including images), I only have to grab the basic html page.
> > > 
> > > Using the third-party Python module BeautifulSoup
> > > <http://www.crummy.com/software/BeautifulSoup/>, it turns out to be
> > > pretty easy to extract the required information from the raw HTML
> > > (rather than making Excel convert webpages to spreadsheets).
> > > 
> > > Finally, to get the exchange information, instead of having to read each
> > > company's profile page I use the URL
> > > <http://finance.yahoo.com/d/quotes.csv?s=TICKERS&f=x>, with TICKERS
> > > replaced by a "+"-separated list of ticker symbols, to get the exchanges
> > > for 200 companies at once.
> > > 
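The batching step described above could look roughly like this in modern Python 3. The original script targeted Python 2.5 and was not posted, so this is only an illustrative sketch: the helper names `chunked` and `build_exchange_urls` are hypothetical, the URL pattern is copied from the post, and there is no guarantee Yahoo still serves it.

```python
from typing import Iterator, List, Sequence

# URL pattern quoted in the post; per the post, f=x returns the exchange.
BASE_URL = "http://finance.yahoo.com/d/quotes.csv?s={tickers}&f=x"


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_exchange_urls(tickers: Sequence[str], batch_size: int = 200) -> List[str]:
    """Return one request URL per batch, with tickers joined by '+'."""
    return [
        BASE_URL.format(tickers="+".join(batch))
        for batch in chunked(tickers, batch_size)
    ]
```

With the default batch size, roughly 7500 symbols would need 38 requests instead of one profile-page download per company.
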
> > > A caveat is that getting info from the Industry Browser pages alone
> > > surprisingly yields some ticker symbols that are already incorrect.
> > > (This seems to happen for any stock whose exchange is listed as
> > > "n/a".) My impression is that the newer Yahoo Industry Center
> > > <http://biz.yahoo.com/ic/ind_index.html> page is more accurate but
> > > slightly harder to parse.
> > > 
> > > Therefore, to be absolutely sure that the tickers are valid, you end up
> > > having to make sure you can download each company's profile or quotes
> > > page. The one time I tried that, it took about 3 hours. As a side
> > > benefit of this process you can scrape additional information on each
> > > company (such as number of employees). Only about 10 or so of the 7500+
> > > symbols were listed incorrectly on the main Industry Browser pages (all
> > > of them being OTC BB traded stocks).
> > > 
> > > I'm thinking about using multiple threads to download, say, 10 pages at
> > > once to speed up this last process. Unfortunately, I didn't design the
> > > original code to be thread-safe, so this will take some work.
> > > 
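One way to download about 10 pages at a time is a thread pool. The sketch below uses Python 3's `concurrent.futures`, which did not exist in Python 2.5, so it is not what the poster would have written; `fetch_all` and `download` are hypothetical names. Passing the fetch function in as a parameter keeps each call working only on its own arguments, which sidesteps most thread-safety concerns.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download(url: str, timeout: float = 30.0) -> str:
    """Fetch one page and return its body as text (basic HTML only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = download,
    max_workers: int = 10,
) -> Dict[str, str]:
    """Fetch every URL using up to `max_workers` threads; map URL -> body."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        bodies = list(pool.map(fetch, url_list))
    return dict(zip(url_list, bodies))
```

Note that if any single download raises, the exception propagates out of `fetch_all`; a real run over thousands of pages would probably want to catch errors per URL and record the failures, since a failed download is exactly the signal that a ticker is invalid.
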
> > > Once I have the basic stock information I spit out a .csv list (readable
> > > by Excel), broker.sectors, and broker.industries files. I also use a
> > > separate small Python script to initialize a new AmiBroker database. You
> > > have to manually update the Markets since there is apparently no way to
> > > do this from COM (but there are only 8 of them).
> > > 
> > > One thing I noticed is that the broker.industries file used to
> > > initialize new databases seems to have an undocumented limit of about 38
> > > or 39 characters for the Industry Name. The "Textile - Apparel Footwear &
> > > Accessories" industry gets truncated and a bogus industry gets added
> > > unless I first limit the industry name length.
> > > 
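Limiting the name length before writing the file might look like this. The 38-character cap is an assumption taken from the lower end of the "about 38 or 39" estimate in the post, not a documented AmiBroker limit, and `limit_industry_name` is a hypothetical helper.

```python
# Assumed cap: the post estimates the limit at "about 38 or 39" characters.
MAX_INDUSTRY_NAME = 38


def limit_industry_name(name: str, limit: int = MAX_INDUSTRY_NAME) -> str:
    """Trim surrounding whitespace and cut `name` to at most `limit` chars."""
    # rstrip() again so a cut that lands on a space leaves no trailing blank.
    return name.strip()[:limit].rstrip()


if __name__ == "__main__":
    print(limit_industry_name("Textile - Apparel Footwear & Accessories"))
```

The example name from the post is 40 characters long, so it loses its last two letters. Two different names could in principle collapse to the same truncated string, so it may be worth checking the results for duplicates.
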
> > > Also, Industries don't appear to be sorted correctly under their Sectors
> > > (I saw another post here that mentions the same thing).
> > > 
> > > Anyway, this is all somewhat of a work in progress. It is also a
> > > command-line-only script; there is no GUI associated with it. You'll
> > > have to be comfortable with installing ActiveState's free Python 2.5 for
> > > Windows distribution, installing the BeautifulSoup and mechanize
> > > modules, and running scripts from a Command Prompt.
> > >
> >
>



