From: Tony <acjhopst@xxxxxxxxx>
Date: Sat, 15 May 2004 09:43:53 +0200
> 2) Speed - all these tools get data from yahoo or somewhere else via http
> requests. 7000 http requests, no matter how you cut it, take a lot of time.
> If you get just a file per exchange, you can get your download done in <1
> minute vs. something potentially a lot longer. Now I only hope eoddata.com
> doesn't go away 8 )
>
> Because HTTP is a stateless protocol, you can massively parallelize the
> fetches and gain an order of magnitude in speed.
Sven, I fully agree with you on that, but there are a few things to say about it.
Setting up the transfer takes a fair amount of time.
For ease of explanation, let's assume 1 sec to set up a request and 1 sec to transfer 1 price.
If you downloaded a history of 7000 days, that would take 1 + 7000 = 7001 seconds.
Abhijit downloads 1 day with 7000 prices. Same speed (but more manual actions).
Ah, no.
With a little Perl, I was easily downloading 10 in parallel, thus 701
seconds for your example (7000/10 transfers plus 1 sec setup each).
Checking the bandwidth later, I could easily have been downloading 100
in parallel, or 71 seconds. The manual actions were writing the
original code over a couple of hours (the very first time) and fixing
it three times due to webpage redesigns. Afterwards the "cron"
scheduling tool did all the work automatically. If you really want to
collect information off of webpages often and fast, read "Web Client
Programming".
It also takes lots of time to fix up splits, etc.
The huge advantage, imo, of Abhijit's system is that he can put the data in GS.
I was feeding into GS via flatfiles and HT. Perl is a symbolic
manipulation language and HTML is a symbolic markup language, so Perl
has little problem parsing webpages for info and feeding it into other
programs.
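The parsing half is about as short. Here is a rough sketch using
HTML::TableExtract to pull prices out of a saved quote page and append
them to a flat file that GS (or anything else) can import; the column
headers and file layout are assumptions for illustration, not what HT
actually expects:

  #!/usr/bin/perl
  # Extract the price table from a saved quote page, append to a CSV flat file.
  use strict;
  use warnings;
  use HTML::TableExtract;

  my $sym  = shift or die "usage: $0 SYMBOL\n";
  my $html = do { local $/; open my $fh, '<', "quotes/$sym.html" or die $!; <$fh> };

  # Find the table whose header row has these columns (names are guesses).
  my $te = HTML::TableExtract->new(
      headers => [ 'Date', 'Open', 'High', 'Low', 'Close', 'Volume' ] );
  $te->parse($html);

  open my $out, '>>', "flat/$sym.csv" or die $!;
  for my $table ($te->tables) {
      for my $row ($table->rows) {
          my @cells = map { defined $_ ? $_ : '' } @$row;
          print $out join(',', $sym, @cells), "\n";
      }
  }
  close $out;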