
[amibroker] Re: Freakishly fast backtest using 64 cores




> The programming guide lists the 8600M and 8700M as having 32 computing
> cores.  Not sure what they are clocked at.

Clocks in MHz     core / shader / memory
8600M GT           475 /    950 /    700
8700M GT           625 /   1250 /    800
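Both mobile parts list 32 computing cores, so a rough comparison comes down to the shader clock. A minimal sketch, assuming (crudely) that throughput scales with cores x shader clock; memory bandwidth, occupancy and host-to-device copies are ignored, so the ratio is a rough guide only. The struct and function names are illustrative.

```c
/* Rough relative throughput of two GPUs, ASSUMING throughput scales with
   (computing cores) x (shader clock). Memory bandwidth, occupancy and
   host-to-device copies are ignored, so treat the result as a rough guide. */
typedef struct {
    const char *name;
    int cores;       /* "computing cores" as listed in the programming guide */
    int shader_mhz;  /* shader clock, MHz */
} gpu_spec;

static double core_mhz(const gpu_spec *g)
{
    return (double)g->cores * (double)g->shader_mhz;
}

/* How many times faster b is estimated to be than a. */
static double relative_throughput(const gpu_spec *a, const gpu_spec *b)
{
    return core_mhz(b) / core_mhz(a);
}
```

With the table above, the 8700M GT comes out at about 1.32x the 8600M GT (1250 / 950), before any of the real-world effects Tomasz describes further down.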

There's always the 9800 GTX with 112 cores.

brian_z

--- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@xxx> wrote:
>
> The programming guide lists the 8600M and 8700M as having 32 computing
> cores.  Not sure what they are clocked at.  Power is an issue.  The
> desktop versions need dedicated power connectors.  The big cards need
> two.
> 
> Actually, when I am doing development on my laptop, I just use the
> emulator.  It is about 100x slower than my desktop system, but still
> about 20x to 50x faster than Ami alone.  The speed difference in
> emulation mode is mostly due to the precomputed and cached price
> arrays.
> 
> Tomasz:  I suspect that there is an opportunity to trade memory for
> speed, even with 1 core.  Memory is cheap and would be a simpler way
> to get a performance boost than porting to multi-core, GPU or CPU.
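dloyer attributes most of the emulator's speed to precomputed, cached price arrays. A minimal sketch of that memory-for-speed trade, assuming (hypothetically) that the system uses a simple moving average whose period is an optimization parameter: each period's array is computed once per symbol and then reused by every later pass. `sma_cache`, `sma_get` and `MAX_PERIOD` are illustrative names, not AmiBroker API.

```c
#include <stdlib.h>

#define MAX_PERIOD 200  /* hypothetical upper bound on the optimized period */

/* Per-symbol cache: one lazily computed moving-average array per period. */
typedef struct {
    const double *close;          /* price array, owned by the caller */
    int n;                        /* number of bars */
    double *sma[MAX_PERIOD + 1];  /* NULL until first requested */
} sma_cache;

static void sma_cache_init(sma_cache *c, const double *close, int n)
{
    c->close = close;
    c->n = n;
    for (int p = 0; p <= MAX_PERIOD; p++)
        c->sma[p] = NULL;
}

/* Returns the SMA for `period` (1..MAX_PERIOD), computing it on first use
   with a rolling sum. Bars before the first full window are 0.0.
   Returns NULL for a bad period or on allocation failure. */
static const double *sma_get(sma_cache *c, int period)
{
    if (period < 1 || period > MAX_PERIOD)
        return NULL;
    if (c->sma[period] == NULL) {
        double *out = malloc((size_t)c->n * sizeof *out);
        if (out == NULL)
            return NULL;
        double sum = 0.0;
        for (int i = 0; i < c->n; i++) {
            sum += c->close[i];
            if (i >= period)
                sum -= c->close[i - period];   /* drop the bar leaving the window */
            out[i] = (i >= period - 1) ? sum / period : 0.0;
        }
        c->sma[period] = out;
    }
    return c->sma[period];   /* later passes hit this cached array */
}

static void sma_cache_free(sma_cache *c)
{
    for (int p = 0; p <= MAX_PERIOD; p++) {
        free(c->sma[p]);
        c->sma[p] = NULL;
    }
}
```

The cost is memory that grows with periods x bars x symbols, which is exactly the trade being suggested.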
> 
> 
> 
> --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@> wrote:
> >
> > > Dell has 3 off the shelf laptops in their entertainment/performance
> > > range that use GeForce 8600M and 8700M with 256MB & 2*2456MB (min 256
> > > required for CUDA?)
> > 
> > Mobile ones are very poor cousins. Believe me. I own a brand new notebook
> > (ASUS) with a GeForce 8600M and it is SLOW in 3D. I mean SLOW. Did I
> > mention that it is SLOW?
> > 
> > In 3DMark it gets the same results as my 3-year-old desktop 6600GT.
> > 
> > Best regards,
> > Tomasz Janeczko
> > amibroker.com
> > ----- Original Message ----- 
> > From: "brian_z111" <brian_z111@>
> > To: <amibroker@xxxxxxxxxxxxxxx>
> > Sent: Tuesday, August 12, 2008 12:40 AM
> > Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> > 
> > 
> > > DL
> > > 
> > > 
> > > I am following at the top level and understand what you are doing OK
> > > (you make me wish I had learnt programming/IT).
> > > 
> > > I like your CPU.
> > > 
> > > Allowing niche trading is what AB is all about?
> > > 
> > > I'll put my money on MS/"general purpose computing on GPU" - I don't
> > > think the masses are in love with MS, but for 80% of people who can do
> > > 80% of what they want with MS, the price to move elsewhere is too
> > > high - they are just in love with max output for min input.
> > > 
> > > If you go to the trouble to write a plug-in, do you think it will be
> > > around long/require much ongoing support from you?
> > > 
> > > I can see the benefits of the speed - for a group of traders it is a
> > > definite edge they would have for a year or two (I don't think any
> > > other trading software will be seeing this for a while?), especially
> > > in the AT area, where more crunching could be done fast enough to
> > > keep up with live data.
> > > 
> > > I don't blame Tomasz for not sitting his backside on the cutting
> > > edge - too dangerous for developers with long term clientele.
> > > 
> > > Not having a go at Tomasz - to clarify - Tomasz said the GeForce 8800
> > > can't be put in a notebook?
> > > 
> > > To my understanding there seems to be a reasonable number of laptops
> > > around that could use your method, e.g. Dell has 3 off the shelf
> > > laptops in their entertainment/performance range that use GeForce
> > > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
> > > 
> > > I looked at the GeForce links in Paul's post but they didn't have much
> > > specific info there that I could see - I assume the above cards will
> > > run your system.
> > > 
> > > I am not a buyer for now, but good luck with it; what you have done
> > > already is a good contribution to AB - once someone on the block has
> > > a new super-duper gadget, pretty soon the neighbours want one too and
> > > demand grows.
> > > 
> > > brian_z
> > > 
> > > 
> > > 
> > > --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@> wrote:
> > >>
> > >> This uses the mid-range video card that happened to come with my
> > >> system, a 9800GT.  The newer 260 and 280 cards are 3 to 4 times
> > >> faster.  The 260 can be found at Best Buy for $300.  Some laptops
> > >> have compatible cards as well.
> > >> 
> > >> The video card has its own memory; mine has 512MB, some have as much
> > >> as 1GB.  This memory is very fast once it is loaded from the main
> > >> system.  Nvidia has a professional line of products that have much
> > >> more memory.
> > >> 
> > >> To get the best performance, my AFL code makes one pass over the
> > >> data, calling a DLL.  The DLL takes all of the data needed by the
> > >> calculation and loads a copy to the video card.  This upload is slow:
> > >> the entire upload takes about 45 seconds for all 1000 symbols.
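One way to make a single bulk upload cheap is to pack every symbol's series into one contiguous host buffer plus an offset table, so the whole thing can go to the card in one copy (in CUDA, one `cudaMemcpy` with `cudaMemcpyHostToDevice`). This is a host-side sketch only; the layout and the names `packed_series` and `pack_series` are assumptions, not taken from the post. Prices are stored as `float` on the assumption that single precision is what cards of this generation handle best.

```c
#include <stdlib.h>
#include <string.h>

/* All symbols' prices packed end to end, plus an offset table:
   symbol s occupies data[offset[s]] .. data[offset[s + 1] - 1]. */
typedef struct {
    float *data;
    size_t *offset;   /* nsym + 1 entries */
    size_t nsym;
} packed_series;

/* Packs nsym series of lengths len[s]. Returns 0 on success, -1 on
   allocation failure (nothing is leaked in that case). */
static int pack_series(packed_series *p, const float *const *series,
                       const size_t *len, size_t nsym)
{
    size_t total = 0;
    for (size_t s = 0; s < nsym; s++)
        total += len[s];

    p->offset = malloc((nsym + 1) * sizeof *p->offset);
    p->data = malloc((total ? total : 1) * sizeof *p->data);
    if (p->offset == NULL || p->data == NULL) {
        free(p->offset);
        free(p->data);
        return -1;
    }
    p->nsym = nsym;
    p->offset[0] = 0;
    for (size_t s = 0; s < nsym; s++) {
        memcpy(p->data + p->offset[s], series[s], len[s] * sizeof *p->data);
        p->offset[s + 1] = p->offset[s] + len[s];
    }
    return 0;
}

static void free_packed(packed_series *p)
{
    free(p->data);
    free(p->offset);
}
```

One large transfer avoids paying per-copy overhead 1000 times; the buffer and offset table would then stay resident on the card for later passes.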
> > >> 
> > >> Once all of the data is uploaded, the DLL loads a "kernel" into the
> > >> graphics cores that performs the actual computation and generates the
> > >> trade list.  This part is very fast and performs all of the same
> > >> functions that my AFL version does.  The resulting trade list is the
> > >> same.
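A common way to structure such a kernel is one GPU thread per symbol, each thread writing only its own result slot so no synchronization is needed. The sketch below is plain C that mimics that structure: `kernel_one_symbol` is the per-thread body and the loop in `run_all_symbols` stands in for the kernel launch. The breakout rule, the `lookback`/`hold` parameters and the result struct are hypothetical; the post does not describe the actual system.

```c
#include <stddef.h>

/* Result slot written by exactly one "thread" (one symbol). */
typedef struct {
    int trades;
    float profit;   /* sum of (exit close - entry close) */
} symbol_result;

/* Work for ONE symbol, i.e. what one GPU thread would do.
   Hypothetical rule: buy when the close exceeds the highest close of the
   previous `lookback` bars, sell `hold` bars later.
   Requires lookback >= 1 and hold >= 1. Data uses the packed layout:
   symbol s occupies data[offset[s]] .. data[offset[s + 1] - 1]. */
static void kernel_one_symbol(size_t s, const float *data, const size_t *offset,
                              int lookback, int hold, symbol_result *out)
{
    const float *c = data + offset[s];
    size_t n = offset[s + 1] - offset[s];
    symbol_result r = {0, 0.0f};

    size_t i = (size_t)lookback;
    while (i + (size_t)hold < n) {
        float hi = c[i - lookback];
        for (size_t k = i - lookback + 1; k < i; k++)
            if (c[k] > hi)
                hi = c[k];
        if (c[i] > hi) {
            r.trades++;
            r.profit += c[i + hold] - c[i];
            i += (size_t)hold;   /* flat again at the exit bar */
        } else {
            i++;
        }
    }
    out[s] = r;
}

/* CPU stand-in for a kernel launch with one thread per symbol. */
static void run_all_symbols(size_t nsym, const float *data, const size_t *offset,
                            int lookback, int hold, symbol_result *out)
{
    for (size_t s = 0; s < nsym; s++)
        kernel_one_symbol(s, data, offset, lookback, hold, out);
}
```

Because every symbol is independent, the loop body maps directly onto thread indices on the card, which is what makes hundreds of cores usable at once.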
> > >> 
> > >> Because the data is loaded into video memory, it can be reused for
> > >> many passes over the data with different optimization values.  So,
> > >> hundreds of combinations of optimization values can be tried per
> > >> second.
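The reuse described here amounts to a loop over parameter combinations that never re-uploads the data. A minimal sketch, assuming a two-parameter exhaustive grid; `fitness_fn`, `grid_sweep` and `toy_fitness` are hypothetical names. In the setup described in this thread each fitness call would launch the kernel on data already in video memory; here a toy function stands in for it.

```c
#include <float.h>

/* Scores one parameter combination against data that is already resident.
   Hypothetical signature. */
typedef double (*fitness_fn)(const void *resident, int p1, int p2);

typedef struct {
    int p1, p2;
    double fitness;
    int evaluated;   /* number of combinations tried */
} sweep_result;

/* Exhaustive grid, step 1, over p1 in [p1_lo, p1_hi] and p2 in [p2_lo, p2_hi].
   The data is uploaded once by the caller; only the parameters change
   between passes, so there is no per-pass copy. */
static sweep_result grid_sweep(const void *resident, fitness_fn f,
                               int p1_lo, int p1_hi, int p2_lo, int p2_hi)
{
    sweep_result best = {0, 0, -DBL_MAX, 0};
    for (int p1 = p1_lo; p1 <= p1_hi; p1++) {
        for (int p2 = p2_lo; p2 <= p2_hi; p2++) {
            double v = f(resident, p1, p2);
            best.evaluated++;
            if (v > best.fitness) {
                best.p1 = p1;
                best.p2 = p2;
                best.fitness = v;
            }
        }
    }
    return best;
}

/* Toy fitness for demonstration only: peaks at the (p1, p2) pair stored in
   the "resident" buffer. A real one would run the backtest. */
static double toy_fitness(const void *resident, int p1, int p2)
{
    const int *target = resident;
    double d1 = p1 - target[0], d2 = p2 - target[1];
    return -(d1 * d1 + d2 * d2);
}
```

For example, sweeping p1 over 1..10 and p2 over 1..5 evaluates 50 combinations against a single upload.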
> > >> 
> > >> For non-optimization runs, the DLL just loads one symbol into video
> > >> memory and processes it.  Counting the overhead of moving data to the
> > >> video card and extracting the trade list for a single symbol, the run
> > >> time is similar to that of AFL code alone.  This lets me test the code
> > >> and make sure it is correct.
> > >> 
> > >> This approach works best when the data only needs to be loaded once,
> > >> then "reused" many times.  It also works best when there is a lot of
> > >> data to work with.
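With the figures from this thread the trade-off is easy to quantify. Treating the 45 s upload as a one-off cost and comparing it against 25 s per AFL-only pass, N passes cost about 45 + 0.0044 x N seconds on the card versus 25 x N seconds in AFL. The card is therefore ahead from the second pass, and over 1,000 passes the average is about 0.049 s per pass. This is a back-of-the-envelope model that assumes the upload is the only extra cost; the function names are illustrative.

```c
/* Average time per pass when one upload of `upload_s` seconds is shared
   by n >= 1 passes of `pass_s` seconds each. */
static double amortized_seconds(double upload_s, double pass_s, int n)
{
    return upload_s / n + pass_s;
}

/* Smallest n with upload_s + n * gpu_pass_s < n * cpu_pass_s, or -1 if
   the GPU pass is not faster (no break-even point). Assumes upload_s >= 0. */
static int break_even_passes(double upload_s, double gpu_pass_s,
                             double cpu_pass_s)
{
    if (gpu_pass_s >= cpu_pass_s)
        return -1;
    /* n must exceed upload / (cpu - gpu); the cast truncates, which equals
       floor here because the quotient is non-negative. */
    return (int)(upload_s / (cpu_pass_s - gpu_pass_s)) + 1;
}
```

The same model shows why a single-symbol run gains little: with only one pass, the transfer overhead is never amortized.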
> > >> 
> > >> What is more interesting to me, and what would be more useful for
> > >> others, would be a general driver that requires no DLL changes to
> > >> modify the system.  The performance would not be as good as
> > >> hand-optimized code, but would still be much better than AFL code
> > >> alone.  It would take trading system design to a whole new level.  It
> > >> would provide enough performance to make working with intraday data
> > >> as easy as daily data is today.
> > >> 
> > >> Writing such a driver would be hard, but I have already done some
> > >> prototypes and design work.  I am tempted to do it for my own use.
> > >> If I made it available to others, supporting it would be a PITA.
> > >> 
> > >> 
> > >> 
> > >> 
> > >> --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@> wrote:
> > >> >
> > >> > I'm very interested.
> > >> > Could you elaborate a bit more?
> > >> > What model of Nvidia chipset are you using, and with how much memory?
> > >> > Not sure exactly what you mean when you say:
> > >> > "It uses AmiBroker to load the symbol data and perform calculations
> > >> > that do not depend on the optimization parameters. Once loaded into
> > >> > video memory, repeated passes can be made with different parameters,
> > >> > avoiding any overhead."
> > >> > Can you give me some examples? I presume that when your DLL is called,
> > >> > AB passes one or more arrays of data belonging to 1 symbol; is that true?
> > >> > Not sure exactly what the rest means either. How many functions are you
> > >> > running in your DLL, and what does each of them do?
> > >> > Great of you to share your insight.
> > >> > Cheers
> > >> > Paul.
> > >> >  
> > >> > 
> > >> > 
> > >> >   _____  
> > >> > 
> > >> > From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx]
> > >> > On Behalf Of dloyer123
> > >> > Sent: Tuesday, 5 August 2008 9:19 AM
> > >> > To: amibroker@xxxxxxxxxxxxxxx
> > >> > Subject: [amibroker] Freakishly fast backtest using 64 cores
> > >> > 
> > >> > 
> > >> > 
> > >> > Greetings,
> > >> > 
> > >> > I ported part of my AFL backtest code to a plugin that takes
> > >> > advantage of the graphics math cores on the video card that are
> > >> > normally used for 3D graphics.
> > >> > 
> > >> > I was able to get a several-thousand-fold performance improvement
> > >> > over AFL code alone.
> > >> > 
> > >> > My goal was to reduce the 25 seconds AFL code alone uses for a single
> > >> > portfolio-level backtest to less than 1 second, allowing multi-day
> > >> > optimization and walk-forward runs to complete in a more reasonable
> > >> > time, and also just to see how fast I could get it to run.
> > >> > 
> > >> > The backtest runs over 1 year of 5-minute bars for about 1000
> > >> > symbols. 1 year of data normally takes 25 seconds for AmiBroker
> > >> > alone, or 18 seconds for 6 months of data. A typical optimization
> > >> > run takes hundreds of these passes per walk-forward step, taking
> > >> > hours.
> > >> > 
> > >> > I used the Nvidia CUDA API, running on my mid-range video card. It
> > >> > was much faster. Much, much, much faster. How fast?
> > >> > 
> > >> > It reduced the run time from 25s to... 4.4ms. That is more than
> > >> > 200 backtests per second!
> > >> > 
> > >> > I didn't believe the timing when I first saw it. So, I put 1,000
> > >> > runs in a loop and, sure enough, it ran 1,000 iterations in about
> > >> > 4 1/2 seconds. This far exceeded my goal and expectations.
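For reference, 4.4 ms per pass is about 227 passes per second, and 25 s / 4.4 ms is roughly a 5,700x speedup, consistent with the "several thousand fold" figure. Timing many passes in a loop, as described here, averages out timer resolution and one-off start-up cost. A minimal C11 sketch using `timespec_get`; `pass_fn` and `toy_pass` are illustrative. One caveat for GPU code: CUDA kernel launches return before the kernel finishes, so a timed pass must wait for completion (for example with `cudaDeviceSynchronize()` in current CUDA releases), or the average will come out too low.

```c
#include <time.h>

/* One backtest pass; ctx carries whatever state the pass needs.
   Hypothetical signature. */
typedef void (*pass_fn)(void *ctx);

/* Wall-clock seconds per call, averaged over `iterations` >= 1 calls. */
static double seconds_per_pass(pass_fn pass, void *ctx, int iterations)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < iterations; i++)
        pass(ctx);
    timespec_get(&t1, TIME_UTC);
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / iterations;
}

/* Toy pass used only to exercise the timer: accumulates 0 + 1 + ... + 999.
   volatile keeps the compiler from optimizing the work away. */
static void toy_pass(void *ctx)
{
    volatile double *acc = ctx;
    for (int i = 0; i < 1000; i++)
        *acc += i;
}
```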
> > >> > 
> > >> > The resulting trade list matches that obtained by the AFL version of
> > >> > this code.
> > >> > 
> > >> > I estimate that it is processing 32GB of bar data/sec.
> > >> > 
> > >> > Getting this to work at peak performance was tricky. Most of what I
> > >> > have learned about code optimization does not apply.
> > >> > 
> > >> > It uses AmiBroker to load the symbol data and perform calculations
> > >> > that do not depend on the optimization parameters. Once loaded into
> > >> > video memory, repeated passes can be made with different parameters,
> > >> > avoiding any overhead.
> > >> > 
> > >> > For non-backtest/optimization runs, the code just evaluates one
> > >> > symbol and passes the data back to AmiBroker's Buy/Sell/Short/Cover
> > >> > arrays, making it easy to test, validate and visualize the trades.
> > >> > There is very little performance gain in this case.
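A sketch of that single-symbol path, assuming the kernel produces a list of (entry bar, exit bar) pairs: the list is expanded into per-bar arrays holding 1 on signal bars and 0 elsewhere, which is how AFL's Buy and Sell arrays are read. `trade` and `trades_to_signals` are hypothetical, and handing the arrays to AmiBroker through the plugin interface is not shown. Short and Cover would be filled the same way from short trades.

```c
#include <string.h>

/* One long trade as bar indices into the symbol's arrays (hypothetical). */
typedef struct {
    int entry_bar;
    int exit_bar;
} trade;

/* Expands a trade list into per-bar Buy and Sell arrays of length nbars:
   1.0f on the signal bar, 0.0f elsewhere. Trades with out-of-range or
   reversed bars are skipped. Returns the number of trades written. */
static int trades_to_signals(const trade *t, int ntrades, int nbars,
                             float *buy, float *sell)
{
    memset(buy, 0, (size_t)nbars * sizeof *buy);
    memset(sell, 0, (size_t)nbars * sizeof *sell);
    int written = 0;
    for (int k = 0; k < ntrades; k++) {
        if (t[k].entry_bar < 0 || t[k].exit_bar >= nbars ||
            t[k].exit_bar < t[k].entry_bar)
            continue;
        buy[t[k].entry_bar] = 1.0f;
        sell[t[k].exit_bar] = 1.0f;
        written++;
    }
    return written;
}
```

Comparing these arrays bar by bar with the ones the AFL version produces is a direct way to check that both trade lists match.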
> > >> > 
> > >> > There are problems, however. To run optimizations at peak speed, I
> > >> > cannot use AmiBroker to calculate the optimization goal function.
> > >> > So, I am in the process of writing code to match signals and
> > >> > calculate the portfolio fitness function. Once I do this, I will be
> > >> > able to perform full optimizations and walk-forwards 3 orders of
> > >> > magnitude faster than is possible with AmiBroker alone.
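The post does not say which goal function is used. As a hypothetical example, the sketch below scores a portfolio by net profit divided by the maximum drawdown of the closed-trade equity curve, computed from trade profits in exit order. Matching signals across symbols (position limits, sizing) is a separate step and is not shown; `profit_to_drawdown` is an illustrative name.

```c
/* Hypothetical fitness: net profit / maximum drawdown of the closed-trade
   equity curve. `profit` holds portfolio trade profits in exit order.
   Returns net profit unchanged if the curve never draws down. */
static double profit_to_drawdown(const double *profit, int n)
{
    double equity = 0.0, peak = 0.0, max_dd = 0.0;
    for (int i = 0; i < n; i++) {
        equity += profit[i];
        if (equity > peak)
            peak = equity;               /* new equity high */
        if (peak - equity > max_dd)
            max_dd = peak - equity;      /* deepest fall from a high so far */
    }
    return max_dd > 0.0 ? equity / max_dd : equity;
}
```

For trade profits 10, -5, 20, -10, 5 the equity ends at 20 with a worst drawdown of 10, giving a score of 2.0.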
> > >> > 
> > >> > Also, this is not general-purpose code. Changing the system code
> > >> > means changing a DLL written in C. However, there is no reason that
> > >> > this could not be made more general.
> > >> > 
> > >> > I have made some prototypes of "CUDA" versions of basic AFL
> > >> > functions. The idea is to queue the function calls into a definition
> > >> > executed by a micro kernel running on the graphics cores. The result
> > >> > would be the ability to use the full power of the graphics cores by
> > >> > modifying AFL code to use CUDA-aware versions, with no changes to C
> > >> > code. It would be an interesting but big project.
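A CPU-side sketch of the queueing idea: AFL-like calls are recorded as a small instruction list, and one interpreter loop executes it over a symbol's bars. On the card, that interpreter would be the "micro kernel", run by one thread per symbol. The op codes, `instr` layout and `run_program` are hypothetical and are not AmiBroker or CUDA API; warm-up bars of the moving average are simply set to 0.

```c
#include <stddef.h>

/* Hypothetical op codes for a tiny array "program". */
typedef enum { OP_CLOSE, OP_SMA, OP_GT, OP_AND } opcode;

typedef struct {
    opcode op;
    int a, b;    /* input registers (indices of earlier instructions) */
    int param;   /* e.g. SMA period */
} instr;

#define MAX_BARS 1024
#define MAX_REGS 8

/* Executes `prog` over one symbol's prices. Instruction i writes register i,
   so later instructions refer to earlier results by index. Returns the last
   register's array, or NULL if the program is invalid or does not fit. */
static const float *run_program(const instr *prog, int nprog,
                                const float *prices, int nbars,
                                float regs[MAX_REGS][MAX_BARS])
{
    if (nprog < 1 || nprog > MAX_REGS || nbars < 0 || nbars > MAX_BARS)
        return NULL;
    for (int i = 0; i < nprog; i++) {
        const instr *in = &prog[i];
        /* Inputs must be earlier registers; SMA needs a period >= 1. */
        if (in->op != OP_CLOSE && (in->a < 0 || in->a >= i))
            return NULL;
        if ((in->op == OP_GT || in->op == OP_AND) && (in->b < 0 || in->b >= i))
            return NULL;
        if (in->op == OP_SMA && in->param < 1)
            return NULL;
        float *out = regs[i];
        for (int t = 0; t < nbars; t++) {
            switch (in->op) {
            case OP_CLOSE:
                out[t] = prices[t];
                break;
            case OP_SMA: {
                int p = in->param;
                if (t < p - 1) { out[t] = 0.0f; break; }   /* warm-up bar */
                float s = 0.0f;
                for (int k = t - p + 1; k <= t; k++)
                    s += regs[in->a][k];
                out[t] = s / p;
                break;
            }
            case OP_GT:
                out[t] = regs[in->a][t] > regs[in->b][t] ? 1.0f : 0.0f;
                break;
            case OP_AND:
                out[t] = (regs[in->a][t] != 0.0f && regs[in->b][t] != 0.0f)
                             ? 1.0f : 0.0f;
                break;
            }
        }
    }
    return regs[nprog - 1];
}
```

For example, a rule like "Close > MA(Close, 3)" becomes three instructions: `{OP_CLOSE}`, `{OP_SMA, a = 0, param = 3}`, `{OP_GT, a = 0, b = 1}`. Changing the system then means changing the instruction list, not the C code.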
> > >> >
> > >>
> > > 
> > > 
> > > 
> > > ------------------------------------
> > > 
> > > Please note that this group is for discussion between users only.
> > > 
> > > To get support from AmiBroker please send an e-mail directly to 
> > > SUPPORT {at} amibroker.com
> > > 
> > > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
> > > http://www.amibroker.com/devlog/
> > > 
> > > For other support material please check also:
> > > http://www.amibroker.com/support.html
> > > Yahoo! Groups Links
> > > 
> > > 
> > >
> >
>


