[amibroker] Re: Freakishly fast backtest using 64 cores




This uses the mid-range video card that happened to come with my 
system, a 9800GT.  The newer 260 and 280 cards are 3 to 4 times 
faster.  The 260 can be found at Best Buy for $300.  Some laptops 
have compatible cards as well. 

The video card has its own memory: mine has 512MB, and some have as 
much as 1GB.  This memory is very fast once the data has been copied 
into it from main system memory.  Nvidia has a professional line of 
products that have much more memory.  

To get the best performance, my AFL code makes one pass over the 
data, calling a Dll.  The Dll takes all of the data needed by the 
calculation and loads a copy onto the video card.  This upload is 
slow: it takes about 45 seconds for all 1000 symbols. 

Once all of the data is uploaded, the Dll loads a "kernel" onto the 
graphics cores that performs the actual computation and generates the 
trade list.  This part is very fast and performs all of the same 
functions that my AFL version does.  The resulting trade list is the 
same.  

Because the data is already loaded into video memory, it can be 
reused for many passes over the data with different optimization 
values.  So, hundreds of combinations of optimization values can be 
tried per second.  
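
As a rough illustration of why the one-time upload pays off, the 
arithmetic can be sketched with the figures quoted in this thread 
(about 45 seconds to upload, 4.4ms per pass on the card, 25 seconds 
per pass in AFL alone).  The function names are made up for this 
example and this is not part of the actual Dll:

```python
# Figures quoted in this thread; the function names are illustrative only.
UPLOAD_S = 45.0       # one-time upload of ~1000 symbols to video memory
GPU_PASS_S = 0.0044   # one backtest pass on the video card (4.4 ms)
AFL_PASS_S = 25.0     # one backtest pass in AFL alone


def gpu_total_seconds(passes: int) -> float:
    """Wall time for `passes` optimization passes, including the upload."""
    return UPLOAD_S + passes * GPU_PASS_S


def afl_total_seconds(passes: int) -> float:
    """Wall time for the same number of passes in AFL alone."""
    return passes * AFL_PASS_S


def break_even_passes() -> int:
    """Smallest number of passes at which the card is faster overall."""
    n = 1
    while gpu_total_seconds(n) >= afl_total_seconds(n):
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 2, 100, 1000):
        g, a = gpu_total_seconds(n), afl_total_seconds(n)
        print(f"{n:5d} passes: card {g:8.1f} s, AFL {a:8.1f} s, ratio {a / g:6.1f}x")
    print("break-even at", break_even_passes(), "passes")
```

With these numbers a single pass is slower on the card because of the 
upload, but the upload is paid back by the second pass, and the gap 
widens with every pass after that.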

For non-optimization runs, the Dll just loads one symbol into video 
memory and processes it.  Counting the overhead of moving data to the 
video card and extracting the trade list for a single symbol, the 
run time is similar to AFL code alone.  This lets me test the code 
and make sure it is correct.
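
A check of this kind can be sketched as a comparison of two trade 
lists, one from the AFL version and one from the Dll.  The `Trade` 
record and its fields are hypothetical, since the actual trade list 
layout is not described in this thread, and the price tolerance is an 
assumption to allow for small floating-point differences between two 
implementations:

```python
from typing import List, NamedTuple


class Trade(NamedTuple):
    # Hypothetical trade record; the real layout is not given in this thread.
    symbol: str
    entry_bar: int
    exit_bar: int
    entry_price: float
    exit_price: float


def trade_list_differences(reference: List[Trade],
                           candidate: List[Trade],
                           price_tol: float = 1e-4) -> List[str]:
    """Return readable differences; an empty list means the lists agree."""
    problems = []
    if len(reference) != len(candidate):
        problems.append(
            f"trade count differs: {len(reference)} vs {len(candidate)}")
    # Compare trade by trade over the common prefix of the two lists.
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if (r.symbol, r.entry_bar, r.exit_bar) != (c.symbol, c.entry_bar, c.exit_bar):
            problems.append(f"trade {i}: symbol or bars differ: {r} vs {c}")
        elif (abs(r.entry_price - c.entry_price) > price_tol
              or abs(r.exit_price - c.exit_price) > price_tol):
            problems.append(f"trade {i}: prices differ: {r} vs {c}")
    return problems


if __name__ == "__main__":
    afl = [Trade("ABC", 10, 25, 12.50, 13.10)]
    dll = [Trade("ABC", 10, 25, 12.50, 13.10)]
    print(trade_list_differences(afl, dll) or "trade lists match")
```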

This approach works best when the data only needs to be loaded once, 
then reused many times.  It also works best when there is a lot of 
data to work with. 

What is more interesting to me and what would be more useful for 
others would be a general driver that requires no Dll changes to 
modify the system.  The performance would not be as good as hand 
optimized code, but would still be much better than AFL code alone.  
It would take trading system design to a whole new level.  It would 
provide enough performance to make working with intraday data as 
easy as daily data is today.
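
The general idea, queuing function calls into a definition that is 
then executed in one go, can be sketched on the CPU as follows.  This 
is only an illustration of the queuing pattern, not the prototype 
itself: the `Program` class, the `ma` and `gt` operations and all 
parameter names are invented for the example, and the moving average 
here averages the bars seen so far during warm-up, which is a 
simplification.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One queued step: (output name, operation, input array names, parameter names)
Step = Tuple[str, str, Tuple[str, ...], Tuple[str, ...]]


def _ma(x: List[float], period: float) -> List[float]:
    """Simple moving average; early bars average the bars seen so far."""
    n = int(period)
    out, running = [], 0.0
    for i, v in enumerate(x):
        running += v
        if i >= n:
            running -= x[i - n]  # drop the bar that left the window
        out.append(running / min(i + 1, n))
    return out


def _gt(a: List[float], b: List[float]) -> List[float]:
    """Element-wise a > b, as 1.0 / 0.0 flags."""
    return [1.0 if p > q else 0.0 for p, q in zip(a, b)]


OPS: Dict[str, Callable[..., List[float]]] = {"ma": _ma, "gt": _gt}


class Program:
    """Records calls instead of running them; run() replays the queue."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def call(self, out: str, op: str, inputs: Sequence[str],
             params: Sequence[str] = ()) -> str:
        if op not in OPS:
            raise ValueError(f"unknown operation: {op}")
        self.steps.append((out, op, tuple(inputs), tuple(params)))
        return out

    def run(self, arrays: Dict[str, List[float]],
            params: Dict[str, float]) -> Dict[str, List[float]]:
        env = dict(arrays)  # the input arrays are reused, never modified
        for out, op, inputs, param_names in self.steps:
            args = [env[name] for name in inputs]
            consts = [params[name] for name in param_names]
            env[out] = OPS[op](*args, *consts)
        return env


if __name__ == "__main__":
    prog = Program()
    prog.call("fast", "ma", ["close"], ["fast_period"])
    prog.call("slow", "ma", ["close"], ["slow_period"])
    prog.call("buy", "gt", ["fast", "slow"])
    data = {"close": [1.0, 2.0, 3.0, 4.0, 5.0]}
    # Same data and same definition, several sets of optimization values.
    for fast, slow in ((2, 3), (2, 4)):
        result = prog.run(data, {"fast_period": fast, "slow_period": slow})
        print(fast, slow, result["buy"])
```

The definition is built once, and each optimization pass only 
supplies a new set of parameter values, which is the property that 
would let the data stay in video memory between passes.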

Writing such a driver would be hard, but I have already done some 
prototypes and design work.  I am tempted to do it for my own use.  
If I made it available to others, supporting it would be a PITA.  




--- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@xxx> wrote:
>
> I'm very interested
> could you elaborate a bit more
> What model of Nvidia chipset are you using, and with how much
> memory?
> Not sure exactly what you mean when you say
> It uses AmiBroker to load the symbol data and perform calculations 
> that do not depend on the optimization parameters. Once loaded into 
> video memory, repeated passes can be made with different parameters,
> avoiding any overhead. 
> Can you give me some examples? I presume when your dll is called,
> AB passes one or more arrays of data belonging to 1 symbol, is that
> true?
> Not sure exactly what the rest mean either. How many functions are
> you running in your dll, and what does each of them do?
> Great of you to share your insight.
> Cheers
> Paul.
>  
> 
> 
>   _____  
> 
> From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx]
> On Behalf Of dloyer123
> Sent: Tuesday, 5 August 2008 9:19 AM
> To: amibroker@xxxxxxxxxxxxxxx
> Subject: [amibroker] Freakishly fast backtest using 64 cores
> 
> 
> 
> Greetings,
> 
> I ported part of my AFL backtest code to a plugin, that takes 
> advantage of the graphics math cores on the video card that are 
> normally used for 3d graphics. 
> 
> I was able to get a several thousand fold performance improvement 
> over AFL code alone.
> 
> My goal was to reduce the 25 seconds AFL code alone uses for a single
> portfolio level back test to less than 1 second, allowing multi day 
> optimization and walkforward runs to complete in a more reasonable 
> time, and also just to see how fast I could get it to run.
> 
> The backtest runs over 1 year of 5 minute bars for about 1000 
> symbols. 1 year of data normally takes 25 seconds for AmiBroker 
> alone, or 18 seconds for 6 months of data. A typical optimization 
> run takes hundreds of these passes per walk forward step, taking 
> hours.
> 
> Using the Nvidia CUDA API, running on my mid range video card, it 
> was much faster. Much, much, much faster. How fast?
> 
> It reduced the run time from 25s to... 4.4ms. That is more than 
> 200/s! 
> 
> I didn't believe the timing when I saw it at first. So, I put 1,000 
> runs in a loop and sure enough, it ran 1,000 iterations in about 4 
> 1/2 seconds. This far exceeded my goal or expectations.
> 
> The resulting trade list matches that obtained by the AFL version of
> this code. 
> 
> I estimate that it is processing 32GB of bar data/sec.
> 
> Getting this to work at peak performance was tricky. Most of what I 
> have learned about code optimization does not apply. 
> 
> It uses AmiBroker to load the symbol data and perform calculations 
> that do not depend on the optimization parameters. Once loaded into 
> video memory, repeated passes can be made with different parameters,
> avoiding any overhead. 
> 
> For non backtest/optimization runs, the code just evaluates one 
> symbol and passes the data back to AmiBroker buy/sell/short/cover 
> arrays, making it easy to test, validate and visualize the trades. 
> There is very little performance gain in this case. 
> 
> There are problems, however. To run optimizations at peak speed, I 
> can not use AmiBroker to calculate the optimization goal function. 
> So, I am in the process of writing code to match signals and 
> calculate the portfolio fitness function. Once I do this, I will be 
> able to perform full optimizations and walk forwards at 3 orders of 
> magnitude faster than is possible with AmiBroker alone.
> 
> Also, this is not general purpose code. Changing the system code 
> means changing a dll written in C. However, there is no reason that 
> this could not be made more general. 
> 
> I have made some prototypes of "Cuda" versions of basic AFL 
> functions. The idea is to queue the function calls into a definition
> executed by a micro kernel running on the graphics cores. The result
> would be the ability to use the full power of the graphics cores by 
> modifying AFL code to use Cuda aware versions with no changes to C 
> code. It would be an interesting, but big project.
>


