
[amibroker] Re: Freakishly fast backtest using 64 cores




Click on an individual chipset and you'll get the manufacturers that
use that chipset.
--- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@xxx> wrote:
>
> http://www.nvidia.com/object/cuda_learn_products.html
> 
> 
>   _____  
> 
> From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx] On Behalf Of cstrader
> Sent: Wednesday, 6 August 2008 10:51 PM
> To: amibroker@xxxxxxxxxxxxxxx
> Subject: Re: [amibroker] Re: Freakishly fast backtest using 64 cores
> 
> 
> 
> Which video cards provide this feature? As far as I can tell, it's only the
> 8-Series (G8X) GPU from NVIDIA, found in the GeForce, Quadro and Tesla
> lines. Who will have one of these? Are many people likely to have them in
> the future?
> 
> Thanks
> 
> ----- Original Message ----- 
> From: "dloyer123" <dloyer123@xxxxxxxxx>
> To: <amibroker@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 06, 2008 12:22 AM
> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> 
> > Very good question. That was a head scratcher.
> >
> > So the thing is, AmiBroker does a lot more work in an optimization
> > pass than just executing AFL code. In fact, the AFL code may take very
> > little of the total run time.
> >
> > As an example, using a database with a good amount of data, write an
> > AFL file that does nothing but set the buy/sell/short/cover arrays to 0.
> > The backtest will still take a good bit of time.
> >
> > So even reducing the AFL run time to zero is not enough. It will
> > not help much at all.
> >
> > So, to avoid this, I pass a "mode" variable to my DLL. This mode is
> > set by a simple optimization statement:
> >
> > mode = optimize("mode",0,1,3,1);
> >
> > When mode = 0, the DLL evaluates one symbol like a normal DLL.
> > So if I click on a bar, it will update my printf statements, etc.
> > The buy/sell/short/cover arrays are set. A single backtest (not
> > an optimization) uses the normal AmiBroker trade matching and
> > evaluation code and generates stats as normal.
> >
> > When mode = 1, this means load the data. The DLL copies the price
> > data to a staging area in memory. buy/sell/short/cover are set to 0 to
> > generate no trades. Having AmiBroker align the symbol bars was a big
> > help here.
> >
> > When mode = 2, on the first symbol and the first symbol only, it
> > loads the price data to the video card and executes as many backtest
> > passes as it needs at a few ms per pass. Once the best combination
> > is found, it returns. buy/sell/short/cover are set to 0. Note that I
> > cannot use the AmiBroker signal matching and fitness function code; I
> > have to provide my own. This is where the performance advantage of
> > all of the extra cores comes into play. It may run hundreds or
> > thousands of parameter combinations very quickly. I can't use the
> > built-in optimizer support, but brute force is enough for now. After
> > all, I get 200 combinations per second.
> >
> > When mode = 3, each symbol is evaluated using the best parameters found
> > on the last mode = 2 run. buy/sell/short/cover are set. In a walk-forward
> > test, this will always have the best score and be used for the
> > walk-forward step. A custom backtest function adds the chosen
> > parameters to the backtest report. Mode 3 works like mode 0 except
> > that it uses the optimal parameters rather than default values.
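
A minimal sketch of the four modes described above, written as plain C that runs on the CPU. Only the mode numbers and their meaning come from the post. The names PluginState, handle_symbol, toy_signals and stub_search are hypothetical, the "buy when close exceeds the parameter" rule is a toy stand-in for the real system, the search is a stub that returns a fixed value, and resetting the staging area when symbol 0 arrives is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode values follow the post; every other name here is hypothetical. */
enum { MODE_SINGLE = 0, MODE_LOAD = 1, MODE_SEARCH = 2, MODE_APPLY = 3 };

#define MAX_SYMBOLS   4   /* toy capacity; the post stages about 1000 symbols */
#define MAX_BARS      8
#define DEFAULT_PARAM 2

typedef struct {
    float  staged[MAX_SYMBOLS][MAX_BARS]; /* host-side staging area (mode 1) */
    size_t bars[MAX_SYMBOLS];
    size_t staged_count;
    int    best_param;                    /* result of the last mode-2 search */
} PluginState;

/* Toy rule standing in for the real system: buy when close exceeds param. */
static void toy_signals(const float *close, size_t n, int param, float *buy)
{
    for (size_t i = 0; i < n; ++i)
        buy[i] = close[i] > (float)param ? 1.0f : 0.0f;
}

/* Stub for the GPU parameter sweep: returns a fixed value so that the
   control flow can be exercised without a video card. */
static int stub_search(const PluginState *s)
{
    (void)s;
    return 4;
}

/* Handles one symbol; fills buy[0..n-1] and returns the parameter used
   (0 when the mode emits no trades). */
static int handle_symbol(PluginState *s, int mode, size_t symbol_index,
                         const float *close, size_t n, float *buy)
{
    assert(n <= MAX_BARS);
    memset(buy, 0, n * sizeof *buy);      /* modes 1 and 2 generate no trades */
    switch (mode) {
    case MODE_LOAD:
        if (symbol_index == 0)            /* assumption: new pass resets staging */
            s->staged_count = 0;
        assert(s->staged_count < MAX_SYMBOLS);
        memcpy(s->staged[s->staged_count], close, n * sizeof *close);
        s->bars[s->staged_count++] = n;
        return 0;
    case MODE_SEARCH:
        if (symbol_index == 0)            /* first symbol only, as in the post */
            s->best_param = stub_search(s);
        return 0;
    case MODE_APPLY:
        toy_signals(close, n, s->best_param, buy);
        return s->best_param;
    default:                              /* MODE_SINGLE */
        toy_signals(close, n, DEFAULT_PARAM, buy);
        return DEFAULT_PARAM;
    }
}
```

In the real plugin the mode value would arrive from the AFL side on every call, and stub_search would be replaced by the upload to the card plus the kernel launches.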
> >
> > The Status("action") and Status("actionex") codes could also be used,
> > but they did not tell me quite what I needed to know. Also, I could
> > have avoided the mode = 2 step if I could find a way to know I was on
> > the last symbol and run the optimization then. I guess I could pass
> > the name of the last symbol.
> >
> > So I use AmiBroker to load and keep the database, visualize the
> > trades, validate, walk forward and provide deep metrics of the
> > backtest.
> >
> > If I wanted to take this further, I would move the trading system logic
> > out of the DLL and make it programmable from AFL. That way it could
> > be used by anyone without needing to program in C. I would do this by
> > passing handles to CUDA arrays through the AFL code.
> >
> >
> >
> >
> > --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@> wrote:
> >>
> >> Thanks for your insight.
> >> I hope you don't mind sharing a little bit more detail.
> >> You said:
> >> > To get the best performance, my AFL code makes one pass over the
> >> > data, calling a DLL. The DLL takes all of the data needed by the
> >> > calculation and loads a copy to the video card. This upload is
> >> > slow; the entire upload takes about 45 seconds for all 1000 symbols.
> >> >
> >> > Once all of the data is uploaded, the DLL loads a "kernel" into the
> >> > graphics cores that performs the actual computation and generates
> >> > the trade list.
> >>
> >> Normally AB loads the data from the database as needed, calls a
> >> function in a DLL, and passes data in arrays or whatever as arguments
> >> of the function. The function is called for every ticker in the
> >> watchlist, and data pertaining to that symbol is passed each time. I
> >> wonder how you do a "single pass" over the data, because AB passes
> >> the data as part of the argument regardless of how many optimizations
> >> it has previously run with the same data. I just wonder how you do it.
> >> Cheers
> >> Paul.
> >>
> >> --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@> wrote:
> >> >
> >> > This uses the mid-range video card that happened to come with my
> >> > system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
> >> > faster. The 260 can be found at Best Buy for $300. Some laptops
> >> > have compatible cards as well.
> >> >
> >> > The video card has its own memory; mine has 512MB, and some have as
> >> > much as 1GB. This memory is very fast once it is loaded from the main
> >> > system. Nvidia has a professional line of products that have much
> >> > more memory.
> >> >
> >> > To get the best performance, my AFL code makes one pass over the
> >> > data, calling a DLL. The DLL takes all of the data needed by the
> >> > calculation and loads a copy to the video card. This upload is
> >> > slow; the entire upload takes about 45 seconds for all 1000 symbols.
> >> >
> >> > Once all of the data is uploaded, the DLL loads a "kernel" into the
> >> > graphics cores that performs the actual computation and generates
> >> > the trade list. This part is very fast and performs all of the same
> >> > functions that my AFL version does. The resulting trade list is
> >> > the same.
> >> >
> >> > Because the data is loaded into video memory, it can be reused for
> >> > many passes over the data with different optimization values. So
> >> > hundreds of combinations of optimization values can be tried per
> >> > second.
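
To make the "load once, reuse for many passes" idea concrete, here is a small brute-force sweep in plain C, with host arrays standing in for data already resident in video memory. This is only a sketch under assumptions: toy_fitness, sweep and SweepResult are hypothetical names, and the fitness (total one-bar gain while close is above a threshold) is a toy, not the system from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fitness: total one-bar gain from holding whenever close > threshold. */
static double toy_fitness(const float *close, size_t n, float threshold)
{
    double gain = 0.0;
    for (size_t i = 0; i + 1 < n; ++i)
        if (close[i] > threshold)
            gain += (double)close[i + 1] - (double)close[i];
    return gain;
}

typedef struct {
    float  best_threshold;
    double best_score;
    size_t passes;        /* number of parameter values tried */
} SweepResult;

/* Tries thresholds lo, lo+step, ... (steps values) against the same data.
   The price arrays are read on every pass but never copied again, which
   is the point of keeping them resident between passes. */
static SweepResult sweep(const float *const *symbols, const size_t *bars,
                         size_t nsym, float lo, float step, size_t steps)
{
    assert(steps > 0);
    SweepResult r = { lo, 0.0, 0 };
    for (size_t s = 0; s < steps; ++s) {
        float t = lo + step * (float)s;
        double score = 0.0;
        for (size_t k = 0; k < nsym; ++k)     /* portfolio-level score */
            score += toy_fitness(symbols[k], bars[k], t);
        if (r.passes == 0 || score > r.best_score) {
            r.best_score = score;
            r.best_threshold = t;
        }
        r.passes++;
    }
    return r;
}
```

On the card, the loop over symbols and bars is what the kernel would spread across the cores; the outer loop over parameter values is the part that benefits from not re-uploading anything.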
> >> >
> >> > For non-optimization runs, the DLL just loads one symbol into video
> >> > memory and processes it. Counting the overhead of moving data to
> >> > the video card and extracting the trade list for a single symbol, the
> >> > run time is similar to AFL code alone. This lets me test the code
> >> > and make sure it is correct.
> >> >
> >> > This approach works best when the data only needs to be loaded
> >> > once, then reused many times. It also works best when there is a lot
> >> > of data to work with.
> >> >
> >> > What is more interesting to me, and what would be more useful for
> >> > others, would be a general driver that requires no DLL changes to
> >> > modify the system. The performance would not be as good as
> >> > hand-optimized code, but would still be much better than AFL code
> >> > alone. It would take trading system design to a whole new level. It
> >> > would provide enough performance to make working with intraday data
> >> > as easy as daily data is today.
> >> >
> >> > Writing such a driver would be hard, but I have already done some
> >> > prototypes and design work. I am tempted to do it for my own use.
> >> > If I made it available to others, supporting it would be a PITA.
> >> >
> >> >
> >> >
> >> >
> >> > --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@> wrote:
> >> > >
> >> > > I'm very interested.
> >> > > Could you elaborate a bit more?
> >> > > What model of Nvidia chipset are you using, and with how much
> >> > > memory?
> >> > > I'm not sure exactly what you mean when you say:
> >> > > "It uses AmiBroker to load the symbol data and perform calculations
> >> > > that do not depend on the optimization parameters. Once loaded into
> >> > > video memory, repeated passes can be made with different parameters,
> >> > > avoiding any overhead."
> >> > > Can you give me some examples? I presume that when your DLL is called,
> >> > > AB passes one or more arrays of data belonging to one symbol. Is that
> >> > > true?
> >> > > I'm not sure exactly what the rest means either. How many functions
> >> > > are you running in your DLL, and what does each of them do?
> >> > > Great of you to share your insight.
> >> > > Cheers
> >> > > Paul.
> >> > >
> >> > >
> >> > >
> >> > > _____
> >> > >
> >> > > From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx] On Behalf Of dloyer123
> >> > > Sent: Tuesday, 5 August 2008 9:19 AM
> >> > > To: amibroker@xxxxxxxxxxxxxxx
> >> > > Subject: [amibroker] Freakishly fast backtest using 64 cores
> >> > >
> >> > >
> >> > >
> >> > > Greetings,
> >> > >
> >> > > I ported part of my AFL backtest code to a plugin that takes
> >> > > advantage of the graphics math cores on the video card that are
> >> > > normally used for 3D graphics.
> >> > >
> >> > > I was able to get a several-thousand-fold performance improvement
> >> > > over AFL code alone.
> >> > >
> >> > > My goal was to reduce the 25 seconds AFL code alone uses for a single
> >> > > portfolio-level backtest to less than 1 second, allowing multi-day
> >> > > optimization and walk-forward runs to complete in a more reasonable
> >> > > time, and also just to see how fast I could get it to run.
> >> > >
> >> > > The backtest runs over 1 year of 5-minute bars for about 1000
> >> > > symbols. 1 year of data normally takes 25 seconds for AmiBroker
> >> > > alone, or 18 seconds for 6 months of data. A typical optimization
> >> > > run takes hundreds of these passes per walk-forward step, taking
> >> > > hours.
> >> > >
> >> > > Using the Nvidia CUDA API, running on my mid-range video card, it
> >> > > was much faster. Much, much, much faster. How fast?
> >> > >
> >> > > It reduced the run time from 25s to... 4.4ms. That is more than
> >> > > 200 passes per second!
> >> > >
> >> > > I didn't believe the timing when I saw it at first. So I put 1,000
> >> > > runs in a loop and, sure enough, it ran 1,000 iterations in about
> >> > > 4 1/2 seconds. This far exceeded my goal and expectations.
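
The figures quoted above can be cross-checked with a few lines of C. The inputs (25 s per AFL pass, 4.4 ms per GPU pass, 1,000 iterations) are taken from the post; the helper names are hypothetical and exist only for this check.

```c
#include <assert.h>

/* Passes per second for a given time per pass in milliseconds. */
static double passes_per_second(double pass_ms)
{
    assert(pass_ms > 0.0);
    return 1000.0 / pass_ms;
}

/* Speedup of one pass relative to a baseline given in seconds. */
static double speedup(double baseline_s, double pass_ms)
{
    assert(pass_ms > 0.0);
    return baseline_s * 1000.0 / pass_ms;
}

/* Wall time in seconds for a loop of identical passes. */
static double loop_seconds(double pass_ms, unsigned long passes)
{
    return pass_ms * (double)passes / 1000.0;
}
```

With those inputs, passes_per_second(4.4) is about 227, speedup(25.0, 4.4) is about 5,680, and loop_seconds(4.4, 1000) is 4.4, which lines up with "more than 200 per second", "several thousand fold" and "about 4 1/2 seconds".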
> >> > >
> >> > > The resulting trade list matches that obtained by the AFL version
> >> > > of this code.
> >> > >
> >> > > I estimate that it is processing 32GB of bar data/sec.
> >> > >
> >> > > Getting this to work at peak performance was tricky. Most of what I
> >> > > have learned about code optimization does not apply.
> >> > >
> >> > > It uses AmiBroker to load the symbol data and perform calculations
> >> > > that do not depend on the optimization parameters. Once loaded into
> >> > > video memory, repeated passes can be made with different parameters,
> >> > > avoiding any overhead.
> >> > >
> >> > > For non backtest/optimization runs, the code just evaluates one
> >> > > symbol and passes the data back to the AmiBroker buy/sell/short/cover
> >> > > arrays, making it easy to test, validate and visualize the trades.
> >> > > There is very little performance gain in this case.
> >> > >
> >> > > There are problems, however. To run optimizations at peak speed, I
> >> > > cannot use AmiBroker to calculate the optimization goal function.
> >> > > So I am in the process of writing code to match signals and
> >> > > calculate the portfolio fitness function. Once I do this, I will be
> >> > > able to perform full optimizations and walk-forwards 3 orders of
> >> > > magnitude faster than is possible with AmiBroker alone.
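
As a rough idea of what "match signals and calculate a fitness function" involves, here is a deliberately simplified C sketch. It is not AmiBroker's matching algorithm and not the code from the post: it assumes long-only trading, one position at a time, entry and exit at the bar's close, and net profit as the fitness. Trade, match_signals and net_profit are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t entry_bar, exit_bar;
    float  entry_price, exit_price;
} Trade;

/* Pairs each buy with the next sell. Buys while already long and sells
   while flat are ignored. A position still open on the last bar is closed
   there; a buy on the last bar itself produces no trade.
   Returns the number of trades written to out. */
static size_t match_signals(const float *close, const float *buy,
                            const float *sell, size_t n,
                            Trade *out, size_t max_trades)
{
    size_t count = 0;
    int in_position = 0;
    Trade t = {0};
    for (size_t i = 0; i < n; ++i) {
        if (!in_position && buy[i] != 0.0f) {
            t.entry_bar = i;
            t.entry_price = close[i];
            in_position = 1;
        } else if (in_position && (sell[i] != 0.0f || i + 1 == n)) {
            t.exit_bar = i;
            t.exit_price = close[i];
            assert(count < max_trades);   /* caller must size out[] */
            out[count++] = t;
            in_position = 0;
        }
    }
    return count;
}

/* Toy fitness: sum of per-trade price differences (one unit per trade). */
static double net_profit(const Trade *trades, size_t count)
{
    double total = 0.0;
    for (size_t k = 0; k < count; ++k)
        total += (double)trades[k].exit_price - (double)trades[k].entry_price;
    return total;
}
```

A portfolio-level version would also need position sizing, ranking of competing signals and a cash constraint, which is presumably where most of the work in the real implementation goes.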
> >> > >
> >> > > Also, this is not general-purpose code. Changing the system code
> >> > > means changing a DLL written in C. However, there is no reason that
> >> > > this could not be made more general.
> >> > >
> >> > > I have made some prototypes of "CUDA" versions of basic AFL
> >> > > functions. The idea is to queue the function calls into a definition
> >> > > executed by a micro kernel running on the graphics cores. The result
> >> > > would be the ability to use the full power of the graphics cores by
> >> > > modifying AFL code to use CUDA-aware versions, with no changes to C
> >> > > code. It would be an interesting, but big, project.
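
One way to picture "queue the function calls into a definition executed by a micro kernel" is a tiny operation list run by an interpreter. The C sketch below runs on the CPU as a stand-in for the graphics cores, and everything in it is an assumption for illustration: the Op layout, the two opcodes and run_program are hypothetical, and OP_MA averages a shorter window on the first bars instead of leaving them empty.

```c
#include <assert.h>
#include <stddef.h>

#define NREG     4    /* array "registers" the queued calls read and write */
#define MAX_BARS 16

typedef enum { OP_MA, OP_GT } OpCode;

/* One queued call. For OP_MA, b is the period; for OP_GT, b is a register. */
typedef struct { OpCode code; int dst, a, b; } Op;

/* Executes the queued operations in order over n bars. */
static void run_program(const Op *prog, size_t nops,
                        float regs[NREG][MAX_BARS], size_t n)
{
    assert(n <= MAX_BARS);
    for (size_t k = 0; k < nops; ++k) {
        const Op *op = &prog[k];
        for (size_t i = 0; i < n; ++i) {
            switch (op->code) {
            case OP_MA: {
                /* moving average of the last b values of register a */
                assert(op->b >= 1 && op->dst != op->a);
                size_t len = (size_t)op->b;
                size_t start = i + 1 >= len ? i + 1 - len : 0;
                float sum = 0.0f;
                for (size_t j = start; j <= i; ++j)
                    sum += regs[op->a][j];
                regs[op->dst][i] = sum / (float)(i - start + 1);
                break;
            }
            case OP_GT:
                /* element-wise comparison: 1 where a > b, else 0 */
                regs[op->dst][i] =
                    regs[op->a][i] > regs[op->b][i] ? 1.0f : 0.0f;
                break;
            }
        }
    }
}
```

With close prices in register 0, the two-entry program {OP_MA, 1, 0, 2}, {OP_GT, 2, 0, 1} leaves a "close above its 2-bar average" signal in register 2. Changing the trading rule then means changing the queued list rather than recompiling C, which is the appeal of the design described above.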
> >> > >
> >> >
> >>
> >
> >
> >
> > ------------------------------------
> >
> > Please note that this group is for discussion between users only.
> >
> > To get support from AmiBroker please send an e-mail directly to
> > SUPPORT {at} amibroker.com
> >
> > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
> > http://www.amibroker.com/devlog/
> >
> > For other support material please check also:
> > http://www.amibroker.com/support.html
> >
> >
> >
>


