RE: [amibroker] Re: Freakishly fast backtest using 64 cores, AmiBroker Email List Archive

Which video cards provide this feature? As far as I can tell, it's only the
8-Series (G8X) GPU from NVIDIA, found in the GeForce, Quadro and Tesla
lines. Who will have one of these? Are many people likely to have them in
the future?

Thanks

----- Original Message -----
From: "dloyer123" <dloyer123@xxxxxxcom>
To: <amibroker@xxxxxxxxxps.com>
Sent: Wednesday, August 06, 2008 12:22 AM
Subject: [amibroker] Re: Freakishly fast backtest using 64 cores

> Very good question. That was a head scratcher.
>
> So the thing is, AmiBroker does a lot more work in an optimization
> pass than execute AFL code. In fact, the AFL code may take very
> little of the total run time.
>
> As an example, using a database with a good amount of data, write an
> AFL file that does nothing but set the buy/sell/short/cover arrays to
> 0. The backtest will still take a good bit of time.
>
> So, even reducing the AFL run time to zero is not enough. It will
> not help much at all.
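
The point above is an instance of Amdahl's law: speeding up only one part of a run is capped by everything else. A minimal sketch in Python; the 10% AFL share is a made-up figure for illustration, not a measurement from this thread, and `max_speedup` is a hypothetical helper name.

```python
def max_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the run time
    is made `factor` times faster (Amdahl's law)."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Hypothetical split: AFL execution is 10% of an optimization pass.
afl_share = 0.10
print(max_speedup(afl_share, 10.0))          # AFL made 10x faster
print(max_speedup(afl_share, float("inf")))  # AFL time reduced to zero
```

Under that assumed split, even a zero-cost AFL step gives only about an 11% overall gain, which is why the rest of the message moves the whole pass out of AmiBroker's per-symbol loop.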
>
> So, to avoid this, I pass a "mode" variable to my Dll. This mode is
> set by a simple optimization statement:
>
> mode = optimize("mode",0,1,3,1);
>
> When mode = 0, the dll will evaluate one symbol like a normal dll.
> So if I click on a bar, it will update my printf statements, etc.
> buy/sell/short/cover arrays are set. A single backtest (not
> optimize) will use the normal AmiBroker trade match and evaluate code
> and generate stats as normal.
>
> When mode = 1, this means load the data. The Dll will copy the price
> data to a stage area in memory. buy/sell/short/cover are set to 0 to
> generate no trades. Having AmiBroker align the symbol bars was a big
> help here.
>
> When mode = 2, on the first symbol and the first symbol only, it
> loads the price data to the video card and executes as many backtest
> passes as it needs at a few ms per pass. Once the best combination
> is found, it returns. buy/sell/short/cover are set to 0. Note that I
> cannot use the AmiBroker signal match and fitness function code. I
> have to provide my own. This is where the performance advantage of
> all of the extra cores comes into play. It may run hundreds or
> thousands of parameter combinations very quickly. I can't use the
> built-in optimize support, but brute force is enough for now. After
> all, I get 200 combinations per second.
>
> When mode = 3, each symbol evaluates using the best parameters found
> on the last mode=2 run. buy/sell/short/cover are set. In a
> walkforward test, this will always have the best score and be used
> for the walkforward step. A custom backtest function adds the chosen
> parameters to the backtest report. Mode 3 works like mode 0 except
> it uses the optimal parameters rather than default values.
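
The four modes described above can be sketched as a small state machine. This is an illustration in Python, not the author's C/CUDA Dll: the class name `ModeDispatcher`, the `is_first_symbol` flag, the parameter name `lookback` and the toy scoring function are all assumptions made for the example.

```python
from itertools import product


class ModeDispatcher:
    """Toy stand-in for the Dll: stages data, then brute-forces parameters."""

    def __init__(self, param_grid, score, default_params):
        self.param_grid = param_grid        # e.g. {"lookback": [10, 20, 30]}
        self.score = score                  # hypothetical fitness function
        self.default_params = default_params
        self.staged = {}                    # symbol -> prices, filled in mode 1
        self.best_params = None             # filled in by mode 2

    def call(self, mode, symbol, prices, is_first_symbol=False):
        """Return the parameters to trade with, or None for 'no trades'."""
        if mode == 0:                       # normal single-symbol evaluation
            return self.default_params
        if mode == 1:                       # stage data, generate no trades
            self.staged[symbol] = list(prices)
            return None
        if mode == 2:                       # optimize once, on the first symbol
            if is_first_symbol:
                names = sorted(self.param_grid)
                grids = (self.param_grid[n] for n in names)
                combos = (dict(zip(names, v)) for v in product(*grids))
                self.best_params = max(
                    combos, key=lambda p: self.score(self.staged, p))
            return None
        if mode == 3:                       # trade with the best parameters
            return self.best_params
        raise ValueError(f"unknown mode {mode}")


def toy_score(staged, params):
    # Placeholder fitness: prefers lookback 20. Real code would match
    # signals and compute a portfolio metric over the staged data.
    return -abs(params["lookback"] - 20) * len(staged)


dll = ModeDispatcher({"lookback": [10, 20, 30]}, toy_score, {"lookback": 10})
data = {"AAA": [1.0, 2.0], "BBB": [3.0, 4.0]}
for sym, px in data.items():
    dll.call(1, sym, px)                            # mode 1: load everything
dll.call(2, "AAA", data["AAA"], is_first_symbol=True)  # mode 2: search
print(dll.call(3, "BBB", data["BBB"]))              # mode 3: best parameters
```

The brute-force search runs exactly once, between the load pass and the final pass, which mirrors why mode 2 only does work on the first symbol.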
>
> The Status("action") and Status("actionex") codes could also be used,
> but they did not tell me quite what I needed to know. Also, I could
> have avoided the mode=2 step if I could find a way to know I was on
> the last symbol and run the optimization then. I guess I could pass
> the name of the last symbol.
>
> So I use AmiBroker to load and keep the database, visualize the
> trades, validate, walkforward and provide deep metrics of the
> backtest.
>
> If I wanted to take this further, I would move the trade system logic
> out of the Dll and make it programmable from AFL. That way it could
> be used by anyone without needing to program C. I would do this by
> passing handles to CUDA arrays through the AFL code.
>
>
>
>
> --- In amibroker@xxxxxxxxxps.com, "Paul Ho" <paul.tsho@x..> wrote:
>>
>> Thanks for your insight.
>> I hope you don't mind sharing a little bit more detail.
>> You said:
>> > To get the best performance, my AFL code makes one pass over the
>> > data, calling a Dll. The Dll takes all of the data needed by the
>> > calculation and loads a copy to the video card. This upload is
>> > slow; the entire upload takes about 45 seconds for all 1000
>> > symbols.
>> >
>> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> > the graphics cores that performs the actual computation and
>> > generates the trade list.
>>
>> Normally AB loads the data from the database as needed, calls a
>> function in a dll, and passes data in arrays or whatever as
>> arguments of the function. The function will be called for every
>> ticker in the watchlist, and data pertaining to that symbol is
>> passed each time. I wonder how you do a "single pass" over the
>> data, because AB passes the data as part of the argument regardless
>> of how many optimizations it has previously run with the same data.
>> I just wonder how you do it.
>> Cheers
>> Paul.
>>
>> --- In amibroker@xxxxxxxxxps.com, "dloyer123" <dloyer123@> wrote:
>> >
>> > This uses the mid range video card that happened to come with my
>> > system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
>> > faster. The 260 can be found at Best Buy for $300. Some laptops
>> > have compatible cards as well.
>> >
>> > The video card has its own memory; mine has 512MB, some have as
>> > much as 1GB. This memory is very fast, once it is loaded from the
>> > main system. Nvidia has a professional line of products that have
>> > much more memory.
>> >
>> > To get the best performance, my AFL code makes one pass over the
>> > data, calling a Dll. The Dll takes all of the data needed by the
>> > calculation and loads a copy to the video card. This upload is
>> > slow; the entire upload takes about 45 seconds for all 1000
>> > symbols.
>> >
>> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> > the graphics cores that performs the actual computation and
>> > generates the trade list. This part is very fast and performs all
>> > of the same functions that my AFL version does. The resulting
>> > trade list is the same.
>> >
>> > Because the data is loaded into video memory, it can be reused
>> > for many passes over the data with different optimization values.
>> > So, hundreds of combinations of optimization values can be tried
>> > per second.
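
How quickly the one-time upload pays for itself can be worked out from the figures quoted in this thread (45 s upload, 4.4 ms per pass on the card, 25 s per pass in AFL alone). A rough sketch in Python; it assumes the three numbers can simply be added, which ignores any other per-run overhead, and the function names are hypothetical.

```python
UPLOAD_S = 45.0       # one-time upload for all symbols (from the thread)
GPU_PASS_S = 0.0044   # per-pass time on the video card (from the thread)
AFL_PASS_S = 25.0     # per-pass time for AFL alone (from the thread)


def gpu_total(passes: int) -> float:
    """Upload once, then run every pass on the card."""
    return UPLOAD_S + passes * GPU_PASS_S


def afl_total(passes: int) -> float:
    """Every pass pays the full AFL cost."""
    return passes * AFL_PASS_S


for n in (1, 2, 100, 1000):
    print(n, round(gpu_total(n), 1), afl_total(n))
```

On these numbers a single pass is slower with the upload included, and the approach comes out ahead from the second pass onward.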
>> >
>> > For non optimization runs, the Dll just loads one symbol into
>> > video memory and processes it. Counting the overhead of moving
>> > data to the video card and extracting the trade list for a single
>> > symbol, the result is similar to AFL code alone. This lets me test
>> > the code and make sure it is correct.
>> >
>> > This approach works best when the data only needs to be loaded
>> > once, then reused many times. It also works best when there is a
>> > lot of data to work with.
>> >
>> > What is more interesting to me, and what would be more useful for
>> > others, would be a general driver that requires no Dll changes to
>> > modify the system. The performance would not be as good as hand
>> > optimized code, but would still be much better than AFL code
>> > alone. It would take trading system design to a whole new level.
>> > It would provide enough performance to make working with intraday
>> > data as easy as daily data is today.
>> >
>> > Writing such a driver would be hard, but I have already done some
>> > prototypes and design work. I am tempted to do it for my own use.
>> > If I made it available to others, supporting it would be a PITA.
>> >
>> >
>> >
>> >
>> > --- In amibroker@xxxxxxxxxps.com, "Paul Ho" <paul.tsho@> wrote:
>> > >
>> > > I'm very interested.
>> > > Could you elaborate a bit more?
>> > > What model of Nvidia chipset are you using, and with how much
>> > > memory?
>> > > Not sure exactly what you mean when you say:
>> > > It uses AmiBroker to load the symbol data and perform
>> > > calculations that do not depend on the optimization parameters.
>> > > Once loaded into video memory, repeated passes can be made with
>> > > different parameters, avoiding any overhead.
>> > > Can you give me some examples? I presume when your dll is
>> > > called, AB passes one or more arrays of data belonging to 1
>> > > symbol, is that true?
>> > > Not sure exactly what the rest means either. How many functions
>> > > are you running in your dll, and what does each of them do?
>> > > Great of you to share your insight.
>> > > Cheers
>> > > Paul.
>> > >
>> > >
>> > >
>> > > _____
>> > >
>> > > From: amibroker@xxxxxxxxxps.com
>> > > [mailto:amibroker@xxxxxxxxxps.com] On Behalf Of dloyer123
>> > > Sent: Tuesday, 5 August 2008 9:19 AM
>> > > To: amibroker@xxxxxxxxxps.com
>> > > Subject: [amibroker] Freakishly fast backtest using 64 cores
>> > >
>> > >
>> > >
>> > > Greetings,
>> > >
>> > > I ported part of my AFL backtest code to a plugin that takes
>> > > advantage of the graphics math cores on the video card that are
>> > > normally used for 3D graphics.
>> > >
>> > > I was able to get a several-thousand-fold performance
>> > > improvement over AFL code alone.
>> > >
>> > > My goal was to reduce the 25 seconds AFL code alone uses for a
>> > > single portfolio level back test to less than 1 second, allowing
>> > > multi day optimization and walkforward runs to complete in a
>> > > more reasonable time, and also just to see how fast I could get
>> > > it to run.
>> > >
>> > > The backtest runs over 1 year of 5 minute bars for about 1000
>> > > symbols. 1 year of data normally takes 25 seconds for AmiBroker
>> > > alone, or 18 seconds for 6 months of data. A typical
>> > > optimization run takes hundreds of these passes per walk forward
>> > > step, taking hours.
>> > >
>> > > Using the Nvidia CUDA API, running on my mid range video card,
>> > > it was much faster. Much, much, much faster. How fast?
>> > >
>> > > It reduced the run time from 25s to... 4.4ms. That is more than
>> > > 200/s!
>> > >
>> > > I didn't believe the timing when I saw it at first. So, I put
>> > > 1,000 runs in a loop and sure enough, it ran 1,000 iterations in
>> > > about 4 1/2 seconds. This far exceeded my goal or expectations.
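
The figures in the paragraphs above are mutually consistent, which a few lines of arithmetic confirm. A small check in Python using only the numbers quoted in the post (25 s, 4.4 ms, 1,000 runs); the variable names are illustrative.

```python
afl_seconds = 25.0      # one portfolio backtest in AFL alone
gpu_seconds = 0.0044    # one pass on the video card (4.4 ms)

passes_per_second = 1.0 / gpu_seconds   # "more than 200/s"
speedup = afl_seconds / gpu_seconds     # "several thousand fold"
loop_seconds = 1000 * gpu_seconds       # "about 4 1/2 seconds"

print(f"{passes_per_second:.0f} passes/s")
print(f"{speedup:.0f}x faster")
print(f"{loop_seconds:.1f} s for 1,000 runs")
```

That works out to roughly 227 passes per second, a speedup of about 5,700 times, and 4.4 seconds for the 1,000-run loop.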
>> > >
>> > > The resulting trade list matches that obtained by the AFL
>> > > version of this code.
>> > >
>> > > I estimate that it is processing 32GB of bar data/sec.
>> > >
>> > > Getting this to work at peak performance was tricky. Most of
>> > > what I have learned about code optimization does not apply.
>> > >
>> > > It uses AmiBroker to load the symbol data and perform
>> > > calculations that do not depend on the optimization parameters.
>> > > Once loaded into video memory, repeated passes can be made with
>> > > different parameters, avoiding any overhead.
>> > >
>> > > For non backtest/optimization runs, the code just evaluates one
>> > > symbol and passes the data back to the AmiBroker
>> > > buy/sell/short/cover arrays, making it easy to test, validate
>> > > and visualize the trades. There is very little performance gain
>> > > in this case.
>> > >
>> > > There are problems, however. To run optimizations at peak
>> > > speed, I cannot use AmiBroker to calculate the optimization
>> > > goal function. So, I am in the process of writing code to match
>> > > signals and calculate the portfolio fitness function. Once I do
>> > > this, I will be able to perform full optimizations and walk
>> > > forwards 3 orders of magnitude faster than is possible with
>> > > AmiBroker alone.
>> > >
>> > > Also, this is not general purpose code. Changing the system
>> > > code means changing a dll written in C. However, there is no
>> > > reason that this could not be made more general.
>> > >
>> > > I have made some prototypes of "Cuda" versions of basic AFL
>> > > functions. The idea is to queue the function calls into a
>> > > definition executed by a micro kernel running on the graphics
>> > > cores. The result would be the ability to use the full power of
>> > > the graphics cores by modifying AFL code to use Cuda aware
>> > > versions with no changes to C code. It would be an interesting,
>> > > but big project.
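
The "queue the function calls into a definition" idea can be illustrated with a deferred-execution sketch. This is plain Python on the CPU, not CUDA, and not the author's prototype: `OpQueue`, its `ma` and `gt` operations, and the integer handles are hypothetical. Unlike AFL's MA, this moving average uses a shorter window for the first bars instead of returning Null.

```python
class OpQueue:
    """Records array operations instead of running them immediately."""

    def __init__(self):
        self.program = []     # queued (dest, opcode, args, const) tuples
        self.next_slot = 0    # handles are just slot numbers

    def _new_slot(self):
        slot = self.next_slot
        self.next_slot += 1
        return slot

    def input(self):
        """Reserve a handle for data supplied at run time."""
        return self._new_slot()

    def _emit(self, opcode, args, const=None):
        dest = self._new_slot()
        self.program.append((dest, opcode, args, const))
        return dest

    def ma(self, src, period):
        """Queue a simple moving average of `src`."""
        return self._emit("ma", (src,), period)

    def gt(self, a, b):
        """Queue an element-wise a > b comparison."""
        return self._emit("gt", (a, b))

    def run(self, inputs):
        """Interpret the queued program; `inputs` maps handle -> list."""
        slots = dict(inputs)
        for dest, opcode, args, const in self.program:
            if opcode == "ma":
                src = slots[args[0]]
                out = []
                for i in range(len(src)):
                    window = src[max(0, i - const + 1): i + 1]
                    out.append(sum(window) / len(window))
                slots[dest] = out
            elif opcode == "gt":
                a, b = slots[args[0]], slots[args[1]]
                slots[dest] = [x > y for x, y in zip(a, b)]
        return slots


q = OpQueue()
close = q.input()
buy = q.gt(q.ma(close, 2), q.ma(close, 3))   # nothing is computed yet
result = q.run({close: [1.0, 2.0, 3.0, 2.0, 1.0]})
print(result[buy])
```

Building the program is separate from running it, so the same queued definition could be executed many times over data that is already resident, which is the property the post is after.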
>> > >
>> >
>>
>
>
>