Which video cards provide this feature? As far as I can tell, it's only
the 8-Series (G8X) GPU from NVIDIA, found in the GeForce, Quadro and
Tesla lines. Who will have one of these? Are many people likely to have
them in the future?

Thanks
----- Original Message -----
From: "dloyer123" <dloyer123@xxxxxxcom>
To: <amibroker@xxxxxxxxxps.com>
Sent: Wednesday, August 06, 2008 12:22 AM
Subject: [amibroker] Re: Freakishly fast backtest using 64 cores

> Very good question. That was a head scratcher.
>
> So the thing is, AmiBroker does a lot more work in an optimization
> pass than execute AFL code. In fact, the AFL code may take very
> little of the total run time.
>
> As an example, using a database with a good amount of data, write an
> AFL file that does nothing but set the buy/sell/short/cover arrays
> to 0. The backtest will still take a good bit of time.
>
> So, even reducing the AFL run time to zero is not enough. It will
> not help much at all.
>
> So, to avoid this, I pass a "mode" variable to my Dll. This mode is
> set by a simple optimization statement:
>
>   mode = optimize("mode",0,1,3,1);
>
> When mode = 0, the Dll will evaluate one symbol like a normal Dll.
> So if I click on a bar, it will update my printf statements, etc.
> The buy/sell/short/cover arrays are set. A single backtest (not
> optimize) will use the normal AmiBroker trade matching and evaluation
> code and generate stats as normal.
>
> When mode = 1, this means load the data. The Dll will copy the price
> data to a staging area in memory. buy/sell/short/cover are set to 0 to
> generate no trades. Having AmiBroker align the symbol bars was a big
> help here.
>
> When mode = 2, on the first symbol and the first symbol only, it
> loads the price data to the video card and executes as many backtest
> passes as it needs at a few ms per pass. Once the best combination
> is found it returns. buy/sell/short/cover are set to 0. Note that I
> cannot use the AmiBroker signal matching and fitness function code. I
> have to provide my own. This is where the performance advantage of
> all of the extra cores comes into play. It may run hundreds or
> thousands of parameter combinations very quickly. I can't use the
> built-in optimize support, but brute force is enough for now. After
> all, I get 200 combinations per second.
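
The brute-force search described for mode 2 can be sketched in plain C.
This is only a CPU-side illustration, not the author's CUDA code: the
`Params` struct, `fitness` and `grid_search` are hypothetical names, the
two parameters are invented for the example, and the toy fitness function
stands in for the custom signal matching and portfolio scoring the post
says had to be written by hand.

```c
#include <assert.h>

/* Hypothetical two-parameter trading system; the names are illustrative. */
typedef struct {
    int fast;
    int slow;
} Params;

/* Stand-in fitness function. In the post this role is played by the
   author's own signal matching and portfolio scoring code. This toy
   score peaks at fast=5, slow=20 so the search has a known answer. */
static double fitness(Params p) {
    double df = (double)(p.fast - 5);
    double ds = (double)(p.slow - 20);
    return 100.0 - df * df - ds * ds;
}

/* Exhaustive (brute force) search over an inclusive parameter grid.
   Returns the best combination and writes its score to *best_score. */
static Params grid_search(int fast_lo, int fast_hi,
                          int slow_lo, int slow_hi,
                          double *best_score) {
    assert(fast_lo <= fast_hi && slow_lo <= slow_hi && best_score != 0);
    Params best = { fast_lo, slow_lo };
    double best_s = fitness(best);
    for (int f = fast_lo; f <= fast_hi; ++f) {
        for (int s = slow_lo; s <= slow_hi; ++s) {
            Params p = { f, s };
            double sc = fitness(p);
            if (sc > best_s) {   /* keep the first best on ties */
                best_s = sc;
                best = p;
            }
        }
    }
    *best_score = best_s;
    return best;
}
```

Calling `grid_search(1, 10, 11, 30, &score)` evaluates 200 combinations,
which at the quoted rate would take roughly one second on the author's
setup. Each combination is independent of the others, which is what makes
this kind of search a natural fit for many cores.
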
>
> When mode = 3, each symbol evaluates using the best parms found on
> the last mode=2 run. buy/sell/short/cover are set. In a walkforward
> test, this will always have the best score and be used for the
> walkforward step. A custom backtest function adds the chosen
> parameters to the backtest report. Mode 3 works like mode 0 except
> it uses the optimal parameters rather than default values.
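
The four modes amount to a small state machine inside the Dll. Below is a
minimal sketch in C, assuming a per-symbol entry point that receives the
mode and the symbol's position in the run. All names (`PluginState`,
`on_symbol`, `run_search`, `DEFAULT_PARAM`) are hypothetical, the single
integer parameter is a simplification, and `run_search` is a placeholder
for the GPU search rather than anything taken from the post.

```c
#include <assert.h>

/* Mode values follow the post; every name below is hypothetical. */
typedef enum {
    MODE_SINGLE   = 0,  /* evaluate one symbol with default parameters */
    MODE_LOAD     = 1,  /* stage price data, generate no trades        */
    MODE_OPTIMIZE = 2,  /* first symbol only: search for best params   */
    MODE_APPLY    = 3   /* evaluate each symbol with the best params   */
} Mode;

enum { NO_TRADES = -1, DEFAULT_PARAM = 10 };

typedef struct {
    int staged_symbols;  /* symbols copied to the staging area so far */
    int have_best;       /* nonzero once a mode 2 search has run      */
    int best_param;      /* result of the last mode 2 search          */
} PluginState;

/* Placeholder for the GPU parameter search over the staged data. */
static int run_search(const PluginState *st) {
    return DEFAULT_PARAM + st->staged_symbols;
}

/* Returns the parameter to generate signals with, or NO_TRADES when
   the buy/sell/short/cover arrays should be left at 0. */
static int on_symbol(PluginState *st, Mode mode, int symbol_index) {
    assert(st != 0 && symbol_index >= 0);
    switch (mode) {
    case MODE_SINGLE:
        return DEFAULT_PARAM;
    case MODE_LOAD:
        if (symbol_index == 0) {      /* new pass: reset the staging area */
            st->staged_symbols = 0;
            st->have_best = 0;
        }
        st->staged_symbols++;
        return NO_TRADES;
    case MODE_OPTIMIZE:
        if (symbol_index == 0) {      /* search once, on the first symbol */
            st->best_param = run_search(st);
            st->have_best = 1;
        }
        return NO_TRADES;
    case MODE_APPLY:
        assert(st->have_best);        /* mode 2 must have run first */
        return st->best_param;
    }
    return NO_TRADES;
}
```

Because the optimizer steps mode through 1, 2, 3 in order, every symbol is
staged before the search runs, and the search result is available by the
time mode 3 asks for real signals.
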
>
> The Status("action") and Status("actionex") codes could also be used,
> but they did not tell me quite what I needed to know. Also, I could
> have avoided the mode=2 step if I could find a way to know I was on
> the last symbol and run the optimization then. I guess I could pass
> the name of the last symbol.
>
> So I use AmiBroker to load and keep the database, visualize the
> trades, validate, walkforward and provide deep metrics of the
> backtest.
>
> If I wanted to take this further, I would move the trade system logic
> out of the Dll and make it programmable from AFL. That way it could
> be used by anyone without needing to program C. I would do this by
> passing handles to CUDA arrays through the AFL code.
>
>
>
>
> --- In amibroker@xxxxxxxxxps.com, "Paul Ho" <paul.tsho@x..> wrote:
>>
>> thanks for your insight.
>> I hope you don't mind sharing a little bit more detail.
>> You said:
>> > To get the best performance, my AFL code makes one pass over the
>> > data, calling a Dll. The Dll takes all of the data needed by the
>> > calculation and loads a copy to the video card. This upload is
>> > slow, the entire upload takes about 45 seconds for all 1000
>> > symbols.
>> >
>> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> > the graphics cores that perform the actual computation and
>> > generates the trade list.
>>
>> Normally AB loads the data from the database as needed, calls a
>> function in a dll, and passes data in arrays or whatever as
>> arguments of the function. The function will be called for every
>> ticker in the watchlist, and data pertaining to that symbol is
>> passed each time. I wonder how you do a "single pass" over the data,
>> because AB passes the data as part of the argument regardless of how
>> many optimizations it has previously run with the same data. I just
>> wonder how you do it.
>> cheers
>> Paul.
>>
>> --- In amibroker@xxxxxxxxxps.com, "dloyer123" <dloyer123@> wrote:
>> >
>> > This uses the mid-range video card that happened to come with my
>> > system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
>> > faster. The 260 can be found at Best Buy for $300. Some laptops
>> > have compatible cards as well.
>> >
>> > The video card has its own memory; mine has 512MB, some have as
>> > much as 1GB. This memory is very fast once it is loaded from the
>> > main system. Nvidia has a professional line of products that have
>> > much more memory.
>> >
>> > To get the best performance, my AFL code makes one pass over the
>> > data, calling a Dll. The Dll takes all of the data needed by the
>> > calculation and loads a copy to the video card. This upload is
>> > slow, the entire upload takes about 45 seconds for all 1000
>> > symbols.
>> >
>> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> > the graphics cores that perform the actual computation and
>> > generates the trade list. This part is very fast and performs all
>> > of the same functions that my AFL version does. The resulting
>> > trade list is the same.
>> >
>> > Because the data is loaded into video memory, it can be reused for
>> > many passes over the data with different optimization values. So,
>> > hundreds of combinations of optimization values can be tried per
>> > second.
>> >
>> > For non-optimization runs, the Dll just loads one symbol into
>> > video memory and processes it. Counting the overhead of moving
>> > data to the video card and extracting the trade list for a single
>> > symbol, the result is similar to AFL code alone. This lets me test
>> > the code and make sure it is correct.
>> >
>> > This approach works best when the data only needs to be loaded
>> > once, then reused many times. It also works best when there is a
>> > lot of data to work with.
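
The "load once, reuse many times" trade-off can be made concrete with the
figures quoted in this thread: about 45 seconds for the upload, 4.4 ms per
pass on the video card, and 25 seconds per pass for AFL alone. The sketch
below is a back-of-the-envelope model in C, not a measurement. The
function names are hypothetical, and it assumes the upload and per-pass
figures refer to the same 1000-symbol data set, which the thread suggests
but does not state outright.

```c
#include <assert.h>

/* Total seconds when the data is uploaded once and reused for every pass. */
static double reuse_total_seconds(double upload_s, double pass_s, long passes) {
    assert(passes >= 0);
    return upload_s + pass_s * (double)passes;
}

/* Total seconds when every pass pays the full (slow) per-pass cost. */
static double reload_total_seconds(double pass_s, long passes) {
    assert(passes >= 0);
    return pass_s * (double)passes;
}

/* Smallest number of passes at which upload-and-reuse is strictly
   cheaper than paying the slow per-pass cost each time. */
static long break_even_passes(double upload_s, double fast_pass_s,
                              double slow_pass_s) {
    assert(upload_s >= 0.0 && fast_pass_s >= 0.0);
    assert(slow_pass_s > fast_pass_s);  /* otherwise reuse never wins */
    long n = 1;
    while (reuse_total_seconds(upload_s, fast_pass_s, n) >=
           reload_total_seconds(slow_pass_s, n)) {
        ++n;
    }
    return n;
}
```

With those figures, 1,000 passes cost about 49.4 seconds with reuse against
25,000 seconds without, and the upload pays for itself from the second
pass. A single pass comes out slower with the upload included, which fits
the remark that one-off runs see little or no gain.
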
>> >
>> > What is more interesting to me, and what would be more useful for
>> > others, would be a general driver that requires no Dll changes to
>> > modify the system. The performance would not be as good as
>> > hand-optimized code, but would still be much better than AFL code
>> > alone. It would take trading system design to a whole new level.
>> > It would provide enough performance to make working with intraday
>> > data as easy as daily data is today.
>> >
>> > Writing such a driver would be hard, but I have already done some
>> > prototypes and design work. I am tempted to do it for my own use.
>> > If I made it available to others, supporting it would be a PITA.
>> >
>> > --- In amibroker@xxxxxxxxxps.com, "Paul Ho" <paul.tsho@> wrote:
>> > >
>> > > I'm very interested, could you elaborate a bit more?
>> > > What model of Nvidia chipset are you using, and with how much
>> > > memory?
>> > > Not sure exactly what you mean when you say:
>> > > "It uses AmiBroker to load the symbol data and perform
>> > > calculations that do not depend on the optimization parameters.
>> > > Once loaded into video memory, repeated passes can be made with
>> > > different parameters, avoiding any overhead."
>> > > Can you give me some examples? I presume when your dll is called,
>> > > AB passes one or more arrays of data belonging to 1 symbol, is
>> > > that true?
>> > > Not sure exactly what the rest means either. How many functions
>> > > are you running in your dll, and what does each of them do?
>> > > Great of you to share your insight.
>> > > Cheers
>> > > Paul.
>> > >
>> > > _____
>> > >
>> > > From: amibroker@xxxxxxxxxps.com [mailto:amibroker@xxxxxxxxxps.com]
>> > > On Behalf Of dloyer123
>> > > Sent: Tuesday, 5 August 2008 9:19 AM
>> > > To: amibroker@xxxxxxxxxps.com
>> > > Subject: [amibroker] Freakishly fast backtest using 64 cores
>> > >
>> > > Greetings,
>> > >
>> > > I ported part of my AFL backtest code to a plugin that takes
>> > > advantage of the graphics math cores on the video card that are
>> > > normally used for 3D graphics.
>> > >
>> > > I was able to get a several-thousand-fold performance
>> > > improvement over AFL code alone.
>> > >
>> > > My goal was to reduce the 25 seconds AFL code alone uses for a
>> > > single portfolio-level backtest to less than 1 second, allowing
>> > > multi-day optimization and walkforward runs to complete in a
>> > > more reasonable time, and also just to see how fast I could get
>> > > it to run.
>> > >
>> > > The backtest runs over 1 year of 5 minute bars for about 1000
>> > > symbols. 1 year of data normally takes 25 seconds for AmiBroker
>> > > alone, or 18 seconds for 6 months of data. A typical
>> > > optimization run takes hundreds of these passes per walk forward
>> > > step, taking hours.
>> > >
>> > > Using the Nvidia CUDA API, running on my mid-range video card,
>> > > it was much faster. Much, much, much faster. How fast?
>> > >
>> > > It reduced the run time from 25s to... 4.4ms. That is more than
>> > > 200/s!
>> > >
>> > > I didn't believe the timing when I saw it at first. So, I put
>> > > 1,000 runs in a loop and sure enough, it ran 1,000 iterations in
>> > > about 4 1/2 seconds. This far exceeded my goal or expectations.
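
The timing figures above are easy to cross-check. The helpers below are a
small sketch in C with hypothetical names. They use only numbers quoted in
the post (25 s per AFL pass, 4.4 ms per CUDA pass, a 1,000-iteration loop)
and make no claim about how the author actually measured them.

```c
#include <assert.h>

/* Passes completed per second at a given time per pass. */
static double passes_per_second(double seconds_per_pass) {
    assert(seconds_per_pass > 0.0);
    return 1.0 / seconds_per_pass;
}

/* How many times faster the improved run is than the baseline. */
static double speedup(double baseline_s, double improved_s) {
    assert(baseline_s > 0.0 && improved_s > 0.0);
    return baseline_s / improved_s;
}

/* Expected wall time for a loop of identical passes. */
static double loop_seconds(double seconds_per_pass, long iterations) {
    assert(seconds_per_pass >= 0.0 && iterations >= 0);
    return seconds_per_pass * (double)iterations;
}
```

At 4.4 ms per pass this gives roughly 227 passes per second and a speedup
of about 5,700 times over the 25 s baseline. A 1,000-iteration loop should
take about 4.4 seconds, in line with the "about 4 1/2 seconds" observed.
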
>> > >
>> > > The resulting trade list matches that obtained by the AFL
>> > > version of this code.
>> > >
>> > > I estimate that it is processing 32GB of bar data/sec.
>> > >
>> > > Getting this to work at peak performance was tricky. Most of
>> > > what I have learned about code optimization does not apply.
>> > >
>> > > It uses AmiBroker to load the symbol data and perform
>> > > calculations that do not depend on the optimization parameters.
>> > > Once loaded into video memory, repeated passes can be made with
>> > > different parameters, avoiding any overhead.
>> > >
>> > > For non backtest/optimization runs, the code just evaluates one
>> > > symbol and passes the data back to the AmiBroker
>> > > buy/sell/short/cover arrays, making it easy to test, validate
>> > > and visualize the trades. There is very little performance gain
>> > > in this case.
>> > >
>> > > There are problems, however. To run optimizations at peak
>> > > speed, I cannot use AmiBroker to calculate the optimization goal
>> > > function. So, I am in the process of writing code to match
>> > > signals and calculate the portfolio fitness function. Once I do
>> > > this, I will be able to perform full optimizations and walk
>> > > forwards 3 orders of magnitude faster than is possible with
>> > > AmiBroker alone.
>> > >
>> > > Also, this is not general purpose code. Changing the system
>> > > code means changing a dll written in C. However, there is no
>> > > reason that this could not be made more general.
>> > >
>> > > I have made some prototypes of "Cuda" versions of basic AFL
>> > > functions. The idea is to queue the function calls into a
>> > > definition executed by a micro kernel running on the graphics
>> > > cores. The result would be the ability to use the full power of
>> > > the graphics cores by modifying AFL code to use Cuda-aware
>> > > versions, with no changes to C code. It would be an interesting,
>> > > but big, project.
>> > >
>> >
>>
>