
[amibroker] Re: Freakishly fast backtest using 64 cores



Sorry, I forgot the link:
http://biz.yahoo.com/prnews/080916/aqtu046.html?.v=68

--- In amibroker@xxxxxxxxxxxxxxx, "tipequity" <l3456@xxx> wrote:
>
> Below is a link to an interesting article regarding the use of GPUs in
> Monte Carlo pricing models.
> 
> --- In amibroker@xxxxxxxxxxxxxxx, "vlanschot" <vlanschot@> wrote:
> >
> > Thanks everybody for contributing to this discussion. A few days ago
> > I was wondering why we hadn't seen any mails from Tomasz on this
> > topic. Well, we certainly have now. Thanks Tomasz for taking your
> > precious time to clarify this matter.
> >
> > The main message, as always, is that people use AB in countless
> > fascinating ways. Clearly only Tomasz is able to keep all these
> > usages in mind and, to his credit, balance their requirements and
> > develop AB into the versatile product it is. In fact, I would argue
> > that the interesting exploration by dloyer123 beyond the usual
> > boundaries of AB is a case in point. Although they are usually
> > beneficial to only a sample of the AB population, plug-ins (and the
> > ability to develop them) hugely enrich its workings.
> > 
> > Shall/Can we move on?
> > 
> > PS
> > 
> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@> wrote:
> > >
> > > Hello,
> > > 
> > > No, it can't, for one simple reason I pointed out earlier: there is
> > > not enough RAM to keep all intermediate values in memory during the
> > > entire optimization (all steps).
> > > You don't seem to accept the fact
> > > that many people are running optimizations on more data than you,
> > > and that implementing your suggestion would be equivalent to saying
> > > to them:
> > > "alright, now you need to upgrade to 64-bit Windows and buy 20 GB of
> > > RAM"
> > >
> > > The fact that your tests fit into the RAM that you have on your
> > > computer does not necessarily mean that
> > > a) everyone has as much RAM as you
> > > and/or
> > > b) everyone has the same problem to solve as you.
> > >
> > > Unlike other software, AmiBroker tries to be "reasonable" when it
> > > comes to memory usage, and that's why it can run tests that are
> > > simply beyond the capabilities of other software that doesn't care
> > > about memory usage.
> > >
> > > Before you suggest implementing "special cases" for when all data
> > > and all intermediate results could fit in RAM: yes, I could do that,
> > > but at the expense of a higher risk of bugs and a higher cost of
> > > maintaining code that has multiple "versions" inside. And I won't go
> > > down that road.
> > >
> > > When designing general-purpose software you need to find a balance.
> > >
> > > If you fail to find that balance, you end up with an operating
> > > system that needs 2GB (see Vista) and uses 100-200MB of RAM for the
> > > window transparency effect (see DWM.EXE memory usage).
> > > I hear all the time that "memory is cheap". Maybe it is cheap, but
> > > that does not mean that its cost is zero, and it does not mean that
> > > actually accessing gigabytes of RAM costs zero time.
> > > If someone thinks that it does not matter, I suggest installing
> > > Windows 95 on a current machine and seeing how fast it boots from a
> > > cold restart.
> > > Nowadays I see lots of programmers who don't care if the user needs
> > > to download a 30MB runtime (see all .NET apps), and all they have to
> > > say is "buy a bigger hard disk". I don't agree with that kind of
> > > thinking.
> > > 
> > > Best regards,
> > > Tomasz Janeczko
> > > amibroker.com
> > > ----- Original Message ----- 
> > > From: "dloyer123" <dloyer123@>
> > > To: <amibroker@xxxxxxxxxxxxxxx>
> > > Sent: Wednesday, August 13, 2008 6:18 PM
> > > Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> > > 
> > > 
> > > > TJ:  I am merely suggesting that some work could be shifted from
> > > > once per optimization step to once per optimization run, and that
> > > > many of your customers would benefit.
> > > >
> > > > If you don't like the suggestion, fine.  But there is no reason to
> > > > be a jerk about it.
> > > > 
> > > > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@> wrote:
> > > >>
> > > >> Hello,
> > > >> 
> > > >> Of course AFL execution is just a FRACTION of the time needed to
> > > >> perform a full-featured portfolio backtest. There are many other
> > > >> steps involved, namely:
> > > >>
> > > >> PHASE 1:
> > > >> FOR EVERY SYMBOL UNDER TEST
> > > >>
> > > >> A) preparing data
> > > >> This involves the following sub-steps:
> > > >> a.1) reading the data from disk (if not cached already) or
> > > >> requesting data from an external plugin (many external sources are
> > > >> pretty slow)
> > > >> a.2) time compression to the selected periodicity (say your
> > > >> database is 1-minute and your optimization periodicity is 5-minute)
> > > >> a.3) filtering for weekends/regular trading hours etc. (if enabled)
> > > >> a.4) padding to the reference symbol (if enabled)
> > > >> Note that steps a.2 and a.3 are done simultaneously (in one loop)
> > > >> for speed
> > > >>
> > > >> B) AFL execution (that is what is reported as "AFL execution time").
> > > >>
> > > >> b.1) setting up the AFL engine - allocating and filling built-in
> > > >> arrays (buyprice/sellprice/shortprice/coverprice/margindeposit/
> > > >> positionsize/open/high/low/close/openint/volume, etc.)
> > > >> b.2) actual execution
> > > >> b.3) cleanup (freeing temporary memory allocated for execution)
> > > >>
> > > >> C) collection / sorting / ranking of trading signals
> > > >> After the buy/sell/short/cover arrays are known, AmiBroker collects
> > > >> signals and ranks them according to position score. Features like
> > > >> HoldMinBars, trading delays, various kinds of built-in stops, etc.
> > > >> are also handled at this step.
> > > >>
> > > >>
> > > >> PHASE 2:
> > > >> ONCE FOR BACKTEST (or OPTIMIZATION STEP):
> > > >> D.1) Actual Portfolio backtest (simplified a bit):
> > > >>
> > > >> For EVERY BAR under test
> > > >> {
> > > >>   For EVERY SIGNAL in the sorted signal list
> > > >>   {
> > > >>       PROCESS SIGNAL (signal processing is complex as it handles
> > > >>       features like multiple currencies and scaling, early exit
> > > >>       penalty fees)
> > > >>       UpdateEquity()
> > > >>   }
> > > >>   Calculate TRADE STATISTICS
> > > >>   Calculate EQUITY STATISTICS
> > > >>   Evaluate/Handle STOPS
> > > >> }
> > > >>
> > > >> Note that if any Custom Backtester procedure is defined, there is
> > > >> one more AFL execution that allows control of D.1.
> > > >>
> > > >> D.2) Report output (generating the HTML backtest report),
> > > >> generating MAE/MFE charts
> > > >> 
> > > >> All those non-AFL-execution steps account for roughly 1-2
> > > >> nanoseconds of overhead per bar per symbol.
> > > >>
> > > >> I would not say that a 1-2 nanosecond overhead is "not-efficient"
> > > >> or a "performance bottleneck", considering the amount of work a
> > > >> fully featured portfolio backtester is doing.
> > > >>
> > > >> If you do the math:
> > > >> 10 step optimization * 850 symbols * 20000 bars per symbol gives
> > > >>
> > > >> ==================================================================
> > > >> You got 170 MILLION (1.7e8) bars to process. Multiply that by 1
> > > >> nanosecond (1e-9) and you will get 17 seconds.
> > > >> ==================================================================
> > > >> 
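The arithmetic in the message above can be checked in a few lines. This is a minimal sketch that takes the quoted figures (10 steps, 850 symbols, 20000 bars per symbol) at face value; the function names are illustrative and not part of AmiBroker. Evaluated this way, 1.7e8 bars at 1 nanosecond each come to 0.17 seconds, and a 17-second total corresponds to about 100 nanoseconds per bar.

```python
# Figures quoted in the message above.
STEPS = 10
SYMBOLS = 850
BARS_PER_SYMBOL = 20_000

# Total bars touched over the whole optimization run.
TOTAL_BARS = STEPS * SYMBOLS * BARS_PER_SYMBOL


def total_seconds(ns_per_bar: float) -> float:
    """Total overhead in seconds for a given per-bar cost in nanoseconds."""
    return TOTAL_BARS * ns_per_bar * 1e-9


def ns_per_bar_for(seconds: float) -> float:
    """Per-bar cost in nanoseconds implied by a given total run time."""
    return seconds / TOTAL_BARS * 1e9


if __name__ == "__main__":
    print(f"total bars:           {TOTAL_BARS:,}")
    print(f"1 ns per bar  ->      {total_seconds(1):.2f} s")
    print(f"17 s in total ->      {ns_per_bar_for(17):.0f} ns per bar")
```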
> > > >> Quite frankly it seems that you are comparing apples to oranges. If
> > > >> your own code simply does not do steps A), C) and D), it surely
> > > >> will run faster, but any code that does less work will run faster.
> > > >> I am not sure what the purpose of this thread is. I am getting
> > > >> pretty tired of trying to convince you that I did my job well.
> > > >>
> > > >> Best regards,
> > > >> Tomasz Janeczko
> > > >> amibroker.com
> > > >> ----- Original Message -----
> > > >> From: "dloyer123" <dloyer123@>
> > > >> To: <amibroker@xxxxxxxxxxxxxxx>
> > > >> Sent: Wednesday, August 13, 2008 3:03 AM
> > > >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> > > >> 
> > > >> 
> > > >> > The idea of using array processing for this problem rather than
> > > >> > the more traditional for/next loop was a really good idea.  That
> > > >> > part of the system is very fast at what it does and provides a
> > > >> > great amount of freedom and flexibility.
> > > >> >
> > > >> > However, consider the trivial system:
> > > >> >
> > > >> > Buy = 0;
> > > >> > Sell = 0;
> > > >> > Short = 0;
> > > >> > Cover = 0;
> > > >> > Optimize("val",0,1,10,1);
> > > >> >
> > > >> > Clearly the AFL engine is being invoked, but this could be
> > > >> > considered the fastest possible AFL code, or very close to it.
> > > >> > Its execution time is not zero, but pretty darn close.  The
> > > >> > "check afl" function measures 0.3ms for 64,000 bars for the
> > > >> > "Optimize" statement.
> > > >> >
> > > >> > On my 3GHz Core 2, this system takes 18 seconds to backtest over
> > > >> > a portfolio of 850 symbols, 1 year, 5 minute bars.  This time
> > > >> > does not vary much with the size of the test window.  Since
> > > >> > "Quick AFL" is selected, there should be about 20k bars per
> > > >> > symbol.
> > > >> >
> > > >> > Running optimize takes roughly the same time as the backtest for
> > > >> > each and every pass: 18 seconds each and every time.  That 18
> > > >> > seconds cannot be explained by the AFL code execution alone.
> > > >> > There is other stuff being done that takes much, much, much
> > > >> > longer.
> > > >> >
> > > >> > So, even if AFL execution ran in zero time, this is the limit of
> > > >> > how fast AmiBroker can optimize.
> > > >> >
> > > >> > So, yes, the AFL execution is highly optimized and very fast, but
> > > >> > there is a lot of overhead that is outside of the AFL execution.
> > > >> > I could guess what it is doing, but it really does not matter.
> > > >> >
> > > >> > I am only pointing out that it is a very juicy opportunity for
> > > >> > meaningful performance gains.
> > > >> >
> > > >> > Yes, there are memory management issues, and yes, some data sets
> > > >> > may be too large to take advantage of it, but a large fraction of
> > > >> > your customer base would.  It could even be made transparent to
> > > >> > the user with no extra checkboxes.  No exotic hardware required.
> > > >> > It would even work on a laptop.
> > > >> >
> > > >> > When I run my system on the emulator, it just runs on the normal
> > > >> > cpu core, using normal system memory.  If anything, there has to
> > > >> > be a lot of overhead in pretending to run as so many threads, and
> > > >> > the data set is far larger than the L2 cache.  But it is still
> > > >> > much faster than the built in backtest.  Yes, part of that is
> > > >> > hand optimized code, but that does not explain the performance
> > > >> > differential of 20x to 50x of Ami vs emulator.  Running on the
> > > >> > GPU is more like 4000x.  Yes, the GPU has more memory bandwidth
> > > >> > to work with, but not that much more.
> > > >> >
> > > >> > I would say that the AFL execution code is highly optimized and
> > > >> > fully exploits the hardware it has to work with, but that there
> > > >> > are performance bottlenecks elsewhere in the critical path. I
> > > >> > cannot tell you what they are, but I would guess that it is
> > > >> > rebuilding price arrays and maybe other data structures on every
> > > >> > pass.
> > > >> >
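The suggestion in this message, doing parameter-independent work once per optimization run instead of once per optimization step, can be sketched as follows. This is a toy Python illustration under stated assumptions, not AmiBroker's code: `prepare`, `count_above`, `optimize_rebuild` and `optimize_cached` are hypothetical names, and the "preparation" is a simple stand-in for time compression.

```python
from typing import Dict, List

Bars = List[float]


def prepare(raw: Bars, compress: int) -> Bars:
    """Toy parameter-independent preparation: keep the last price of every
    full group of `compress` raw bars (e.g. 1-minute -> 5-minute closes)."""
    return [raw[i] for i in range(compress - 1, len(raw), compress)]


def count_above(bars: Bars, threshold: float) -> int:
    """Toy per-step work: how many prepared bars exceed the threshold."""
    return sum(1 for price in bars if price > threshold)


def optimize_rebuild(raw: Dict[str, Bars], thresholds: List[float],
                     compress: int = 5) -> Dict[float, int]:
    """Prepare every symbol again on every optimization step."""
    results = {}
    for t in thresholds:
        prepared = {sym: prepare(bars, compress) for sym, bars in raw.items()}
        results[t] = sum(count_above(b, t) for b in prepared.values())
    return results


def optimize_cached(raw: Dict[str, Bars], thresholds: List[float],
                    compress: int = 5) -> Dict[float, int]:
    """Prepare every symbol once per run, then reuse it on every step."""
    prepared = {sym: prepare(bars, compress) for sym, bars in raw.items()}
    return {t: sum(count_above(b, t) for b in prepared.values())
            for t in thresholds}


if __name__ == "__main__":
    data = {"AAA": [float(i) for i in range(1, 11)],
            "BBB": [float(i) for i in range(10, 0, -1)]}
    print(optimize_rebuild(data, [0.0, 5.0]))
    print(optimize_cached(data, [0.0, 5.0]))
```

Both variants return the same results. The cached one has to hold the prepared arrays for all symbols in memory for the whole run, which is the memory-for-speed trade-off argued over in this thread.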
> > > >> > Anyway, I don't mean to tell you your business, and you are much
> > > >> > closer to this problem than I am.  Maybe there is some edge case
> > > >> > that I have not considered that forces a performance hit.  It is
> > > >> > still way faster than EasyLanguage.
> > > >> >
> > > >> > I am a big fan of your work and enjoy using your product.  The
> > > >> > passion that you put into it shows.
> > > >> >
> > > >> >
> > > >> >
> > > >> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@>
> > > >> > wrote:
> > > >> >>
> > > >> >> Hello,
> > > >> >>
> > > >> >> What is true for the GPU is not necessarily true for the CPU.
> > > >> >> The GPU has a dedicated wide RAM bus and faster RAM as opposed
> > > >> >> to system memory.
> > > >> >>
> > > >> >> AmiBroker does a lot to utilise memory to the maximum extent
> > > >> >> where possible/feasible.
> > > >> >>
> > > >> >> Actually AFL speed is limited by system memory if you run out of
> > > >> >> on-chip cache.
> > > >> >> http://www.amibroker.com/kb/2008/08/12/afl-execution-speed/
> > > >> >>
> > > >> >> So going for more memory usage does not always mean faster
> > > >> >> execution.
> > > >> >>
> > > >> >> Sure, you can pre-compute everything and use pre-computed values,
> > > >> >> but you need to understand that people are doing VERY different
> > > >> >> things with AmiBroker, and their problems are not the same as
> > > >> >> the problems you are trying to solve.
> > > >> >> For example, some customers are backtesting the entire US stock
> > > >> >> universe (8000+ symbols) over 10 or 20 years. That's about 1.3GB
> > > >> >> for DATA alone. Now if you are running a portfolio backtest you
> > > >> >> need to keep trading signals, and that can be as much as 1GB in
> > > >> >> such a case. You quickly reach the 3GB RAM limit of a 32-bit OS.
> > > >> >> There is no place to store "pre-computed" values.
> > > >> >> AmiBroker by nature needs to provide the best blend of speed and
> > > >> >> moderate memory / CPU requirements.
> > > >> >> User-specific single-task solutions may go into specialisation
> > > >> >> and tricks that are not feasible for a commercial general-purpose
> > > >> >> product that is intended to keep a large user base happy.
> > > >> >>
> > > >> >> Best regards,
> > > >> >> Tomasz Janeczko
> > > >> >> amibroker.com
> > > >> >> ----- Original Message ----- 
> > > >> >> From: "dloyer123" <dloyer123@>
> > > >> >> To: <amibroker@xxxxxxxxxxxxxxx>
> > > >> >> Sent: Tuesday, August 12, 2008 4:09 PM
> > > >> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> > > >> >>
> > > >> >>
> > > >> >> > The programming guide lists the 8600M and 8700M as having 32
> > > >> >> > computing cores.  Not sure what they are clocked at.  Power is
> > > >> >> > an issue.  The desktop versions need dedicated power
> > > >> >> > connectors.  The big cards need two.
> > > >> >> >
> > > >> >> > Actually, when I am doing development on my laptop, I just use
> > > >> >> > the emulator.  It is about 100x slower than my desktop system,
> > > >> >> > but still about 20x to 50x faster than Ami alone.  The speed
> > > >> >> > difference in emulation mode is mostly due to the precomputed
> > > >> >> > and cached price arrays.
> > > >> >> >
> > > >> >> > Tomasz:  I suspect that there is an opportunity to trade memory
> > > >> >> > for speed, even with 1 core.  Memory is cheap and would be a
> > > >> >> > simpler way to get a performance boost than porting to multi
> > > >> >> > core, GPU or CPU.
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@>
> > > >> >> > wrote:
> > > >> >> >>
> > > >> >> >> > Dell has 3 off the shelf
> > > >> >> >> > laptops in their entertainment/performance range that use
> > > >> >> >> > GeForce 8600M and 8700M with 256MB & 2*2456MB (min 256
> > > >> >> >> > required for CUDA?)
> > > >> >> >>
> > > >> >> >> Mobile ones are very poor cousins. Believe me. I own a brand
> > > >> >> >> new notebook (ASUS) with a GeForce 8600M and it is SLOW in 3D.
> > > >> >> >> I mean SLOW. Did I mention that it is SLOW?
> > > >> >> >>
> > > >> >> >> In 3D Mark it gets the same results as my 3 year old desktop
> > > >> >> >> 6600GT.
> > > >> >> >>
> > > >> >> >> Best regards,
> > > >> >> >> Tomasz Janeczko
> > > >> >> >> amibroker.com
> > > >> >> >> ----- Original Message ----- 
> > > >> >> >> From: "brian_z111" <brian_z111@>
> > > >> >> >> To: <amibroker@xxxxxxxxxxxxxxx>
> > > >> >> >> Sent: Tuesday, August 12, 2008 12:40 AM
> > > >> >> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> > DL
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> > I am following at the top level and understand what you are
> > > >> >> >> > doing OK (you make me wish I had learnt programming/IT).
> > > >> >> >> >
> > > >> >> >> > I like your CPU.
> > > >> >> >> >
> > > >> >> >> > Allowing niche trading is what AB is all about?
> > > >> >> >> >
> > > >> >> >> > I'll put my money on MS/"general purpose computing on GPU" -
> > > >> >> >> > I don't think the masses are in love with MS, but for the 80%
> > > >> >> >> > of people who can do 80% of what they want with MS the price
> > > >> >> >> > to move elsewhere is too high - they are just in love with
> > > >> >> >> > max output for min input.
> > > >> >> >> >
> > > >> >> >> > If you go to the trouble to write a plug-in do you think it
> > > >> >> >> > will be around long/require much ongoing support from you?
> > > >> >> >> >
> > > >> >> >> > I can see the benefits of the speed - for a group of traders
> > > >> >> >> > it is a definite edge they would have for a year or two (I
> > > >> >> >> > don't think any other trading software will be seeing this
> > > >> >> >> > for a while?) - especially in the AT area where more
> > > >> >> >> > crunching could be done fast enough to keep up with live
> > > >> >> >> > data.
> > > >> >> >> >
> > > >> >> >> > I don't blame Tomasz for not sitting his backside on the
> > > >> >> >> > cutting edge - too dangerous for developers with long term
> > > >> >> >> > clientele.
> > > >> >> >> >
> > > >> >> >> > Not having a go at Tomasz - to clarify - Tomasz said GeForce
> > > >> >> >> > 8800 can't be put in a notebook?
> > > >> >> >> >
> > > >> >> >> > To my understanding there seems to be a reasonable number of
> > > >> >> >> > laptops around that could use your method e.g. Dell has 3 off
> > > >> >> >> > the shelf laptops in their entertainment/performance range
> > > >> >> >> > that use GeForce 8600M and 8700M with 256MB & 2*2456MB (min
> > > >> >> >> > 256 required for CUDA?)
> > > >> >> >> >
> > > >> >> >> > I looked at the GeF links in Paul's post but they didn't have
> > > >> >> >> > much specific info there that I could see - I assume the
> > > >> >> >> > above cards will run your system.
> > > >> >> >> >
> > > >> >> >> > I am not a buyer for now but good luck with it and what you
> > > >> >> >> > have done already is a good contribution to AB - once someone
> > > >> >> >> > on the block has a new super-dooper gadget pretty soon the
> > > >> >> >> > neighbours want one too and demand grows.
> > > >> >> >> >
> > > >> >> >> > brian_z
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> > --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@>
> > > >> >> >> > wrote:
> > > >> >> >> >>
> > > >> >> >> >> This uses the mid range video card that happened to come
> > > >> >> >> >> with my system, a 9800GT.  The newer 260 and 280 cards are
> > > >> >> >> >> 3 to 4 times faster.  The 260 can be found at Best Buy for
> > > >> >> >> >> $300.  Some laptops have compatible cards as well.
> > > >> >> >> >>
> > > >> >> >> >> The video card has its own memory; mine has 512MB, some have
> > > >> >> >> >> as much as 1GB.  This memory is very fast, once it is loaded
> > > >> >> >> >> from the main system.  Nvidia has a professional line of
> > > >> >> >> >> products that have much more memory.
> > > >> >> >> >>
> > > >> >> >> >> To get the best performance, my AFL code makes one pass over
> > > >> >> >> >> the data, calling a Dll.  The Dll takes all of the data
> > > >> >> >> >> needed by the calculation and loads a copy to the video
> > > >> >> >> >> card.  This upload is slow; the entire upload takes about 45
> > > >> >> >> >> seconds for all 1000 symbols.
> > > >> >> >> >>
> > > >> >> >> >> Once all of the data is uploaded, the Dll loads a "kernel"
> > > >> >> >> >> into the graphics cores that performs the actual computation
> > > >> >> >> >> and generates the trade list.  This part is very fast and
> > > >> >> >> >> performs all of the same functions that my AFL version
> > > >> >> >> >> does.  The resulting trade list is the same.
> > > >> >> >> >>
> > > >> >> >> >> Because the data is loaded into video memory, it can be
> > > >> >> >> >> reused for many passes over the data with different
> > > >> >> >> >> optimization values.  So, hundreds of combinations of
> > > >> >> >> >> optimization values can be tried per second.
> > > >> >> >> >>
> > > >> >> >> >> For non optimization runs, the Dll just loads one symbol
> > > >> >> >> >> into video memory and processes it.  Counting the overhead
> > > >> >> >> >> of moving data to the video card and extracting the trade
> > > >> >> >> >> list for a single symbol, the result is similar to AFL code
> > > >> >> >> >> alone.  This lets me test the code and make sure it is
> > > >> >> >> >> correct.
> > > >> >> >> >>
> > > >> >> >> >> This approach works best when the data only needs to be
> > > >> >> >> >> loaded once, then "reused" many times.  It also works best
> > > >> >> >> >> when there is a lot of data to work with.
> > > >> >> >> >>
> > > >> >> >> >> What is more interesting to me and what would be more useful
> > > >> >> >> >> for others would be a general driver that requires no Dll
> > > >> >> >> >> changes to modify the system.  The performance would not be
> > > >> >> >> >> as good as hand optimized code, but would still be much
> > > >> >> >> >> better than AFL code alone.  It would take trading system
> > > >> >> >> >> design to a whole new level.  It would provide enough
> > > >> >> >> >> performance to make working with intraday data as easy as
> > > >> >> >> >> daily data is today.
> > > >> >> >> >>
> > > >> >> >> >> Writing such a driver would be hard, but I have already done
> > > >> >> >> >> some prototypes and design work.  I am tempted to do it for
> > > >> >> >> >> my own use.  If I made it available to others, supporting it
> > > >> >> >> >> would be a PITA.
> > > >> >> >> >>
> > > >> >> >> >>
> > > >> >> >> >>
> > > >> >> >> >>
> > > >> >> >> >> --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@>
> > > >> >> >> >> wrote:
> > > >> >> >> >> >
> > > >> >> >> >> > I'm very interested.
> > > >> >> >> >> > Could you elaborate a bit more?
> > > >> >> >> >> > What model of Nvidia chipset are you using, and with how
> > > >> >> >> >> > much memory?
> > > >> >> >> >> > Not sure exactly what you mean when you say
> > > >> >> >> >> > "It uses AmiBroker to load the symbol data and perform
> > > >> >> >> >> > calculations that do not depend on the optimization
> > > >> >> >> >> > parameters. Once loaded into video memory, repeated passes
> > > >> >> >> >> > can be made with different parameters, avoiding any
> > > >> >> >> >> > overhead."
> > > >> >> >> >> > Can you give me some examples? I presume when your dll is
> > > >> >> >> >> > called, AB passes one or more arrays of data belonging to
> > > >> >> >> >> > 1 symbol, is that true?
> > > >> >> >> >> > Not sure exactly what the rest means either. How many
> > > >> >> >> >> > functions are you running in your dll, and what does each
> > > >> >> >> >> > of them do?
> > > >> >> >> >> > Great of you to share your insight.
> > > >> >> >> >> > Cheers
> > > >> >> >> >> > Paul.
> > > >> >> >> >> >
> > > >> >> >> >> >   _____
> > > >> >> >> >> >
> > > >> >> >> >> > From: amibroker@xxxxxxxxxxxxxxx
> > > >> >> >> >> > [mailto:amibroker@xxxxxxxxxxxxxxx] On Behalf Of dloyer123
> > > >> >> >> >> > Sent: Tuesday, 5 August 2008 9:19 AM
> > > >> >> >> >> > To: amibroker@xxxxxxxxxxxxxxx
> > > >> >> >> >> > Subject: [amibroker] Freakishly fast backtest using 64
> > > >> >> >> >> > cores
> > > >> >> >> >> >
> > > >> >> >> >> >
> > > >> >> >> >> > Greetings,
> > > >> >> >> >> >
> > > >> >> >> >> > I ported part of my AFL backtest code to a plugin that
> > > >> >> >> >> > takes advantage of the graphics math cores on the video
> > > >> >> >> >> > card that are normally used for 3d graphics.
> > > >> >> >> >> >
> > > >> >> >> >> > I was able to get a several thousand fold performance
> > > >> >> >> >> > improvement over AFL code alone.
> > > >> >> >> >> >
> > > >> >> >> >> > My goal was to reduce the 25 seconds AFL code alone uses
> > > >> >> >> >> > for a single portfolio level back test to less than 1
> > > >> >> >> >> > second, allowing multi day optimization and walkforward
> > > >> >> >> >> > runs to complete in a more reasonable time, and also just
> > > >> >> >> >> > to see how fast I could get it to run.
> > > >> >> >> >> >
> > > >> >> >> >> > The backtest runs over 1 year of 5 minute bars for about
> > > >> >> >> >> > 1000 symbols. 1 year of data normally takes 25 seconds for
> > > >> >> >> >> > AmiBroker alone, or 18 seconds for 6 months of data. A
> > > >> >> >> >> > typical optimization run takes hundreds of these passes
> > > >> >> >> >> > per walk forward step, taking hours.
> > > >> >> >> >> >
> > > >> >> >> >> > Using the Nvidia CUDA API, running on my mid range video
> > > >> >> >> >> > card, it was much faster. Much, much, much faster. How
> > > >> >> >> >> > fast?
> > > >> >> >> >> >
> > > >> >> >> >> > It reduced the run time from 25s to... 4.4ms. That is more
> > > >> >> >> >> > than 200/s!
> > > >> >> >> >> >
> > > >> >> >> >> > I didn't believe the timing when I saw it at first. So, I
> > > >> >> >> >> > put 1,000 runs in a loop and sure enough, it ran 1,000
> > > >> >> >> >> > iterations in about 4 1/2 seconds. This far exceeded my
> > > >> >> >> >> > goal or expectations.
> > expectations.
> > > >> >> >> >> >
> > > >> >> >> >> > The resulting trade list matches that obtained by the AFL
> > > >> >> >> >> > version of this code.
> > > >> >> >> >> >
> > > >> >> >> >> > I estimate that it is processing 32GB of bar data/sec.
> > > >> >> >> >> >
> > > >> >> >> >> > Getting this to work at peak performance was tricky. Most
> > > >> >> >> >> > of what I have learned about code optimization does not
> > > >> >> >> >> > apply.
> > > >> >> >> >> >
> > > >> >> >> >> > It uses AmiBroker to load the symbol data and perform
> > > >> >> >> >> > calculations that do not depend on the optimization
> > > >> >> >> >> > parameters. Once loaded into video memory, repeated passes
> > > >> >> >> >> > can be made with different parameters, avoiding any
> > > >> >> >> >> > overhead.
> > > >> >> >> >> >
> > > >> >> >> >> > For non backtest/optimization runs, the code just
> > > >> >> >> >> > evaluates one symbol and passes the data back to AmiBroker
> > > >> >> >> >> > buy/sell/short/cover arrays, making it easy to test,
> > > >> >> >> >> > validate and visualize the trades. There is very little
> > > >> >> >> >> > performance gain in this case.
> > > >> >> >> >> >
> > > >> >> >> >> > There are problems, however. To run optimizations at peak
> > > >> >> >> >> > speed, I cannot use AmiBroker to calculate the
> > > >> >> >> >> > optimization goal function. So, I am in the process of
> > > >> >> >> >> > writing code to match signals and calculate the portfolio
> > > >> >> >> >> > fitness function. Once I do this, I will be able to
> > > >> >> >> >> > perform full optimizations and walk forwards at 3 orders
> > > >> >> >> >> > of magnitude faster than is possible with AmiBroker alone.
> > > >> >> >> >> >
> > > >> >> >> >> > Also, this is not general purpose code. Changing the
> > > >> >> >> >> > system code means changing a dll written in C. However,
> > > >> >> >> >> > there is no reason that this could not be made more
> > > >> >> >> >> > general.
> > > >> >> >> >> >
> > > >> >> >> >> > I have made some prototypes of "Cuda" versions of basic
> > > >> >> >> >> > AFL functions. The idea is to queue the function calls
> > > >> >> >> >> > into a definition executed by a micro kernel running on
> > > >> >> >> >> > the graphics cores. The result would be the ability to use
> > > >> >> >> >> > the full power of the graphics cores by modifying AFL code
> > > >> >> >> >> > to use Cuda aware versions with no changes to C code. It
> > > >> >> >> >> > would be an interesting, but big project.
> > > >> >> >> >> >
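One possible reading of "queue the function calls into a definition executed by a micro kernel" is a record-then-run design: the trading system is described once as a list of operations, and a small interpreter replays that list for each parameter set. The sketch below is an assumption about the general shape of such a design, not the author's prototype. It runs on the CPU in plain Python as a stand-in for the graphics cores, and `Program`, `run` and the two operations are hypothetical names.

```python
from typing import Dict, List, Tuple

Array = List[float]


def moving_average(values: Array, period: int) -> Array:
    """Trailing simple moving average; early bars average what is available."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= period:
            total -= values[i - period]
        out.append(total / min(i + 1, period))
    return out


class Program:
    """Records operations instead of executing them immediately."""

    def __init__(self) -> None:
        self.ops: List[Tuple[str, str, str, str]] = []

    def ma(self, out: str, src: str, period_param: str) -> str:
        # Queue: out = moving average of src over params[period_param] bars.
        self.ops.append(("ma", out, src, period_param))
        return out

    def above(self, out: str, left: str, right: str) -> str:
        # Queue: out[i] = left[i] > right[i].
        self.ops.append(("above", out, left, right))
        return out


def run(program: Program, arrays: Dict[str, Array],
        params: Dict[str, int]) -> Dict[str, list]:
    """Stand-in for the micro kernel: replay the queued operations."""
    env: Dict[str, list] = dict(arrays)
    for kind, out, a, b in program.ops:
        if kind == "ma":
            env[out] = moving_average(env[a], params[b])
        elif kind == "above":
            env[out] = [x > y for x, y in zip(env[a], env[b])]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return env


if __name__ == "__main__":
    system = Program()
    system.ma("fast", "close", "n")
    system.above("buy", "close", "fast")
    prices = {"close": [1.0, 2.0, 3.0, 4.0]}
    for n in (2, 3):  # same recorded definition, different parameters
        print(n, run(system, prices, {"n": n})["buy"])
```

The recorded definition is built once and replayed for every parameter value, so changing the trading rules means changing the recorded calls rather than the interpreter.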
> > > >> >> >> >>
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> > ------------------------------------
> > > >> >> >> >
> > > >> >> >> > Please note that this group is for discussion between users
> > > >> >> >> > only.
> > > >> >> >> >
> > > >> >> >> > To get support from AmiBroker please send an e-mail directly
> > > >> >> >> > to SUPPORT {at} amibroker.com
> > > >> >> >> >
> > > >> >> >> > For NEW RELEASE ANNOUNCEMENTS and other news always check
> > > >> >> >> > DEVLOG:
> > > >> >> >> > http://www.amibroker.com/devlog/
> > > >> >> >> >
> > > >> >> >> > For other support material please check also:
> > > >> >> >> > http://www.amibroker.com/support.html
> > > >> >> >> > Yahoo! Groups Links

