
[amibroker] Re: Freakishly fast backtest using 64 cores




Thanks everybody for contributing to this discussion. A few days ago 
I was wondering why we hadn't seen any mails from Tomasz on this 
topic. Well, we certainly have now. Thanks Tomasz for taking your 
precious time to clarify this matter. 

The main message, as always, is that people use AB in countless 
fascinating ways. Clearly only Tomasz is able to keep all these 
usages in mind and, to his credit, balance their requirements, and 
develop AB into the versatile product it is. In fact, I would argue 
that the interesting exploration by dloyer123 beyond the usual 
boundaries of AB is a case in point. Although they usually are only 
beneficial to a sample of the AB population, (the ability to develop) 
plug-ins for AB hugely enrich its workings.

Shall/Can we move on?

PS

--- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@xxx> 
wrote:
>
> Hello,
> 
> No, it can't, for one simple reason I pointed out earlier: there is not
> enough RAM to keep all intermediate values in memory during the entire
> optimization (all steps).
> You don't seem to accept the fact
> that many people are running optimizations on more data than you,
> and that implementing your suggestion would be equivalent to saying to them:
> "alright, now you need to upgrade to 64-bit Windows and buy 20 GB of RAM"
> 
> The fact that your tests fit into the RAM that you have on your computer
> does not necessarily mean that
> a) everyone has as much RAM as you
> and/or
> b) everyone has the same problem to solve as you.
> 
> Unlike other software, AmiBroker tries to be "reasonable" when it comes to
> memory usage, and that's why it can run tests that are simply beyond the
> capabilities of other software that doesn't care about memory usage.
> 
> Before you suggest implementing "special cases" for
> when all data and all intermediate results could fit in RAM: yes, I could do that,
> but at the expense of a higher risk of bugs and a higher cost of maintaining
> code that has multiple "versions" inside. And I won't go down that road.
> 
> When designing general-purpose software you need to find balance.
> 
> If you fail to find balance, you end up with an operating system that
> needs 2GB (see Vista) and uses 100-200MB of RAM for the window
> transparency effect (see DWM.EXE memory usage).
> I hear all the time that "memory is cheap". Maybe it is cheap, but that
> does not mean that the cost is zero, and it does not mean that actually
> accessing gigabytes of RAM costs zero time.
> If someone thinks that it does not matter, I suggest installing
> Windows 95 on a current machine and seeing how fast it boots from a cold start.
> Nowadays I see lots of programmers who don't care if the user needs to
> download a 30MB runtime (see all .NET apps), and all they have to say is
> "buy a bigger hard disk". I don't agree with that kind of thinking.
> 
> Best regards,
> Tomasz Janeczko
> amibroker.com
> ----- Original Message ----- 
> From: "dloyer123" <dloyer123@xxx>
> To: <amibroker@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 13, 2008 6:18 PM
> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> 
> 
> > TJ:  I am merely suggesting that some work could be shifted from once
> > per optimization step to once per optimization run, and that many of
> > your customers would benefit.
> > 
> > If you don't like the suggestion, fine.  But there is no reason to be
> > a jerk about it.
> > 
> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@> 
> > wrote:
> >>
> >> Hello,
> >> 
> >> Of course AFL execution is just a FRACTION of the time needed to
> >> perform a full-featured portfolio backtest. There are many other steps
> >> involved, namely:
> >> 
> >> PHASE 1:
> >> FOR EVERY SYMBOL UNDER TEST
> >> 
> >> A) preparing data
> >> This involves the following sub steps:
> >> a.1) reading the data from the disk (if not cached already) or requesting
> >> data from an external plugin (many external sources are pretty slow)
> >> a.2) time compressing to the selected periodicity (say your database is
> >> 1-minute and your optimization periodicity is 5-minute)
> >> a.3) filtering for weekends/regular trading hours etc. (if enabled)
> >> a.4) performing padding to reference symbol (if enabled)
> >> Note that steps a.2 and a.3 are done simultaneously (in one loop) for speed
> >> 
> >> B) AFL execution (that is what is reported as "AFL execution time").
> >> 
> >> b.1) setting up AFL engine - allocating and filling built-in arrays
> >> (buyprice/sellprice/shortprice/coverprice/margindeposit/
> >> positionsize/open/high/low/close/openint/volume, etc.)
> >> b.2) actual execution
> >> b.3) cleanup (freeing allocated temporary memory used for execution)
> >> 
> >> C) collection / sorting / ranking of trading signals
> >> After the buy/sell/short/cover arrays are known, AmiBroker collects
> >> signals and ranks them according to position score. Features like
> >> HoldMinBars, trading delays, various kinds of built-in stops, etc.
> >> are also handled at this step.
> >> 
> >> 
> >> PHASE 2:
> >> ONCE FOR BACKTEST (or OPTIMIZATION STEP):
> >> D.1) Actual Portfolio backtest (simplified a bit):
> >> 
> >> For EVERY BAR under test
> >> {
> >>   For EVERY SIGNAL in the sorted signal list
> >>   {
> >>       PROCESS SIGNAL (signal processing is complex as it handles features
> >>       like multiple currencies and scaling, early exit penalty fees)
> >>       UpdateEquity()
> >>   }
> >>   Calculate TRADE STATISTICS
> >>   Calculate EQUITY STATISTICS
> >>   Evaluate/Handle STOPS
> >> }
> >> 
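The Phase 2 loop above can be sketched in Python as follows. This is a heavily simplified illustration, not AmiBroker's implementation: the `Signal` type, the fixed position size and the one-position-per-symbol rule are all assumptions, and trade statistics, stops, currencies and scaling are left out.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    bar: int        # bar index at which the signal fires
    symbol: str
    action: str     # "buy" or "sell"
    price: float
    score: float    # ranking score; higher is processed first


def backtest(signals, n_bars, cash=10_000.0, max_positions=2, size=10):
    """Bar-by-bar portfolio loop over a pre-collected signal list."""
    # Result of phase 1: signals grouped by bar and ranked by score.
    by_bar = {}
    for s in sorted(signals, key=lambda s: -s.score):
        by_bar.setdefault(s.bar, []).append(s)

    positions = {}   # symbol -> entry price
    last_price = {}  # symbol -> most recent price seen
    equity_curve = []
    for bar in range(n_bars):
        for s in by_bar.get(bar, []):
            last_price[s.symbol] = s.price
            if s.action == "buy" and s.symbol not in positions:
                if len(positions) < max_positions and cash >= s.price * size:
                    cash -= s.price * size
                    positions[s.symbol] = s.price
            elif s.action == "sell" and s.symbol in positions:
                cash += s.price * size
                del positions[s.symbol]
        # Equity update: cash plus open positions at the last known price.
        equity = cash + sum(last_price[sym] * size for sym in positions)
        equity_curve.append(equity)
    return equity_curve


if __name__ == "__main__":
    demo = [
        Signal(0, "AAA", "buy", 10.0, 2.0),
        Signal(0, "BBB", "buy", 20.0, 1.0),
        Signal(0, "CCC", "buy", 5.0, 0.5),   # skipped: max_positions reached
        Signal(2, "AAA", "sell", 12.0, 0.0),
    ]
    print(backtest(demo, n_bars=3))
```

Even in this toy form the loop touches every bar and every signal on each pass, which is why this part has to be repeated for each optimization step whenever the signals depend on the optimized parameter.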
> >> Note that if any Custom Backtester procedure is defined, there is
> >> one more AFL execution that allows you to control D.1.
> >> 
> >> D.2) Report output (generating HTML backtest report), generating
> >> MAE/MFE charts
> >> 
> >> All those non-AFL-execution steps account for roughly 1-2
> >> nanoseconds of overhead per bar per symbol.
> >> 
> >> I would not say that 1-2 nanosecond overhead is "not-efficient"
> >> or a "performance bottleneck"
> >> considering the amount of work a fully featured portfolio backtester
> >> is doing.
> >> 
> >> If you do the math:
> >> 10 step optimization * 850 symbols * 20000 bars per symbol gives
> >> 
> >> ==================================================================
> >> You got 170 MILLION (1.7e8) bars to process. Multiply that by 1
> >> nanosecond (1e-9) and you will get 17 seconds.
> >> ==================================================================
> >> 
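The arithmetic can be checked directly. The short Python calculation below uses only the figures quoted in this message (10 steps, 850 symbols, 20,000 bars per symbol). Taken at face value, 1.7e8 bars at 1 nanosecond each come to 0.17 seconds, so a total of 17 seconds would correspond to roughly 100 nanoseconds per bar per symbol.

```python
steps = 10
symbols = 850
bars_per_symbol = 20_000

total_bars = steps * symbols * bars_per_symbol


def total_seconds(ns_per_bar: float) -> float:
    """Total overhead for all bars at a given per-bar cost."""
    return total_bars * ns_per_bar * 1e-9


def ns_per_bar_for(seconds: float) -> float:
    """Per-bar cost implied by a given total run time."""
    return seconds / total_bars * 1e9


print(f"total bars: {total_bars:,}")
print(f"at 1 ns per bar: {total_seconds(1):.2f} s")
print(f"at 2 ns per bar: {total_seconds(2):.2f} s")
print(f"per-bar cost implied by 17 s: {ns_per_bar_for(17):.0f} ns")
```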
> >> Quite frankly it seems that you are comparing apples to oranges. If
> >> your own code simply does not do steps
> >> A), C) and D), it surely will run faster, but any code that does less
> >> work will run faster.
> >> I am not sure what the purpose of this thread is. I am getting
> >> pretty tired of trying to convince you
> >> that I did my job well.
> >> 
> >> Best regards,
> >> Tomasz Janeczko
> >> amibroker.com
> >> ----- Original Message ----- 
> >> From: "dloyer123" <dloyer123@>
> >> To: <amibroker@xxxxxxxxxxxxxxx>
> >> Sent: Wednesday, August 13, 2008 3:03 AM
> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> >> 
> >> 
> >> > The idea of using array processing for this problem rather than the
> >> > more traditional for/next loop was a really good idea.  That part of
> >> > the system is very fast at what it does and provides a great amount
> >> > of freedom and flexibility.
> >> >
> >> > However, consider the trivial system:
> >> >
> >> > Buy = 0;
> >> > Sell = 0;
> >> > Short = 0;
> >> > Cover = 0;
> >> > Optimize("val",0,1,10,1);
> >> >
> >> > Clearly the AFL engine is being invoked, but this could be considered
> >> > the fastest possible AFL code, or very close to it.  Its execution
> >> > time is not zero, but pretty darn close.  The "check AFL" function
> >> > measures 0.3ms for 64,000 bars, for the "Optimize" statement.
> >> >
> >> > On my 3GHz Core 2, this system takes 18 seconds to backtest over a
> >> > portfolio of 850 symbols, 1 year, 5 minute bars.  This time does not
> >> > vary much with the size of the test window.  Since "Quick AFL" is
> >> > selected, there should be about 20k bars per symbol.
> >> >
> >> > Running optimize takes roughly the same time as the backtest for
> >> > each and every pass, 18 seconds each and every time.  That 18 seconds
> >> > cannot be explained by the AFL code execution alone.  There is other
> >> > stuff being done that takes much, much, much longer.
> >> >
> >> > So, even if AFL execution runs in zero time, this is the limit of how
> >> > fast AmiBroker can optimize.
> >> >
> >> > So, yes, the AFL execution is highly optimized and very fast, but
> >> > there is a lot of overhead that is outside of the AFL execution.  I
> >> > could guess what it is doing, but it really does not matter.
> >> >
> >> > I am only pointing out that it is a very juicy opportunity for
> >> > meaningful performance gains.
> >> >
> >> > Yes, there are memory management issues, and yes, some data sets may
> >> > be too large to take advantage of it, but a large fraction of your
> >> > customer base would.  It could even be made transparent to the user
> >> > with no extra checkboxes.  No exotic hardware required.  It would
> >> > even work on a laptop.
> >> >
> >> > When I run my system on the emulator, it just runs on the normal CPU
> >> > core, using normal system memory.  If anything, there has to be a lot
> >> > of overhead in pretending to run as so many threads, and the data set
> >> > is far larger than the L2 cache.  But it is still much faster than
> >> > the built-in backtest.  Yes, part of that is hand-optimized code, but
> >> > that does not explain the performance differential of 20x to 50x of
> >> > Ami vs emulator.  Running on the GPU is more like 4000x.  Yes, the
> >> > GPU has more memory bandwidth to work with, but not that much more.
> >> >
> >> > I would say that the AFL execution code is highly optimized and fully
> >> > exploits the hardware it has to work with, but that there are
> >> > performance bottlenecks elsewhere in the critical path. I cannot
> >> > tell you what they are, but I would guess that it is rebuilding price
> >> > arrays and maybe other data structures on every pass.
> >> >
> >> > Anyway, I don't mean to tell you your business and you are much closer
> >> > to this problem than I am.  Maybe there is some edge case that I have
> >> > not considered that forces a performance hit.  It is still way faster
> >> > than EasyLanguage.
> >> >
> >> > I am a big fan of your work and enjoy using your product.  The
> >> > passion that you put into it shows.
> >> >
> >> >
> >> >
> >> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@>
> >> > wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> What is true for the GPU is not necessarily true for the CPU. The GPU
> >> >> has a dedicated wide RAM bus and faster RAM as opposed to system memory.
> >> >>
> >> >> AmiBroker does a lot to utilise memory to the maximum extent where
> >> >> possible/feasible.
> >> >>
> >> >> Actually AFL speed is limited by system memory if you run out of
> >> >> on-chip cache.
> >> >> http://www.amibroker.com/kb/2008/08/12/afl-execution-speed/
> >> >>
> >> >> So going for more memory usage does not always mean faster execution.
> >> >>
> >> >> Sure you can pre-compute everything, and use pre-computed values, but
> >> >> you need to understand that people are doing VERY different things
> >> >> with AmiBroker and their problems are not the same as the problems
> >> >> you are trying to solve.
> >> >> For example some customers are backtesting the entire US stock
> >> >> universe (8000+ symbols) over 10 or 20 years. That's about 1.3GB for
> >> >> DATA alone. Now if you are running a portfolio backtest you need to
> >> >> keep trading signals and that can be as much as 1GB in such a case.
> >> >> Quickly you are reaching the 3GB RAM limit of a 32-bit OS. There is
> >> >> no place to store "pre-computed" values.
> >> >> AmiBroker by nature needs to provide the best blend of speed and
> >> >> moderate memory / CPU requirements.
> >> >> User-specific single-task solutions may go into specialisation and
> >> >> tricks that are not feasible for a commercial general-purpose product
> >> >> that is intended to keep a large user base happy.
> >> >>
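A rough way to see where numbers of this size come from is to multiply symbols, bars and bytes per bar. The Python sketch below assumes daily bars, about 252 trading days per year and a hypothetical 40 bytes per stored bar; none of these three values come from the message, so the output is an order-of-magnitude illustration only.

```python
def data_bytes(symbols: int, years: int, bars_per_year: int = 252,
               bytes_per_bar: int = 40) -> int:
    """Estimated size of raw quote data held in memory (assumed layout)."""
    return symbols * years * bars_per_year * bytes_per_bar


GIB = 1024 ** 3
ADDRESS_SPACE_LIMIT = 3 * GIB  # the 3GB limit mentioned in the message

for years in (10, 20):
    size = data_bytes(symbols=8000, years=years)
    share = size / ADDRESS_SPACE_LIMIT
    print(f"{years} years: {size / GIB:.2f} GiB ({share:.0%} of 3 GiB)")
```

Under these assumptions the quote data alone already takes a large share of a 3GB address space before any signals or cached intermediate arrays are stored.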
> >> >> Best regards,
> >> >> Tomasz Janeczko
> >> >> amibroker.com
> >> >> ----- Original Message ----- 
> >> >> From: "dloyer123" <dloyer123@>
> >> >> To: <amibroker@xxxxxxxxxxxxxxx>
> >> >> Sent: Tuesday, August 12, 2008 4:09 PM
> >> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> >> >>
> >> >>
> >> >> > The programming guide lists the 8600M and 8700M as having 32 computing
> >> >> > cores.  Not sure what they are clocked at.  Power is an issue.  The
> >> >> > desktop versions need dedicated power connectors.  The big cards need
> >> >> > two.
> >> >> >
> >> >> > Actually, when I am doing development on my laptop, I just use the
> >> >> > emulator.  It is about 100x slower than my desktop system, but still
> >> >> > about 20x to 50x faster than Ami alone.  The speed difference in
> >> >> > emulation mode is mostly due to the precomputed and cached price
> >> >> > arrays.
> >> >> >
> >> >> > Tomasz:  I suspect that there is an opportunity to trade memory for
> >> >> > speed, even with 1 core.  Memory is cheap and would be a simpler way
> >> >> > to get a performance boost than porting to multi core, GPU or CPU.
> >> >> >
> >> >> >
> >> >> >
> >> >> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@>
> >> >> > wrote:
> >> >> >>
> >> >> >> > Dell has 3 off the shelf
> >> >> >> > laptops in their entertainment/performance range that use GeForce
> >> >> >> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
> >> >> >>
> >> >> >> Mobile ones are very poor cousins. Believe me. I own a brand new
> >> >> >> notebook (ASUS) with a GeForce 8600M
> >> >> >> and it is SLOW in 3D. I mean SLOW. Did I mention that it is SLOW?
> >> >> >>
> >> >> >> In 3D Mark it gets the same results as my 3 year old desktop 6600GT.
> >> >> >>
> >> >> >> Best regards,
> >> >> >> Tomasz Janeczko
> >> >> >> amibroker.com
> >> >> >> ----- Original Message ----- 
> >> >> >> From: "brian_z111" <brian_z111@>
> >> >> >> To: <amibroker@xxxxxxxxxxxxxxx>
> >> >> >> Sent: Tuesday, August 12, 2008 12:40 AM
> >> >> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> >> >> >>
> >> >> >>
> >> >> >> > DL
> >> >> >> >
> >> >> >> >
> >> >> >> > I am following at the top level and understand what you are doing OK
> >> >> >> > (you make me wish I had learnt programming/IT).
> >> >> >> >
> >> >> >> > I like your CPU.
> >> >> >> >
> >> >> >> > Allowing niche trading is what AB is all about?
> >> >> >> >
> >> >> >> > I'll put my money on MS/"general purpose computing on GPU" - I don't
> >> >> >> > think the masses are in love with MS but for 80% of people who can do
> >> >> >> > 80% of what they want with MS the price to move elsewhere is too
> >> >> >> > high - they are just in love with max output for min input.
> >> >> >> >
> >> >> >> > If you go to the trouble to write a plug-in do you think it will be
> >> >> >> > around long/require much ongoing support from you?
> >> >> >> >
> >> >> >> > I can see the benefits of the speed - for a group of traders it is a
> >> >> >> > definite edge they would have for a year or two (I don't think any
> >> >> >> > other trading software will be seeing this for a while?) - especially
> >> >> >> > in the AT area where more crunching could be done fast enough to keep
> >> >> >> > up with live data.
> >> >> >> >
> >> >> >> > I don't blame Tomasz for not sitting his backside on the cutting
> >> >> >> > edge - too dangerous for developers with long term clientele.
> >> >> >> >
> >> >> >> > Not having a go at Tomasz - to clarify - Tomasz said GeForce 8800
> >> >> >> > can't be put in a notebook?
> >> >> >> >
> >> >> >> > To my understanding there seems to be a reasonable number of laptops
> >> >> >> > around that could use your method e.g. Dell has 3 off the shelf
> >> >> >> > laptops in their entertainment/performance range that use GeForce
> >> >> >> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
> >> >> >> >
> >> >> >> > I looked at the GeF links in Paul's post but they didn't have much
> >> >> >> > specific info there that I could see - I assume the above cards will
> >> >> >> > run your system.
> >> >> >> >
> >> >> >> > I am not a buyer for now but good luck with it and what you have done
> >> >> >> > already is a good contribution to AB - once someone on the block has
> >> >> >> > a new super-dooper gadget pretty soon the neighbours want one too and
> >> >> >> > demand grows.
> >> >> >> >
> >> >> >> > brian_z
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@> wrote:
> >> >> >> >>
> >> >> >> >> This uses the mid range video card that happened to come with my
> >> >> >> >> system, a 9800GT.  The newer 260 and 280 cards are 3 to 4 times
> >> >> >> >> faster.  The 260 can be found at Best Buy for $300.  Some laptops
> >> >> >> >> have compatible cards as well.
> >> >> >> >>
> >> >> >> >> The video card has its own memory, mine has 512MB, some have as much
> >> >> >> >> as 1GB.  This memory is very fast, once it is loaded from the main
> >> >> >> >> system.  Nvidia has a professional line of products that have much
> >> >> >> >> more memory.
> >> >> >> >>
> >> >> >> >> To get the best performance, my AFL code makes one pass over the
> >> >> >> >> data, calling a Dll.  The Dll takes all of the data needed by the
> >> >> >> >> calculation and loads a copy to the video card.  This upload is slow,
> >> >> >> >> the entire upload takes about 45 seconds for all 1000 symbols.
> >> >> >> >>
> >> >> >> >> Once all of the data is uploaded, the Dll loads a "kernel" into the
> >> >> >> >> graphics cores that performs the actual computation and generates the
> >> >> >> >> trade list.  This part is very fast and performs all of the same
> >> >> >> >> functions that my AFL version does.  The resulting trade list is the
> >> >> >> >> same.
> >> >> >> >>
> >> >> >> >> Because the data is loaded into video memory, it can be reused for
> >> >> >> >> many passes over the data with different optimization values.  So,
> >> >> >> >> hundreds of combinations of optimization values can be tried per
> >> >> >> >> second.
> >> >> >> >>
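The trade-off described here, a slow one-time upload followed by very fast passes, can be put into numbers. The Python sketch below combines figures reported in this thread (about 45 seconds to upload, 4.4 ms per GPU pass, 25 seconds per pass with AFL code alone). Treating them as directly comparable is an assumption, since they were reported for slightly different runs.

```python
import math

UPLOAD_S = 45.0      # one-time copy of all symbols to video memory
GPU_PASS_S = 0.0044  # per optimization pass once the data is resident
CPU_PASS_S = 25.0    # per pass with AFL code alone


def gpu_total(passes: int) -> float:
    """Upload once, then run every pass on the video card."""
    return UPLOAD_S + passes * GPU_PASS_S


def cpu_total(passes: int) -> float:
    """Pay the full per-pass cost every time."""
    return passes * CPU_PASS_S


def break_even_passes() -> int:
    """Smallest number of passes at which the GPU route is faster."""
    return math.floor(UPLOAD_S / (CPU_PASS_S - GPU_PASS_S)) + 1


for n in (1, 2, 100, 1000):
    print(f"{n:5d} passes: GPU {gpu_total(n):8.1f} s   CPU {cpu_total(n):8.1f} s")
print("break-even at", break_even_passes(), "passes")
```

For a single backtest the upload dominates and nothing is gained, which matches the remark that non-optimization runs perform about the same as AFL code alone.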
> >> >> >> >> For non optimization runs, the Dll just loads one symbol into video
> >> >> >> >> memory and processes it.  Counting the overhead of moving data to the
> >> >> >> >> video card and extracting the trade list for a single symbol, the
> >> >> >> >> result is similar to AFL code alone.  This lets me test the code and
> >> >> >> >> make sure it is correct.
> >> >> >> >>
> >> >> >> >> This approach works best when the data only needs to be loaded once,
> >> >> >> >> then "reused" many times.  It also works best when there is a lot of
> >> >> >> >> data to work with.
> >> >> >> >>
> >> >> >> >> What is more interesting to me and what would be more useful for
> >> >> >> >> others would be a general driver that requires no Dll changes to
> >> >> >> >> modify the system.  The performance would not be as good as hand
> >> >> >> >> optimized code, but would still be much better than AFL code alone.
> >> >> >> >> It would take trading system design to a whole new level.  It would
> >> >> >> >> provide enough performance to make working with Intra day data as
> >> >> >> >> easy as daily data is today.
> >> >> >> >>
> >> >> >> >> Writing such a driver would be hard, but I have already done some
> >> >> >> >> prototypes and design work.  I am tempted to do it for my own use.
> >> >> >> >> If I made it available to others, supporting it would be a PITA.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@> wrote:
> >> >> >> >> >
> >> >> >> >> > I'm very interested.
> >> >> >> >> > Could you elaborate a bit more?
> >> >> >> >> > What model of Nvidia chipset are you using, and with how much memory?
> >> >> >> >> > Not sure exactly what you mean when you say
> >> >> >> >> > It uses AmiBroker to load the symbol data and perform calculations
> >> >> >> >> > that do not depend on the optimization parameters. Once loaded into
> >> >> >> >> > video memory, repeated passes can be made with different parameters,
> >> >> >> >> > avoiding any overhead.
> >> >> >> >> > Can you give me some examples? I presume when your dll is called,
> >> >> >> >> > AB passes one or more arrays of data belonging to 1 symbol, is that true?
> >> >> >> >> > Not sure exactly what the rest means either. How many functions are you
> >> >> >> >> > running in your dll, and what does each of them do?
> >> >> >> >> > Great of you to share your insight.
> >> >> >> >> > Cheers
> >> >> >> >> > Paul.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >   _____
> >> >> >> >> >
> >> >> >> >> > From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx]
> >> >> >> >> > On Behalf Of dloyer123
> >> >> >> >> > Sent: Tuesday, 5 August 2008 9:19 AM
> >> >> >> >> > To: amibroker@xxxxxxxxxxxxxxx
> >> >> >> >> > Subject: [amibroker] Freakishly fast backtest using 64 cores
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > Greetings,
> >> >> >> >> >
> >> >> >> >> > I ported part of my AFL backtest code to a plugin that takes
> >> >> >> >> > advantage of the graphics math cores on the video card that are
> >> >> >> >> > normally used for 3d graphics.
> >> >> >> >> >
> >> >> >> >> > I was able to get a several thousand fold performance improvement
> >> >> >> >> > over AFL code alone.
> >> >> >> >> >
> >> >> >> >> > My goal was to reduce the 25 seconds AFL code alone uses for a single
> >> >> >> >> > portfolio level back test to less than 1 second, allowing multi day
> >> >> >> >> > optimization and walkforward runs to complete in a more reasonable
> >> >> >> >> > time, and also just to see how fast I could get it to run.
> >> >> >> >> >
> >> >> >> >> > The backtest runs over 1 year of 5 minute bars for about 1000
> >> >> >> >> > symbols. 1 year of data normally takes 25 seconds for AmiBroker
> >> >> >> >> > alone, or 18 seconds for 6 months of data. A typical optimization
> >> >> >> >> > run takes hundreds of these passes per walk forward step, taking
> >> >> >> >> > hours.
> >> >> >> >> >
> >> >> >> >> > Using the Nvidia CUDA API, running on my mid range video card, it
> >> >> >> >> > was much faster. Much, much, much faster. How fast?
> >> >> >> >> >
> >> >> >> >> > It reduced the run time from 25s to... 4.4ms. That is more than
> >> >> >> >> > 200/s!
> >> >> >> >> >
> >> >> >> >> > I didn't believe the timing when I saw it at first. So, I put 1,000
> >> >> >> >> > runs in a loop and sure enough, it ran 1,000 iterations in about 4
> >> >> >> >> > 1/2 seconds. This far exceeded my goal or expectations.
> >> >> >> >> >
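These figures are easy to cross-check. A small Python calculation using only the numbers given in this message (25 seconds per pass with AFL code alone, 4.4 ms per pass on the video card):

```python
afl_pass_s = 25.0     # one portfolio backtest with AFL code alone
gpu_pass_s = 0.0044   # the same pass on the video card

passes_per_second = 1.0 / gpu_pass_s
speedup = afl_pass_s / gpu_pass_s
loop_of_1000_s = 1000 * gpu_pass_s

print(f"passes per second: {passes_per_second:.0f}")
print(f"speedup over AFL alone: {speedup:.0f}x")
print(f"1,000 passes: {loop_of_1000_s:.1f} s")
```

The results agree with the statements in the post: more than 200 passes per second, a speedup of several thousand fold, and about 4 1/2 seconds for a loop of 1,000 runs.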
> >> >> >> >> > The resulting trade list matches that obtained by the AFL version of
> >> >> >> >> > this code.
> >> >> >> >> >
> >> >> >> >> > I estimate that it is processing 32GB of bar data/sec.
> >> >> >> >> >
> >> >> >> >> > Getting this to work at peak performance was tricky. Most of what I
> >> >> >> >> > have learned about code optimization does not apply.
> >> >> >> >> >
> >> >> >> >> > It uses AmiBroker to load the symbol data and perform calculations
> >> >> >> >> > that do not depend on the optimization parameters. Once loaded into
> >> >> >> >> > video memory, repeated passes can be made with different parameters,
> >> >> >> >> > avoiding any overhead.
> >> >> >> >> >
> >> >> >> >> > For non backtest/optimization runs, the code just evaluates one
> >> >> >> >> > symbol and passes the data back to AmiBroker buy/sell/short/cover
> >> >> >> >> > arrays, making it easy to test, validate and visualize the trades.
> >> >> >> >> > There is very little performance gain in this case.
> >> >> >> >> >
> >> >> >> >> > There are problems, however. To run optimizations at peak speed, I
> >> >> >> >> > cannot use AmiBroker to calculate the optimization goal function.
> >> >> >> >> > So, I am in the process of writing code to match signals and
> >> >> >> >> > calculate the portfolio fitness function. Once I do this, I will be
> >> >> >> >> > able to perform full optimizations and walk forwards at 3 orders of
> >> >> >> >> > magnitude faster than is possible with AmiBroker alone.
> >> >> >> >> >
> >> >> >> >> > Also, this is not general purpose code. Changing the system code
> >> >> >> >> > means changing a dll written in C. However, there is no reason that
> >> >> >> >> > this could not be made more general.
> >> >> >> >> >
> >> >> >> >> > I have made some prototypes of "Cuda" versions of basic AFL
> >> >> >> >> > functions. The idea is to queue the function calls into a definition
> >> >> >> >> > executed by a micro kernel running on the graphics cores. The result
> >> >> >> >> > would be the ability to use the full power of the graphics cores by
> >> >> >> >> > modifying AFL code to use Cuda aware versions with no changes to C
> >> >> >> >> > code. It would be an interesting, but big project.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > ------------------------------------
> >> >> >> >
> >> >> >> > Please note that this group is for discussion between 
users
> >> > only.
> >> >> >> >
> >> >> >> > To get support from AmiBroker please send an e-mail 
directly
> >> > to
> >> >> >> > SUPPORT {at} amibroker.com
> >> >> >> >
> >> >> >> > For NEW RELEASE ANNOUNCEMENTS and other news always check
> >> > DEVLOG:
> >> >> >> > http://www.amibroker.com/devlog/
> >> >> >> >
> >> >> >> > For other support material please check also:
> >> >> >> > http://www.amibroker.com/support.html
> >> >> >> > Yahoo! Groups Links


