
[amibroker] Re: Freakishly fast backtest using 64 cores




Tomasz,

Thanks for explaining a bit more - it is quite clear now - I missed 
the importance of RAM the first time around.

I rotate computers routinely, in tandem with my wife, so I have to 
wait for a new one - plus I buy middle of the road to avoid cutting 
edge hassles - plus there is a time cost in changing over so all in 
all your policies suit me.

Robustness is in my top five for things I like about AB .... in fact 
I was only thinking the other day how sweet it is that every time I 
do my (late) upgrades I never have any trouble or have to do AB 
reinstalls - the only time I do an install is when I get a new 
computer.

So thanks once again for taking the software maintenance hassles out 
of trading and for carrying me up the computing mountain.......


..... now about that AFL book?     O:-)


brian_z




--- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@xxx> 
wrote:
>
> Hello,
> 
> No, it can't for one simple reason I pointed out earlier: not 
having enough RAM
> to keep all intermediate values in RAM during entire optimization 
(all steps).
> You don't seem to accept that fact
> that many people are running optimizations on more data than you,
> and that implementing your suggestion would be equivalent to saying to them:
> "alright now you need to upgrade to 64 bit Windows and buy 20 GB of RAM"
> 
> The fact that your tests fit into the RAM that you have on your computer
> does not necessarily mean that
> a) everyone has as much RAM as you
> and/or
> b) everyone has same problem to solve as you.
> 
> Unlike other software, AmiBroker tries to be "reasonable" when it comes to memory
> usage, and that's why it can run tests that are simply beyond the capabilities
> of other software that doesn't care about memory usage.
> 
> Before you suggest implementing "special cases" for 
> when all data and all intermediate results could fit in RAM, yes I 
could do that, 
> but at the expense of higher risk of bugs, higher cost of 
maintenance of 
> code that has multiple "versions" inside. And I won't go that road.
> 
> When designing general-purpose software you need to find balance.
> 
> If you fail to find balance, you end up with Operating System that 
needs 2GB (see Vista)
> and uses 100-200MB of RAM for windows transparency effect (see 
DWM.EXE memory usage).
> I hear all the time that "memory is cheap". Maybe it is cheap but 
it does not mean
> that cost is zero and it does not mean that actually accessing 
gigabytes of RAM costs zero time.
> If someone thinks that it does not matter I suggest installing 
Windows 95 on current machine
> and see how fast it boots from cold restart. 
> Nowadays I see lots of programmers who don't care if the user needs to download a 30MB runtime
> (see all .NET apps) and all they have to say is "buy a bigger hard disk". I don't agree with that kind of thinking.
> 
> Best regards,
> Tomasz Janeczko
> amibroker.com
> ----- Original Message ----- 
> From: "dloyer123" <dloyer123@xxx>
> To: <amibroker@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 13, 2008 6:18 PM
> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> 
> 
> > TJ:  I am merely suggesting that some work could be shifted from 
once 
> > per optimization step to once per optimization run and that many 
of 
> > your customers would benefit.  
> > 
> > If you don't like the suggestion, fine.  But there is no reason 
to be 
> > a jerk about it.
> > 
> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@> 
> > wrote:
> >>
> >> Hello,
> >> 
> >> Of course AFL execution is just a FRACTION of time needed to
> >> perform full-featured portfolio backtest. There are many other 
steps
> >> involved, namely:
> >> 
> >> PHASE 1:
> >> FOR EVERY SYMBOL UNDER TEST
> >> 
> >> A) preparing data
> >> This involves the following sub steps:
> >> a.1) reading the data from the disk (if not cached already) or 
> > requesting data from external plugin (many external sources are 
> >> pretty slow)
> >> a.2) time compression to the selected periodicity (say your database is 1-minute
> >> and your optimization periodicity is 5-minute)
> >> a.3) filtering for weekends/regular trading hours etc (if 
enabled)
> >> a.4) performing padding to reference symbol (if enabled)
> >> Note that steps a.2 and a.3 are done simultaneously (in one 
loop) 
> > for speed
> >> 
> >> B) AFL execution (that is what is reported as "AFL execution 
time").
> >> 
> >> b.1) setting up AFL engine - allocating and filling built in 
arrays 
> > (buyprice/sellprice/shortprice/coverprice/margindeposit/
> >> positionsize/open/high/low/close/openint/volume, etc....)
> >> b.2) actual execution
> >> b.3) cleanup (freeing allocated temporary memory used for 
execution)
> >> 
> >> C) collection / sorting / ranking of trading signals
> >> After buy/sell/short/cover arrays are known AmiBroker collects 
> > signals
> >> and ranks them according to position score. At this step
> >> features like HoldMinBars, trading delays, various kinds of
> >> built-in stops, etc. are also applied.
> >> 
> >> 
> >> PHASE 2:
> >> ONCE FOR BACKTEST (or OPTIMIZATION STEP):
> >> D.1) Actual Portfolio backtest (simplified a bit):
> >> 
> >> For EVERY BAR under test
> >> {
> >>   For EVERY SIGNAL in the sorted signal list
> >>   {
> >>       PROCESS SIGNAL (signal processing is complex as it handles 
> > features like multiple currencies and scaling, early exit penalty 
> >> fees)
> >>       UpdateEquity()
> >>  }
> >>  Calculate TRADE STATISTICS
> >>  Calculate EQUITY STATISTICS
> >>  Evaluate/Handle STOPS
> >> }
> >> 
> >> Note that if any Custom Backtester procedure is defined there is 
> > one more AFL execution
> >> that allows to control D.1.
> >> 
> >> D.2) Report output (generating HTML backtest report), generating 
> > MAE/MFE charts
> >> 
> >> All those non-afl-execution steps, account for roughly 1-2 
> > nanoseconds overhead per bar per symbol.
> >> 
> >> I would not say that 1-2 nanosecond overhead is "not-efficient" 
> > or "performance bottleneck"
> >> considering the amount of work fully featured portfolio 
backtester 
> > is doing.
> >> 
> >> If you do the math.
> >> 10 step optimization * 850 symbols * 20000 bars per symbol gives
> >> 
> >> 
> >> ==================================================================================
> >> You got 170 MILLION (1.7e8) bars to process. Multiply that by 1 nanosecond (1e-9) and you will get 17 seconds.
> >>
> >> ==================================================================================
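
A quick check of the arithmetic above, as a minimal Python sketch. It uses only figures quoted in this thread (10 steps, 850 symbols, 20,000 bars per symbol, and the 18 seconds per pass reported by dloyer123); the function names are illustrative. Taken at face value, 170 million bars at 1 nanosecond each come to about 0.17 seconds. A total of 17 seconds corresponds to roughly 100 nanoseconds per bar, and 18 seconds per pass works out to roughly 1 microsecond per bar.

```python
# Figures quoted in the thread.
steps = 10
symbols = 850
bars_per_symbol = 20_000

# Bars processed over the whole optimization run: 170,000,000.
total_bars = steps * symbols * bars_per_symbol


def total_seconds(overhead_ns_per_bar):
    """Total overhead in seconds for the whole run at a given per-bar cost."""
    return total_bars * overhead_ns_per_bar * 1e-9


def implied_overhead_ns(seconds, bars):
    """Per-bar cost in nanoseconds implied by a measured wall-clock time."""
    return seconds / bars * 1e9


at_1ns = total_seconds(1)      # about 0.17 s
at_100ns = total_seconds(100)  # about 17 s

# One pass covers 850 * 20,000 = 17,000,000 bars and was reported to take 18 s.
bars_per_pass = symbols * bars_per_symbol
observed_ns_per_bar = implied_overhead_ns(18, bars_per_pass)  # roughly 1,059 ns

print(f"{total_bars:,} bars, {at_1ns:.2f} s at 1 ns, {at_100ns:.1f} s at 100 ns")
print(f"18 s per pass implies about {observed_ns_per_bar:.0f} ns per bar")
```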
> >> 
> >> Quite frankly it seems that you are comparing apples to oranges. 
If 
> > your own codes simply do not do steps
> >> A) C) and D)  it surely will run faster but any code that does 
less 
> > work will run faster.
> >> I am not sure what is the purpose of this thread. I am getting 
> > pretty tired trying to convince you
> >> that I did my job well.
> >> 
> >> Best regards,
> >> Tomasz Janeczko
> >> amibroker.com
> >> ----- Original Message ----- 
> >> From: "dloyer123" <dloyer123@>
> >> To: <amibroker@xxxxxxxxxxxxxxx>
> >> Sent: Wednesday, August 13, 2008 3:03 AM
> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
> >> 
> >> 
> >> > The idea of using array processing for this problem rather 
than 
> > the
> >> > more traditional for/next loop was a really good idea.  That 
part 
> > of
> >> > the system is very fast at what it does and provides a great 
> > amount
> >> > of freedom and flexibility.
> >> >
> >> > However, consider the trivial system:
> >> >
> >> > Buy = 0;
> >> > Sell = 0;
> >> > Short = 0;
> >> > Cover = 0;
> >> > Optimize("val",0,1,10,1);
> >> >
> >> > Clearly the AFL engine is being invoked, but this could be 
> > considered
> >> > the fastest possible AFL code, or very close to it.  Its execution
> >> > time is not zero, but pretty darn close.  The "Check AFL" function
> >> > measures 0.3ms for 64,000 bars, for the "Optimize" statement.
> >> >
> >> > On my 3GHz Core 2, this system takes 18 seconds to backtest over a
> >> > portfolio of 850 symbols, 1 year, 5 minute bars.  This time does not
> >> > vary much with the size of the test window.  Since "Quick AFL" is
> >> > selected, there should be about 20k bars per symbol.
> >> >
> >> > Running optimize, takes roughly the same time as the backtest 
for
> >> > each and every pass, 18 seconds each and every time.  That 18 
> > seconds
> >> > can not be explained by the AFL code execution alone.  There 
is 
> > other
> >> > stuff being done that takes much, much, much longer.
> >> >
> >> > So, even if AFL execution runs in zero time, this is the limit 
of 
> > how
> >> > fast AmiBroker can optimize.
> >> >
> >> > So, yes, the AFL execution is highly optimized and very fast, 
but
> >> > there is a lot of overhead that is outside of the AFL 
execution.  
> > I
> >> > could guess what it is doing, but it really does not matter.
> >> >
> >> > I am only pointing out that it is a very juicy opportunity for
> >> > meaningful performance gains.
> >> >
> >> > Yes, there are memory management issues, and yes, some data 
sets 
> > may
> >> > be too large to take advantage of it, but a large fraction of 
your
> >> > customer base would.  It could even be made transparent to the 
> > user
> >> > with no extra checkboxes.  No exotic hardware required.  It 
would
> >> > even work on a laptop.
> >> >
> >> > When I run my system on the emulator, it just runs on the 
normal 
> > cpu
> >> > core, using normal system memory.  If anything, there has to 
be a 
> > lot
> >> > of overhead in pretending to run as so many threads, and the 
data 
> > set
> >> > is far larger than the L2 cache.  But it is still much faster 
than
> >> > the built in backtest.  Yes, part of that is hand optimized 
code, 
> > but
> >> > that does not explain the performance differential of 20x to 50x
> >> > for Ami vs emulator.  Running on the GPU is more like 4000x.  Yes, the
> >> > GPU has more memory bandwidth to work with, but not that much more.
> >> >
> >> > I would say that the AFL execution code is highly optimized 
and 
> > fully
> >> > exploits the hardware it has to work with, but that there are
> >> > performance bottlenecks elsewhere in the critical path. I can 
not
> >> > tell you what they are, but I would guess that it is 
rebuilding 
> > price
> >> > arrays and maybe other data structures on every pass.
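
The suggestion being debated here, moving parameter-independent work from once per optimization step to once per optimization run, can be sketched in Python as follows. This only illustrates the idea and is not AmiBroker's implementation: prepare_symbol and run_step are hypothetical stand-ins for the data preparation and the parameter-dependent evaluation, and the "prices" are dummy values.

```python
prep_calls = 0  # counts how often the expensive preparation runs


def prepare_symbol(symbol):
    """Hypothetical stand-in for parameter-independent work (load, compress, pad)."""
    global prep_calls
    prep_calls += 1
    return [float(len(symbol) + i) for i in range(5)]  # dummy price array


def run_step(prices, param):
    """Hypothetical stand-in for the parameter-dependent part of one pass."""
    return sum(p * param for p in prices)


def optimize_per_step(symbols, params):
    """Prepare every symbol again on every optimization step."""
    return [sum(run_step(prepare_symbol(s), p) for s in symbols) for p in params]


def optimize_cached(symbols, params):
    """Prepare every symbol once per run and keep the arrays in memory."""
    cache = {s: prepare_symbol(s) for s in symbols}
    return [sum(run_step(cache[s], p) for s in symbols) for p in params]


def count_preps(strategy, symbols, params):
    """Run a strategy and report its results plus the number of preparations."""
    global prep_calls
    prep_calls = 0
    results = strategy(symbols, params)
    return results, prep_calls


symbols = ["AAA", "BB", "C"]
params = [1, 2, 3, 4]
per_step_results, per_step_preps = count_preps(optimize_per_step, symbols, params)
cached_results, cached_preps = count_preps(optimize_cached, symbols, params)
print(per_step_preps, cached_preps)  # symbols * steps versus symbols
```

Both variants return the same results. The cached one trades memory for time: every prepared array stays resident for the whole run, which is the RAM cost raised elsewhere in this thread.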
> >> >
> >> > Anyway, I don't mean to tell you your business and you are much closer
> >> > to this problem than I am.  Maybe there is some edge case that 
I 
> > have
> >> > not considered that forces a performance hit.  It is still way 
> > faster
> >> > than EasyLanguage.
> >> >
> >> > I am a big fan of your work and enjoy using your product.  The
> >> > passion that you put into it shows.
> >> >
> >> >
> >> >
> >> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@>
> >> > wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> What is true for a GPU is not necessarily true for a CPU. A GPU has a dedicated wide RAM
> >> >> bus and faster RAM as opposed to system memory.
> >> >>
> >> >> AmiBroker does a lot to utilise memory to maximum extent where
> >> > possible/feasible.
> >> >>
> >> >> Actually AFL speed is limited by system memory if you run out 
of 
> > on-
> >> > chip cache.
> >> >> http://www.amibroker.com/kb/2008/08/12/afl-execution-speed/
> >> >>
> >> >> So going for more memory usage does not always mean faster execution.
> >> >>
> >> >> Sure you can pre-compute everything, and use pre-computed 
values
> >> > but
> >> >> you need to understand that people are doing VERY different 
> > things
> >> > with AmiBroker
> >> >> and their problems are not the same as problems you are 
trying to
> >> > solve.
> >> >> For example some customers are backtesting entire US stock 
> > universe
> >> > (8000+ symbols)
> >> >> over 10 or 20 years. That's about 1.3GB for DATA alone. Now 
if 
> > you
> >> > are running
> >> >> portfolio backtest you need to keep trading signals and that can be as much as 1GB in
> >> >> such case. Quickly you are reaching the 3GB RAM limit of a 32-bit OS. There is no place
> >> >> to store "pre-computed" values.
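
A rough sanity check of the memory figures above, as a Python sketch. The thread gives 8000+ symbols, 10 or 20 years, about 1.3GB of data, up to 1GB of signals and a 3GB limit. Everything else is an assumption made here for illustration: daily bars, 252 trading days per year, and 1GB taken as 1e9 bytes. Under those assumptions the 20-year case implies on the order of 32 bytes per bar, and data plus signals leave well under 1GB for anything else.

```python
TRADING_DAYS_PER_YEAR = 252  # assumption, not stated in the thread
GB = 1e9                     # assumption: decimal gigabytes


def daily_bars(symbols, years):
    """Number of daily bars for a universe, assuming daily data."""
    return symbols * years * TRADING_DAYS_PER_YEAR


def implied_bytes_per_bar(total_bytes, bars):
    """Storage per bar implied by a quoted total data size."""
    return total_bytes / bars


bars_20y = daily_bars(8000, 20)                            # 40,320,000 bars
bytes_per_bar = implied_bytes_per_bar(1.3 * GB, bars_20y)  # about 32 bytes

data_gb = 1.3     # quoted: data alone
signals_gb = 1.0  # quoted: trading signals, upper figure
limit_gb = 3.0    # quoted: RAM limit of a 32-bit OS
headroom_gb = limit_gb - (data_gb + signals_gb)            # about 0.7 GB

print(f"{bars_20y:,} bars, about {bytes_per_bar:.0f} bytes per bar")
print(f"about {headroom_gb:.1f} GB left for pre-computed values")
```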
> >> >> AmiBroker by nature needs to provide best blend of speed, 
> > moderate
> >> > memory / CPU requirements.
> >> >> User-specific single-task solutions may go into 
specialisation 
> > and
> >> > tricks that are
> >> >> not feasible for commercial general-purpose product that is
> >> > intended to keep
> >> >> large user base happy.
> >> >>
> >> >> Best regards,
> >> >> Tomasz Janeczko
> >> >> amibroker.com
> >> >> ----- Original Message ----- 
> >> >> From: "dloyer123" <dloyer123@>
> >> >> To: <amibroker@xxxxxxxxxxxxxxx>
> >> >> Sent: Tuesday, August 12, 2008 4:09 PM
> >> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 
cores
> >> >>
> >> >>
> >> >> > The programming guide lists the 8600M and 8700M as having 32 computing
> >> >> > cores.  Not sure what they are clocked at.  Power is an 
issue.
> >> > The
> >> >> > desktop versions need dedicated power connectors.  The big 
> > cards
> >> > need
> >> >> > two.
> >> >> >
> >> >> > Actually, when I am doing development on my laptop, I just 
use
> >> > the
> >> >> > emulator.  It is about 100x slower than my desktop system, 
but
> >> > still
> >> >> > about 20x to 50x faster than Ami alone.  The speed 
difference 
> > in
> >> >> > emulation mode is mostly due to the precomputed and cached 
> > price
> >> >> > arrays.
> >> >> >
> >> >> > Tomasz:  I suspect that there is an opportunity to trade 
memory
> >> > for
> >> >> > speed, even with 1 core.  Memory is cheap and would be a 
> > simpler
> >> > way
> >> >> > to get a performance boost than porting to multi core, GPU 
or
> >> > CPU.
> >> >> >
> >> >> >
> >> >> >
> >> >> > --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" 
<groups@>
> >> >> > wrote:
> >> >> >>
> >> >> >> Dell has 3 off the shelf
> >> >> >> > laptops in their entertainment/performance range that use
> >> > GeForce
> >> >> >> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required 
for
> >> > CUDA?)
> >> >> >>
> >> >> >> Mobile ones are very poor cousins. Believe me. I own a brand new notebook (ASUS) with GeForce 8600M
> >> >> >> and it is SLOW in 3D. I mean SLOW. Did I mention that it is
> >> > SLOW?
> >> >> >>
> >> >> >> In 3D Mark it gets the same results as my 3 year old 
desktop
> >> > 6600GT.
> >> >> >>
> >> >> >> Best regards,
> >> >> >> Tomasz Janeczko
> >> >> >> amibroker.com
> >> >> >> ----- Original Message ----- 
> >> >> >> From: "brian_z111" <brian_z111@>
> >> >> >> To: <amibroker@xxxxxxxxxxxxxxx>
> >> >> >> Sent: Tuesday, August 12, 2008 12:40 AM
> >> >> >> Subject: [amibroker] Re: Freakishly fast backtest using 64 
> > cores
> >> >> >>
> >> >> >>
> >> >> >> > DL
> >> >> >> >
> >> >> >> >
> >> >> >> > I am following at the top level and understand what you 
are
> >> > doing
> >> >> > OK
> >> >> >> > (you make me wish I had learnt programming/IT).
> >> >> >> >
> >> >> >> > I like your CPU.
> >> >> >> >
> >> >> >> > Allowing niche trading is what AB is all about?
> >> >> >> >
> >> >> >> > I'll put my money on MS/"general purpose computing on 
GPU" -
> > I
> >> >> > don't
> >> >> >> > think the masses are in love with MS but for 80% of 
people 
> > who
> >> >> > can do
> >> >> >> > 80% of what they want with MS the price to move 
elsewhere is
> >> > too
> >> >> >> > high - they are just in love with max output for min 
input.
> >> >> >> >
> >> >> >> > If you go to the trouble to write a plug-in do you think 
it
> >> > will
> >> >> > be
> >> >> >> > around long/require much ongoing support from you?
> >> >> >> >
> >> >> >> > I can see the benefits of the speed - for a group of 
traders
> >> > it
> >> >> > is a
> >> >> >> > definite edge they would have for a year or two (I don't 
> > think
> >> >> > any
> >> >> >> > other trading software will be seeing this for a while? -
> >> >> > especially
> >> >> >> > in the AT area where more crunching could be done fast 
> > enough
> >> > to
> >> >> > keep
> >> >> >> > up with live data.
> >> >> >> >
> >> >> >> > I don't blame Tomasz for not sitting his backside on the
> >> > cutting
> >> >> >> > edge - too dangerous for developers with long term 
> > clientele.
> >> >> >> >
> >> >> >> > Not having a go at Tomasz - to clarify - Tomasz said GeForce 8800
> >> >> >> > can't be put in a notebook?
> >> >> >> >
> >> >> >> > To my understanding there seems to be a reasonable 
number of
> >> >> > laptops
> >> >> >> > around that could use your method e.g. Dell has 3 off the
> >> > shelf
> >> >> >> > laptops in their entertainment/performance range that use
> >> > GeForce
> >> >> >> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required 
for
> >> > CUDA?)
> >> >> >> >
> >> >> >> > I looked at the GeF links in Paul's post but they didn't 
> > have
> >> >> > much
> >> >> >> > specific info there that I could see - I assume the above cards will
> >> >> >> > run your system.
> >> >> >> >
> >> >> >> > I am not a buyer for now but good luck with it and what 
you
> >> > have
> >> >> > done
> >> >> >> > already is a good contribution to AB - once someone on 
the
> >> > block
> >> >> > has
> >> >> >> > a new super-dooper gadget pretty soon the neighbours 
want 
> > one
> >> > too
> >> >> > and
> >> >> >> > demand grows.
> >> >> >> >
> >> >> >> > brian_z
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" 
<dloyer123@>
> >> > wrote:
> >> >> >> >>
> >> >> >> >> This uses the mid range video card that happened to 
come 
> > with
> >> > my
> >> >> >> >> system, a 9800GT.  The newer 260 and 280 cards are 3 to 
4
> >> > times
> >> >> >> >> faster.  The 260 can be found at best buy for $300.  
Some
> >> >> > laptops
> >> >> >> >> have compatible cards as well.
> >> >> >> >>
> >> >> >> >> The video card has its own memory, mine has 512MB, some 
> > have
> >> > as
> >> >> >> > much
> >> >> >> >> as 1GB.  This memory is very fast, once it is loaded 
from 
> > the
> >> >> > main
> >> >> >> >> system.  Nvidia has a professional line of products 
that 
> > have
> >> >> > much
> >> >> >> >> more memory.
> >> >> >> >>
> >> >> >> >> To get the best performance, my AFL code makes one pass over the
> >> >> >> >> data, calling a Dll.  The Dll takes all of the data 
needed 
> > by
> >> >> > the
> >> >> >> >> calculation and loads a copy to the video card.  This 
> > upload
> >> > is
> >> >> >> > slow,
> >> >> >> >> the entire upload takes about 45 seconds for all 1000
> >> > symbols.
> >> >> >> >>
> >> >> >> >> Once all of the data is uploaded, the Dll loads 
a "kernel"
> >> > into
> >> >> > the
> >> >> >> >> graphics cores that perform the actual computation and
> >> > generates
> >> >> >> > the
> >> >> >> >> trade list.  This part is very fast and performs all of 
the
> >> > same
> >> >> >> >> functions that my AFL version does.  The resulting 
trade 
> > list
> >> > is
> >> >> >> > the
> >> >> >> >> same.
> >> >> >> >>
> >> >> >> >> Because the data is loaded into video memory, it can be reused for many
> >> >> >> >> passes over the data with different optimization 
values.  
> > So,
> >> >> >> >> hundreds of combinations of optimization values can be 
> > tried
> >> > per
> >> >> >> >> second.
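
The upload-once, reuse-many-times trade-off described above can be put into numbers with a small Python sketch. The three constants are the figures quoted in this thread (about 45 seconds to upload all symbols, 4.4ms per GPU pass, 25 seconds per pass for AmiBroker alone); the function names are illustrative. Note that the two per-pass figures do not cover the same amount of work, so the comparison is indicative only.

```python
UPLOAD_S = 45.0      # quoted: one-time upload of all symbols to video memory
GPU_PASS_S = 0.0044  # quoted: one pass on the GPU
CPU_PASS_S = 25.0    # quoted: one portfolio backtest with AmiBroker alone


def gpu_total(passes):
    """Total time when the data is uploaded once and then reused."""
    return UPLOAD_S + passes * GPU_PASS_S


def cpu_total(passes):
    """Total time when every pass pays the full cost."""
    return passes * CPU_PASS_S


def break_even_passes():
    """Smallest number of passes at which the upload cost has paid off."""
    n = 1
    while gpu_total(n) >= cpu_total(n):
        n += 1
    return n


passes_per_second = 1 / GPU_PASS_S  # a little over 227, ignoring the upload

print(f"break-even after {break_even_passes()} passes")
print(f"1000 passes: GPU {gpu_total(1000):.1f} s, AmiBroker alone {cpu_total(1000):.0f} s")
```

With a single pass the upload dominates and nothing is gained, which matches the remark that non-optimization runs perform about the same as AFL code alone.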
> >> >> >> >>
> >> >> >> >> For non optimization runs, the Dll just loads one 
symbol 
> > into
> >> >> > video
> >> >> >> >> memory and processes it.  Counting the overhead of 
moving
> >> > data
> >> >> > to
> >> >> >> > the
> >> >> >> >> video card and extracting the trade list for a single 
> > symbol,
> >> >> > the
> >> >> >> >> result is similar to AFL code alone.  This lets me test 
the
> >> > code
> >> >> >> > and
> >> >> >> >> make sure it is correct.
> >> >> >> >>
> >> >> >> >> This approach works best when the data only needs to be
> >> > loaded
> >> >> >> > once,
> >> >> >> >> then "reused" many times.  It also works best when there is a lot of
> >> >> >> >> data to work with.
> >> >> >> >>
> >> >> >> >> What is more interesting to me and what would be more 
> > useful
> >> > for
> >> >> >> >> others would be a general driver that requires no Dll changes to
> >> >> >> >> modify the system.  The performance would not be as 
good as
> >> > hand
> >> >> >> >> optimized code, but would still be much better than AFL 
> > code
> >> >> >> > alone.
> >> >> >> >> It would take trading system design to a whole new 
level.  
> > It
> >> >> > would
> >> >> >> >> provide enough performance to make working with Intra 
day
> >> > data
> >> >> > as
> >> >> >> >> easy as daily data is today.
> >> >> >> >>
> >> >> >> >> Writing such a driver would be hard, but I have already 
> > done
> >> >> > some
> >> >> >> >> prototypes and design work.  I am tempted to do it for 
my 
> > own
> >> >> > use.
> >> >> >> >> If I made it available to others supporting it would be 
a
> >> > PITA.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@>
> >> > wrote:
> >> >> >> >> >
> >> >> >> >> > I'm very interested
> >> >> >> >> > could you elaborate a bit more
> >> >> >> >> > What model of Nvidia chipset are you using, and with 
how
> >> > much
> >> >> >> >> memory?
> >> >> >> >> > Not sure exactly what you mean when you say
> >> >> >> >> > It uses AmiBroker to load the symbol data and perform
> >> >> >> > calculations
> >> >> >> >> > that do not depend on the optimization parameters. 
Once
> >> > loaded
> >> >> >> > into
> >> >> >> >> > video memory, repeated passes can be made with 
different
> >> >> >> >> parameters,
> >> >> >> >> > avoiding any overhead.
> >> >> >> >> > Can you give me some examples. I presume when your 
dll is
> >> >> > called.
> >> >> >> >> AB passes
> >> >> >> >> > one or more arrays of data belonging to 1 symbol, is 
that
> >> > true?
> >> >> >> >> > Not sure exactly what the rest mean either. How many
> >> > functions
> >> >> >> > are
> >> >> >> >> you
> >> >> >> >> > running in your dll, and what does each of them do?
> >> >> >> >> > Great of you to share your insight.
> >> >> >> >> > Cheers
> >> >> >> >> > Paul.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >   _____
> >> >> >> >> >
> >> >> >> >> > From: amibroker@xxxxxxxxxxxxxxx
> >> >> >> > [mailto:amibroker@xxxxxxxxxxxxxxx]
> >> >> >> >> On Behalf
> >> >> >> >> > Of dloyer123
> >> >> >> >> > Sent: Tuesday, 5 August 2008 9:19 AM
> >> >> >> >> > To: amibroker@xxxxxxxxxxxxxxx
> >> >> >> >> > Subject: [amibroker] Freakishly fast backtest using 
64 
> > cores
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > Greetings,
> >> >> >> >> >
> >> >> >> >> > I ported part of my AFL backtest code to a plugin, 
that
> >> > takes
> >> >> >> >> > advantage of the graphics math cores on the video 
card 
> > that
> >> >> > are
> >> >> >> >> > normally used for 3d graphics.
> >> >> >> >> >
> >> >> >> >> > I was able to get a several thousand fold performance
> >> >> > improvement
> >> >> >> >> > over AFL code alone.
> >> >> >> >> >
> >> >> >> >> > My goal was to reduce the 25 seconds AFL code alone 
uses
> >> > for a
> >> >> >> >> single
> >> >> >> >> > portfolio level back test to less than 1 second, 
allowing
> >> >> > multi
> >> >> >> > day
> >> >> >> >> > optimization and walkforward runs to complete in a 
more
> >> >> >> > reasonable
> >> >> >> >> > time, and also just to see how fast I could get it to 
> > run.
> >> >> >> >> >
> >> >> >> >> > The backtest runs over 1 year of 5 minute bars for 
about
> >> > 1000
> >> >> >> >> > symbols. 1 year of data normally takes 25 seconds for
> >> >> > AmiBroker
> >> >> >> >> > alone, or 18 seconds for 6 months of data. A typical
> >> >> > optimization
> >> >> >> >> > run takes hundreds of these passes per walk forward 
step,
> >> >> > taking
> >> >> >> >> > hours.
> >> >> >> >> >
> >> >> >> >> > Using the Nvidia CUDA API, running on my mid range video card, it
> >> >> >> >> > was much faster. Much, much, much faster. How fast?
> >> >> >> >> >
> >> >> >> >> > It reduced the run time from 25s to... 4.4ms. That is 
> > more
> >> >> > than
> >> >> >> >> > 200/s!
> >> >> >> >> >
> >> >> >> >> > I didn't believe the timing when I saw it at first. So, I put 1,000
> >> >> >> >> > runs in a loop and sure enough, it ran 1,000 
iterations 
> > in
> >> >> > about
> >> >> >> > 4
> >> >> >> >> > 1/2 seconds. This far exceeded my goal or expectations.
> >> >> >> >> >
> >> >> >> >> > The resulting trade list matches that obtained by the 
AFL
> >> >> > version
> >> >> >> >> of
> >> >> >> >> > this code.
> >> >> >> >> >
> >> >> >> >> > I estimate that it is processing 32GB of bar data/sec.
> >> >> >> >> >
> >> >> >> >> > Getting this to work at peak performance was tricky. 
Most
> >> > of
> >> >> > what
> >> >> >> > I
> >> >> >> >> > have learned about code optimization does not apply.
> >> >> >> >> >
> >> >> >> >> > It uses AmiBroker to load the symbol data and perform
> >> >> >> > calculations
> >> >> >> >> > that do not depend on the optimization parameters. 
Once
> >> > loaded
> >> >> >> > into
> >> >> >> >> > video memory, repeated passes can be made with 
different
> >> >> >> >> parameters,
> >> >> >> >> > avoiding any overhead.
> >> >> >> >> >
> >> >> >> >> > For non backtest/optimization runs, the code just 
> > evaluates
> >> >> > one
> >> >> >> >> > symbol and passes the data back to AmiBroker
> >> >> > buy/sell/short/cover
> >> >> >> >> > arrays, making it easy to test, validate and 
visualize 
> > the
> >> >> >> > trades.
> >> >> >> >> > There is very little performance gain in this case.
> >> >> >> >> >
> >> >> >> >> > There are problems, however. To run optimizations at 
peak
> >> >> > speed,
> >> >> >> > I
> >> >> >> >> > can not use AmiBroker to calculate the optimization 
goal
> >> >> >> > function.
> >> >> >> >> > So, I am in the process of writing code to match 
signals
> >> > and
> >> >> >> >> > calculate the portfolio fitness function. Once I do 
> > this, I
> >> >> > will
> >> >> >> > be
> >> >> >> >> > able to perform full optimizations and walk forwards 
at 3
> >> >> > orders
> >> >> >> > of
> >> >> >> >> > magnitude faster than is possible with AmiBroker 
alone.
> >> >> >> >> >
> >> >> >> >> > Also, this is not general purpose code. Changing the 
> > system
> >> >> > code
> >> >> >> >> > means changing a dll written in C. However, there is 
no
> >> > reason
> >> >> >> > that
> >> >> >> >> > this could not be made more general.
> >> >> >> >> >
> >> >> >> >> > I have made some prototypes of "Cuda" versions of 
basic 
> > AFL
> >> >> >> >> > functions. The idea is to queue the function calls 
into a
> >> >> >> >> definition
> >> >> >> >> > executed by a micro kernel running on the graphics 
cores.
> >> > The
> >> >> >> >> result
> >> >> >> >> > would be the ability to use the full power of the 
> > graphics
> >> >> > cores
> >> >> >> > by
> >> >> >> >> > modifying AFL code to use Cuda aware versions with no
> >> > changes
> >> >> > to
> >> >> >> > C
> >> >> >> >> > code. It would be an interesting, but big project.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > ------------------------------------
> >> >> >> >
> >> >> >> > Please note that this group is for discussion between 
users
> >> > only.
> >> >> >> >
> >> >> >> > To get support from AmiBroker please send an e-mail 
directly
> >> > to
> >> >> >> > SUPPORT {at} amibroker.com
> >> >> >> >
> >> >> >> > For NEW RELEASE ANNOUNCEMENTS and other news always check
> >> > DEVLOG:
> >> >> >> > http://www.amibroker.com/devlog/
> >> >> >> >
> >> >> >> > For other support material please check also:
> >> >> >> > http://www.amibroker.com/support.html
> >> >> >> > Yahoo! Groups Links
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> > 
> > 
> > 
> > 
> > 
> >
>


