Wow... is this why the second and subsequent runs of many optimization
problems seem so incredibly fast? (first run loads data into memory,
subsequent runs do not?)
Thanks
----- Original Message -----
From: "Tomasz Janeczko" <groups@xxxxxxxxxxxxx>
To: <amibroker@xxxxxxxxxxxxxxx>
Sent: Tuesday, August 12, 2008 3:40 PM
Subject: Re: [amibroker] Re: Freakishly fast backtest using 64 cores
> Hello,
>
> What is true for a GPU is not necessarily true for a CPU. A GPU has a dedicated
> wide RAM bus and faster RAM, as opposed to system memory.
>
> AmiBroker does a lot to utilise memory to the maximum extent where possible/feasible.
>
> Actually AFL speed is limited by system memory speed once you run out of on-chip cache.
> http://www.amibroker.com/kb/2008/08/12/afl-execution-speed/
>
> So using more memory does not always mean faster execution.
>
> Sure you can pre-compute everything, and use pre-computed values, but
> you need to understand that people are doing VERY different things with
> AmiBroker and their problems are not the same as the problems you are
> trying to solve. For example some customers are backtesting the entire US
> stock universe (8000+ symbols) over 10 or 20 years. That's about 1.3GB for
> DATA alone. Now if you are running a portfolio backtest you need to keep
> trading signals and that can be as much as 1GB in such a case. Quickly you
> are reaching the 3GB RAM limit of a 32-bit OS. There is no place
> to store "pre-computed" values.
> AmiBroker by nature needs to provide the best blend of speed and moderate
> memory / CPU requirements.
> User-specific single-task solutions may go into specialisation and tricks
> that are not feasible for a commercial general-purpose product that is
> intended to keep a large user base happy.
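
As a rough cross-check of the memory figures quoted above (8000+ symbols, 10 or 20 years, about 1.3GB of data, up to 1GB of signals, a 3GB limit), here is a minimal Python sketch. The 252 trading days per year and the 40 bytes per stored bar are assumptions made purely for illustration; they are not figures from this thread, and real databases will differ because many symbols have shorter histories.

```python
# Back-of-envelope memory estimate for a large daily-bar backtest.
# ASSUMPTIONS (not from the thread): 252 trading days per year and
# 40 bytes per stored bar. Both are illustrative placeholders.
TRADING_DAYS_PER_YEAR = 252
BYTES_PER_BAR = 40


def data_bytes(symbols: int, years: int) -> int:
    """Raw quote storage if every symbol had a full history."""
    return symbols * years * TRADING_DAYS_PER_YEAR * BYTES_PER_BAR


# Figures quoted in the message above, in GB.
QUOTED_DATA_GB = 1.3
QUOTED_SIGNALS_GB = 1.0
QUOTED_LIMIT_GB = 3.0

# What is left for everything else, including any pre-computed arrays.
headroom_gb = QUOTED_LIMIT_GB - QUOTED_DATA_GB - QUOTED_SIGNALS_GB

if __name__ == "__main__":
    for years in (10, 20):
        print(f"{years} years: {data_bytes(8000, years) / 1e9:.2f} GB (assumed bar size)")
    print(f"headroom under the quoted limit: {headroom_gb:.1f} GB")
```

Under these assumptions the raw data comes to roughly 0.8GB for 10 years and 1.6GB for 20 years, which brackets the quoted 1.3GB, and the quoted totals leave about 0.7GB of headroom.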
>
> Best regards,
> Tomasz Janeczko
> amibroker.com
> ----- Original Message -----
> From: "dloyer123" <dloyer123@xxxxxxxxx>
> To: <amibroker@xxxxxxxxxxxxxxx>
> Sent: Tuesday, August 12, 2008 4:09 PM
> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
>
>
>> The programming guide lists the 8600M and 8700M as having 32 computing
>> cores. Not sure what they are clocked at. Power is an issue. The
>> desktop versions need dedicated power connectors. The big cards need
>> two.
>>
>> Actually, when I am doing development on my laptop, I just use the
>> emulator. It is about 100x slower than my desktop system, but still
>> about 20x to 50x faster than Ami alone. The speed difference in
>> emulation mode is mostly due to the precomputed and cached price
>> arrays.
>>
>> Tomasz: I suspect that there is an opportunity to trade memory for
>> speed, even with 1 core. Memory is cheap and would be a simpler way
>> to get a performance boost than porting to multi core, GPU or CPU.
>>
>>
>>
>> --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@xxx>
>> wrote:
>>>
>>> Dell has 3 off the shelf
>>> > laptops in their entertainment/performance range that use GeForce
>>> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
>>>
>>> Mobile ones are very poor cousins. Believe me. I own a brand new
>>> notebook (ASUS) with a GeForce 8600M
>>> and it is SLOW in 3D. I mean SLOW. Did I mention that it is SLOW?
>>>
>>> In 3D Mark it gets the same results as my 3 year old desktop 6600GT.
>>>
>>> Best regards,
>>> Tomasz Janeczko
>>> amibroker.com
>>> ----- Original Message -----
>>> From: "brian_z111" <brian_z111@xxx>
>>> To: <amibroker@xxxxxxxxxxxxxxx>
>>> Sent: Tuesday, August 12, 2008 12:40 AM
>>> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
>>>
>>>
>>> > DL
>>> >
>>> >
>>> > I am following at the top level and understand what you are doing OK
>>> > (you make me wish I had learnt programming/IT).
>>> >
>>> > I like your CPU.
>>> >
>>> > Allowing niche trading is what AB is all about?
>>> >
>>> > I'll put my money on MS/"general purpose computing on GPU" - I don't
>>> > think the masses are in love with MS but for 80% of people who can do
>>> > 80% of what they want with MS the price to move elsewhere is too
>>> > high - they are just in love with max output for min input.
>>> >
>>> > If you go to the trouble to write a plug-in do you think it will be
>>> > around long/require much ongoing support from you?
>>> >
>>> > I can see the benefits of the speed - for a group of traders it is a
>>> > definite edge they would have for a year or two (I don't think any
>>> > other trading software will be seeing this for a while?) - especially
>>> > in the AT area where more crunching could be done fast enough to keep
>>> > up with live data.
>>> >
>>> > I don't blame Tomasz for not sitting his backside on the cutting
>>> > edge - too dangerous for developers with long term clientele.
>>> >
>>> > Not having a go at Tomasz - to clarify - Tomasz said the GeForce 8800
>>> > can't be put in a notebook?
>>> >
>>> > To my understanding there seems to be a reasonable number of laptops
>>> > around that could use your method e.g. Dell has 3 off the shelf
>>> > laptops in their entertainment/performance range that use GeForce
>>> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
>>> >
>>> > I looked at the GeF links in Paul's post but they didn't have much
>>> > specific info there that I could see - I assume the above cards will
>>> > run your system.
>>> >
>>> > I am not a buyer for now but good luck with it and what you have done
>>> > already is a good contribution to AB - once someone on the block has
>>> > a new super-dooper gadget pretty soon the neighbours want one too and
>>> > demand grows.
>>> >
>>> > brian_z
>>> >
>>> >
>>> >
>>> > --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@> wrote:
>>> >>
>>> >> This uses the mid range video card that happened to come with my
>>> >> system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
>>> >> faster. The 260 can be found at Best Buy for $300. Some laptops
>>> >> have compatible cards as well.
>>> >>
>>> >> The video card has its own memory, mine has 512MB, some have as much
>>> >> as 1GB. This memory is very fast, once it is loaded from the main
>>> >> system. Nvidia has a professional line of products that have much
>>> >> more memory.
>>> >>
>>> >> To get the best performance, my AFL code makes one pass over the
>>> >> data, calling a Dll. The Dll takes all of the data needed by the
>>> >> calculation and loads a copy to the video card. This upload is slow,
>>> >> the entire upload takes about 45 seconds for all 1000 symbols.
>>> >>
>>> >> Once all of the data is uploaded, the Dll loads a "kernel" into the
>>> >> graphics cores that performs the actual computation and generates the
>>> >> trade list. This part is very fast and performs all of the same
>>> >> functions that my AFL version does. The resulting trade list is the
>>> >> same.
>>> >>
>>> >> Because the data is loaded into video memory, it can be reused for many
>>> >> passes over the data with different optimization values. So,
>>> >> hundreds of combinations of optimization values can be tried per
>>> >> second.
>>> >>
>>> >> For non optimization runs, the Dll just loads one symbol into video
>>> >> memory and processes it. Counting the overhead of moving data to the
>>> >> video card and extracting the trade list for a single symbol, the
>>> >> result is similar to AFL code alone. This lets me test the code and
>>> >> make sure it is correct.
>>> >>
>>> >> This approach works best when the data only needs to be loaded once,
>>> >> then "reused" many times. It also works best when there is a lot of
>>> >> data to work with.
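
To make the "load once, reuse many times" point concrete, here is a minimal Python sketch of the amortisation. It combines the 45 second upload quoted above with the 25 s (AFL alone) and 4.4 ms (GPU) per-pass figures from the original post further down. Treating those numbers as one model is an assumption, since they come from different messages, and the function names are made up for illustration.

```python
# Amortising a one-time upload cost over many optimization passes.
# Figures taken from the thread; combining them is an assumption.
UPLOAD_SECONDS = 45.0        # one-time copy of all symbols to video memory
GPU_PASS_SECONDS = 0.0044    # per pass once the data is resident
AFL_PASS_SECONDS = 25.0      # per pass with AFL code alone


def gpu_total(passes: int) -> float:
    """Total wall time when the data is uploaded once and then reused."""
    return UPLOAD_SECONDS + passes * GPU_PASS_SECONDS


def afl_total(passes: int) -> float:
    """Total wall time when every pass costs the full AFL run time."""
    return passes * AFL_PASS_SECONDS


def break_even_passes() -> int:
    """Smallest number of passes at which the upload has paid for itself."""
    passes = 1
    while gpu_total(passes) >= afl_total(passes):
        passes += 1
    return passes


if __name__ == "__main__":
    print("break-even at", break_even_passes(), "passes")
    for n in (1, 10, 1000):
        print(n, "passes:", round(gpu_total(n), 1), "s vs", afl_total(n), "s")
```

With these figures a single pass is slower with the upload included, which matches the remark that non optimization runs see little benefit, while anything beyond a couple of passes favours keeping the data in video memory.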
>>> >>
>>> >> What is more interesting to me and what would be more useful for
>>> >> others would be a general driver that requires no Dll changes to
>>> >> modify the system. The performance would not be as good as hand
>>> >> optimized code, but would still be much better than AFL code alone.
>>> >> It would take trading system design to a whole new level. It would
>>> >> provide enough performance to make working with intraday data as
>>> >> easy as daily data is today.
>>> >>
>>> >> Writing such a driver would be hard, but I have already done some
>>> >> prototypes and design work. I am tempted to do it for my own use.
>>> >> If I made it available to others, supporting it would be a PITA.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@> wrote:
>>> >> >
>>> >> > I'm very interested
>>> >> > could you elaborate a bit more
>>> >> > What model of Nvidia chipset are you using, and with how much memory?
>>> >> > Not sure exactly what you mean when you say
>>> >> > "It uses AmiBroker to load the symbol data and perform calculations
>>> >> > that do not depend on the optimization parameters. Once loaded into
>>> >> > video memory, repeated passes can be made with different parameters,
>>> >> > avoiding any overhead."
>>> >> > Can you give me some examples? I presume when your dll is called,
>>> >> > AB passes one or more arrays of data belonging to 1 symbol, is that true?
>>> >> > Not sure exactly what the rest means either. How many functions are
>>> >> > you running in your dll, and what does each of them do?
>>> >> > Great of you to share your insight.
>>> >> > Cheers
>>> >> > Paul.
>>> >> >
>>> >> >
>>> >> >
>>> >> > _____
>>> >> >
>>> >> > From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx]
>>> >> > On Behalf Of dloyer123
>>> >> > Sent: Tuesday, 5 August 2008 9:19 AM
>>> >> > To: amibroker@xxxxxxxxxxxxxxx
>>> >> > Subject: [amibroker] Freakishly fast backtest using 64 cores
>>> >> >
>>> >> >
>>> >> >
>>> >> > Greetings,
>>> >> >
>>> >> > I ported part of my AFL backtest code to a plugin that takes
>>> >> > advantage of the graphics math cores on the video card that are
>>> >> > normally used for 3D graphics.
>>> >> >
>>> >> > I was able to get a several thousand fold performance improvement
>>> >> > over AFL code alone.
>>> >> >
>>> >> > My goal was to reduce the 25 seconds AFL code alone uses for a single
>>> >> > portfolio level back test to less than 1 second, allowing multi day
>>> >> > optimization and walkforward runs to complete in a more reasonable
>>> >> > time, and also just to see how fast I could get it to run.
>>> >> >
>>> >> > The backtest runs over 1 year of 5 minute bars for about 1000
>>> >> > symbols. 1 year of data normally takes 25 seconds for AmiBroker
>>> >> > alone, or 18 seconds for 6 months of data. A typical optimization
>>> >> > run takes hundreds of these passes per walk forward step, taking
>>> >> > hours.
>>> >> >
>>> >> > Using the Nvidia CUDA API, running on my mid range video card, it
>>> >> > was much faster. Much, much, much faster. How fast?
>>> >> >
>>> >> > It reduced the run time from 25s to... 4.4ms. That is more than
>>> >> > 200/s!
>>> >> >
>>> >> > I didn't believe the timing when I saw it at first. So, I put 1,000
>>> >> > runs in a loop and sure enough, it ran 1,000 iterations in about 4
>>> >> > 1/2 seconds. This far exceeded my goal or expectations.
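
The timing figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch using the numbers as quoted (25 s per pass with AFL alone, 4.4 ms per pass on the GPU); the variable names are made up for illustration.

```python
# Sanity check of the quoted timings: 25 s per pass with AFL alone
# versus 4.4 ms per pass on the GPU.
AFL_PASS_SECONDS = 25.0
GPU_PASS_SECONDS = 4.4e-3

# Passes per second once the data is resident on the card.
passes_per_second = 1.0 / GPU_PASS_SECONDS

# Ratio of the two per-pass times.
speedup = AFL_PASS_SECONDS / GPU_PASS_SECONDS

# Expected wall time for the 1,000-iteration loop described above.
loop_seconds = 1000 * GPU_PASS_SECONDS

if __name__ == "__main__":
    print(f"{passes_per_second:.0f} passes/s")
    print(f"{speedup:.0f}x faster per pass")
    print(f"{loop_seconds:.1f} s for 1,000 passes")
```

The three results agree with the claims in the post: more than 200 passes per second, a speedup in the thousands, and roughly 4 1/2 seconds for the 1,000-run loop.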
>>> >> >
>>> >> > The resulting trade list matches that obtained by the AFL version of
>>> >> > this code.
>>> >> >
>>> >> > I estimate that it is processing 32GB of bar data/sec.
>>> >> >
>>> >> > Getting this to work at peak performance was tricky. Most of what I
>>> >> > have learned about code optimization does not apply.
>>> >> >
>>> >> > It uses AmiBroker to load the symbol data and perform calculations
>>> >> > that do not depend on the optimization parameters. Once loaded into
>>> >> > video memory, repeated passes can be made with different parameters,
>>> >> > avoiding any overhead.
>>> >> >
>>> >> > For non backtest/optimization runs, the code just evaluates one
>>> >> > symbol and passes the data back to AmiBroker buy/sell/short/cover
>>> >> > arrays, making it easy to test, validate and visualize the trades.
>>> >> > There is very little performance gain in this case.
>>> >> >
>>> >> > There are problems, however. To run optimizations at peak speed, I
>>> >> > cannot use AmiBroker to calculate the optimization goal function.
>>> >> > So, I am in the process of writing code to match signals and
>>> >> > calculate the portfolio fitness function. Once I do this, I will be
>>> >> > able to perform full optimizations and walk forwards 3 orders of
>>> >> > magnitude faster than is possible with AmiBroker alone.
>>> >> >
>>> >> > Also, this is not general purpose code. Changing the system code
>>> >> > means changing a dll written in C. However, there is no reason that
>>> >> > this could not be made more general.
>>> >> >
>>> >> > I have made some prototypes of "Cuda" versions of basic AFL
>>> >> > functions. The idea is to queue the function calls into a definition
>>> >> > executed by a micro kernel running on the graphics cores. The result
>>> >> > would be the ability to use the full power of the graphics cores by
>>> >> > modifying AFL code to use Cuda aware versions with no changes to C
>>> >> > code. It would be an interesting, but big project.
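
The "queue the function calls into a definition" idea can be illustrated with a small interpreter. The sketch below is plain Python running on the CPU, not CUDA and not the author's prototype; `Program`, `run`, the op names and the slot names are all hypothetical. It only shows the shape of the approach: calls are recorded once as data, then a tiny generic executor replays them over whatever arrays are loaded.

```python
# Sketch of deferred execution: calls are recorded into a program and a
# small generic executor replays them later. All names are hypothetical;
# this is not AFL, CUDA, or the plugin discussed in the thread.

def moving_average(xs, period):
    """Simple moving average; early bars average over the bars available."""
    out, total = [], 0.0
    for i, x in enumerate(xs):
        total += x
        if i >= period:
            total -= xs[i - period]
        out.append(total / min(i + 1, period))
    return out


def greater_than(a, b):
    """Element-wise a > b."""
    return [x > y for x, y in zip(a, b)]


OPS = {"ma": moving_average, "gt": greater_than}


class Program:
    """Records calls instead of executing them."""

    def __init__(self):
        self.steps = []  # (op name, output slot, input slots, parameters)

    def call(self, op, out, inputs, **params):
        self.steps.append((op, out, tuple(inputs), params))
        return out


def run(program, slots):
    """Stand-in for the 'micro kernel': replays the recorded steps in order."""
    slots = dict(slots)
    for op, out, inputs, params in program.steps:
        slots[out] = OPS[op](*(slots[name] for name in inputs), **params)
    return slots


# Build the definition once, then execute it over loaded data.
prog = Program()
prog.call("ma", "fast", ["close"], period=2)
prog.call("ma", "slow", ["close"], period=3)
prog.call("gt", "buy", ["fast", "slow"])
result = run(prog, {"close": [1.0, 2.0, 3.0, 2.0, 1.0]})
```

The design point is that changing the trading system only changes the recorded steps, while the executor stays fixed, which is what would remove the need to edit C code for each new system.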
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> > ------------------------------------
>>> >
>>> > Please note that this group is for discussion between users only.
>>> >
>>> > To get support from AmiBroker please send an e-mail directly to
>>> > SUPPORT {at} amibroker.com
>>> >
>>> > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
>>> > http://www.amibroker.com/devlog/
>>> >
>>> > For other support material please check also:
>>> > http://www.amibroker.com/support.html
>>> > Yahoo! Groups Links
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>