The programming guide lists the 8600M and 8700M as having 32 computing
cores. I am not sure what they are clocked at. Power is an issue: the
desktop versions need dedicated power connectors, and the big cards
need two.

Actually, when I am doing development on my laptop, I just use the
emulator. It is about 100x slower than my desktop system, but still
about 20x to 50x faster than Ami alone. The speed advantage over Ami
in emulation mode is mostly due to the precomputed and cached price
arrays.

Tomasz: I suspect that there is an opportunity to trade memory for
speed, even with 1 core. Memory is cheap and would be a simpler way
to get a performance boost than porting to multi core, GPU or CPU.
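
A minimal sketch of the memory-for-speed idea, written in Python for brevity rather than in AFL or C. Everything in it is a hypothetical illustration: the PRICES table, the moving_average helper and the crossover count stand in for a real system and are not taken from the plugin discussed in this thread. The point is only that an array which many optimization passes need gets computed once, kept in memory, and reused.

```python
from functools import lru_cache

# Hypothetical in-memory price store: symbol -> closing prices.
PRICES = {
    "AAA": (10.0, 11.0, 12.0, 11.0, 13.0, 14.0),
    "BBB": (20.0, 19.0, 18.0, 19.0, 17.0, 16.0),
}


@lru_cache(maxsize=None)
def moving_average(symbol, period):
    """Simple moving average; None until `period` bars are available.

    The result is cached per (symbol, period), so each array is built
    once and then served from memory on every later pass.
    """
    closes = PRICES[symbol]
    out = []
    running = 0.0
    for i, close in enumerate(closes):
        running += close
        if i >= period:
            running -= closes[i - period]  # drop the bar leaving the window
        out.append(running / period if i >= period - 1 else None)
    return tuple(out)


def count_crossovers(symbol, fast, slow):
    """Count bars where the fast average crosses above the slow one."""
    f = moving_average(symbol, fast)
    s = moving_average(symbol, slow)
    count = 0
    for i in range(1, len(f)):
        if None in (f[i - 1], s[i - 1], f[i], s[i]):
            continue  # not enough history yet
        if f[i - 1] <= s[i - 1] and f[i] > s[i]:
            count += 1
    return count


def optimize(symbol, fast_range, slow_range):
    """Try every fast < slow combination, reusing cached averages."""
    results = {}
    for fast in fast_range:
        for slow in slow_range:
            if fast < slow:
                results[(fast, slow)] = count_crossovers(symbol, fast, slow)
    return results


if __name__ == "__main__":
    print(optimize("AAA", range(1, 4), range(2, 5)))
    print(moving_average.cache_info())
```

With three fast and three slow periods there are six valid combinations and twelve lookups, but only four distinct averages are ever computed; the rest come from the cache. The cost is the memory held by the cached arrays, which grows with the number of symbols and periods.
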
--- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@xxx>
wrote:
>
> > Dell has 3 off the shelf
> > laptops in their entertainment/performance range that use GeForce
> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
>
> Mobile ones are very poor cousins. Believe me. I own a brand new
> notebook (ASUS) with a GeForce 8600M
> and it is SLOW in 3D. I mean SLOW. Did I mention that it is SLOW?
>
> In 3D Mark it gets the same results as my 3 year old desktop 6600GT.
>
> Best regards,
> Tomasz Janeczko
> amibroker.com
> ----- Original Message -----
> From: "brian_z111" <brian_z111@xxx>
> To: <amibroker@xxxxxxxxxxxxxxx>
> Sent: Tuesday, August 12, 2008 12:40 AM
> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
>
>
> > DL
> >
> >
> > I am following at the top level and understand what you are doing OK
> > (you make me wish I had learnt programming/IT).
> >
> > I like your CPU.
> >
> > Allowing niche trading is what AB is all about?
> >
> > I'll put my money on MS/"general purpose computing on GPU" - I don't
> > think the masses are in love with MS but for 80% of people who can do
> > 80% of what they want with MS the price to move elsewhere is too
> > high - they are just in love with max output for min input.
> >
> > If you go to the trouble to write a plug-in do you think it will be
> > around long/require much ongoing support from you?
> >
> > I can see the benefits of the speed - for a group of traders it is a
> > definite edge they would have for a year or two (I don't think any
> > other trading software will be seeing this for a while?) - especially
> > in the AT area where more crunching could be done fast enough to keep
> > up with live data.
> >
> > I don't blame Tomasz for not sitting his backside on the cutting
> > edge - too dangerous for developers with long term clientele.
> >
> > Not having a go at Tomasz - to clarify - Tomasz said the GeForce 8800
> > can't be put in a notebook?
> >
> > To my understanding there seems to be a reasonable number of laptops
> > around that could use your method e.g. Dell has 3 off the shelf
> > laptops in their entertainment/performance range that use GeForce
> > 8600M and 8700M with 256MB & 2*2456MB (min 256 required for CUDA?)
> >
> > I looked at the GeF links in Paul's post but they didn't have much
> > specific info there that I could see - I assume the above cards will
> > run your system.
> >
> > I am not a buyer for now but good luck with it and what you have done
> > already is a good contribution to AB - once someone on the block has
> > a new super-duper gadget pretty soon the neighbours want one too and
> > demand grows.
> >
> > brian_z
> >
> >
> >
> > --- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@> wrote:
> >>
> >> This uses the mid range video card that happened to come with my
> >> system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
> >> faster. The 260 can be found at Best Buy for $300. Some laptops
> >> have compatible cards as well.
> >>
> >> The video card has its own memory; mine has 512MB, some have as much
> >> as 1GB. This memory is very fast once it is loaded from the main
> >> system. Nvidia has a professional line of products that have much
> >> more memory.
> >>
> >> To get the best performance, my AFL code makes one pass over the
> >> data, calling a Dll. The Dll takes all of the data needed by the
> >> calculation and loads a copy to the video card. This upload is slow;
> >> the entire upload takes about 45 seconds for all 1000 symbols.
> >>
> >> Once all of the data is uploaded, the Dll loads a "kernel" into the
> >> graphics cores that performs the actual computation and generates the
> >> trade list. This part is very fast and performs all of the same
> >> functions that my AFL version does. The resulting trade list is the
> >> same.
> >>
> >> Because the data is loaded into video memory, it can be reused for
> >> many passes over the data with different optimization values. So,
> >> hundreds of combinations of optimization values can be tried per
> >> second.
> >>
> >> For non optimization runs, the Dll just loads one symbol into video
> >> memory and processes it. Counting the overhead of moving data to the
> >> video card and extracting the trade list for a single symbol, the
> >> run time is similar to AFL code alone. This lets me test the code and
> >> make sure it is correct.
> >>
> >> This approach works best when the data only needs to be loaded once,
> >> then reused many times. It also works best when there is a lot of
> >> data to work with.
> >>
> >> What is more interesting to me and what would be more useful for
> >> others would be a general driver that requires no Dll changes to
> >> modify the system. The performance would not be as good as hand
> >> optimized code, but would still be much better than AFL code alone.
> >> It would take trading system design to a whole new level. It would
> >> provide enough performance to make working with intraday data as
> >> easy as daily data is today.
> >>
> >> Writing such a driver would be hard, but I have already done some
> >> prototypes and design work. I am tempted to do it for my own use.
> >> If I made it available to others, supporting it would be a PITA.
> >>
> >>
> >>
> >>
> >> --- In amibroker@xxxxxxxxxxxxxxx, "Paul Ho" <paul.tsho@> wrote:
> >> >
> >> > I'm very interested.
> >> > Could you elaborate a bit more?
> >> > What model of Nvidia chipset are you using, and with how much memory?
> >> > Not sure exactly what you mean when you say
> >> > "It uses AmiBroker to load the symbol data and perform calculations
> >> > that do not depend on the optimization parameters. Once loaded into
> >> > video memory, repeated passes can be made with different parameters,
> >> > avoiding any overhead."
> >> > Can you give me some examples? I presume when your dll is called, AB
> >> > passes one or more arrays of data belonging to 1 symbol, is that true?
> >> > Not sure exactly what the rest means either. How many functions are
> >> > you running in your dll, and what does each of them do?
> >> > Great of you to share your insight.
> >> > Cheers
> >> > Paul.
> >> >
> >> >
> >> >
> >> > _____
> >> >
> >> > From: amibroker@xxxxxxxxxxxxxxx [mailto:amibroker@xxxxxxxxxxxxxxx]
> >> > On Behalf Of dloyer123
> >> > Sent: Tuesday, 5 August 2008 9:19 AM
> >> > To: amibroker@xxxxxxxxxxxxxxx
> >> > Subject: [amibroker] Freakishly fast backtest using 64 cores
> >> >
> >> >
> >> >
> >> > Greetings,
> >> >
> >> > I ported part of my AFL backtest code to a plugin that takes
> >> > advantage of the graphics math cores on the video card that are
> >> > normally used for 3D graphics.
> >> >
> >> > I was able to get a several thousand fold performance improvement
> >> > over AFL code alone.
> >> >
> >> > My goal was to reduce the 25 seconds AFL code alone uses for a single
> >> > portfolio level back test to less than 1 second, allowing multi day
> >> > optimization and walkforward runs to complete in a more reasonable
> >> > time, and also just to see how fast I could get it to run.
> >> >
> >> > The backtest runs over 1 year of 5 minute bars for about 1000
> >> > symbols. 1 year of data normally takes 25 seconds for AmiBroker
> >> > alone, or 18 seconds for 6 months of data. A typical optimization
> >> > run takes hundreds of these passes per walk forward step, taking
> >> > hours.
> >> >
> >> > Using the Nvidia CUDA API, running on my mid range video card, it
> >> > was much faster. Much, much, much faster. How fast?
> >> >
> >> > It reduced the run time from 25s to... 4.4ms. That is more than
> >> > 200 runs per second!
> >> >
> >> > I didn't believe the timing when I saw it at first. So, I put 1,000
> >> > runs in a loop and sure enough, it ran 1,000 iterations in about
> >> > 4 1/2 seconds. This far exceeded my goal or expectations.
> >> >
> >> > The resulting trade list matches that obtained by the AFL version of
> >> > this code.
> >> >
> >> > I estimate that it is processing 32GB of bar data/sec.
> >> >
> >> > Getting this to work at peak performance was tricky. Most of what I
> >> > have learned about code optimization does not apply.
> >> >
> >> > It uses AmiBroker to load the symbol data and perform calculations
> >> > that do not depend on the optimization parameters. Once loaded into
> >> > video memory, repeated passes can be made with different parameters,
> >> > avoiding any overhead.
> >> >
> >> > For non backtest/optimization runs, the code just evaluates one
> >> > symbol and passes the data back to AmiBroker as buy/sell/short/cover
> >> > arrays, making it easy to test, validate and visualize the trades.
> >> > There is very little performance gain in this case.
> >> >
> >> > There are problems, however. To run optimizations at peak speed, I
> >> > cannot use AmiBroker to calculate the optimization goal function.
> >> > So, I am in the process of writing code to match signals and
> >> > calculate the portfolio fitness function. Once I do this, I will be
> >> > able to perform full optimizations and walk forwards at 3 orders of
> >> > magnitude faster than is possible with AmiBroker alone.
> >> >
> >> > Also, this is not general purpose code. Changing the system code
> >> > means changing a dll written in C. However, there is no reason that
> >> > this could not be made more general.
> >> >
> >> > I have made some prototypes of "Cuda" versions of basic AFL
> >> > functions. The idea is to queue the function calls into a definition
> >> > executed by a micro kernel running on the graphics cores. The result
> >> > would be the ability to use the full power of the graphics cores by
> >> > modifying AFL code to use Cuda aware versions with no changes to C
> >> > code. It would be an interesting, but big project.
> >> >
> >>
> >
> >
> >
> > ------------------------------------
> >
> > Please note that this group is for discussion between users only.
> >
> > To get support from AmiBroker please send an e-mail directly to
> > SUPPORT {at} amibroker.com
> >
> > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
> > http://www.amibroker.com/devlog/
> >
> > For other support material please check also:
> > http://www.amibroker.com/support.html
> > Yahoo! Groups Links
> >
> >
> >
>