@dloyer123: Yes, please let us know your benchmark results. Since I'm
also thinking about a hardware upgrade, it would be helpful to see
them ...
@Tomasz: thanks for your detailed explanations!
Best regards,
Markus
--- In amibroker@xxxxxxxxxxxxxxx, "dloyer123" <dloyer123@xxx> wrote:
>
> Nice, tight loop. It is good to see someone that has made the effort
> to make the most out of every cycle and the result shows.
>
> My new E8400 (45nm 3GHz, dual core) system should arrive tomorrow.
> The first thing I will do will be to benchmark it running ami. I run
> portfolio backtests over a few years of 5 minute data over a thousand
> or so symbols. Plenty of data to overflow the cache, but still fit
> in memory. No trig.
>
> I'll post what I find.
>
> If what you say is true, and one core alone fills the memory
> bandwidth, then there should be a net loss of performance while
> running two copies of ami.
>
>
>
> --- In amibroker@xxxxxxxxxxxxxxx, "Tomasz Janeczko" <groups@> wrote:
> >
> > Hello,
> >
> > FYI: a SINGLE processor core running an AFL formula is able to saturate
> > memory bandwidth in the majority of the most common operations/functions
> > if the total size of the arrays used in a given formula exceeds the DATA cache size.
> >
> > You need to understand that AFL runs at native assembly speed
> > when using array operations.
> > A simple array multiplication like this
> >
> > X = Close * H; // array multiplication
> >
> > gets compiled to just 8 assembly instructions:
> >
> > loop:    8B 54 24 58    mov   edx,dword ptr [esp+58h]
> > 00465068 46             inc   esi                       ; increase counters
> > 00465069 83 C0 04       add   eax,4
> > 0046506C 3B F7          cmp   esi,edi
> > 0046506E D9 44 B2 FC    fld   dword ptr [edx+esi*4-4]   ; get element of close array
> > 00465072 D8 4C 08 FC    fmul  dword ptr [eax+ecx-4]     ; multiply by element of high array
> > 00465076 D9 58 FC       fstp  dword ptr [eax-4]         ; store result
> > 00465079 7C E9          jl    loop                      ; continue until all elements are processed
> >
> > As you can see, there are three 4-byte memory accesses per loop iteration
> > (two reads, each 4 bytes long, and one write, 4 bytes long).
> >
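To make the memory traffic of that loop concrete, here is a minimal Python sketch of what one iteration does, assuming 4-byte single-precision elements as in the listing. The names `multiply_arrays` and `bytes_per_iteration` are hypothetical and only for illustration; this is not how AFL is implemented, and pure Python will be far too slow to saturate memory.

```python
from array import array


def multiply_arrays(close, high):
    """Element-wise product: one load from each input, one store per element."""
    result = array("f", [0.0] * len(close))  # 4-byte float output array
    for i in range(len(close)):
        # Two reads and one write, mirroring fld / fmul / fstp in the listing.
        result[i] = close[i] * high[i]
    return result


def bytes_per_iteration(arr):
    """Memory traffic per element: two reads plus one write of itemsize bytes."""
    return 3 * arr.itemsize


# Illustrative values only (assumption, not real quote data).
close = array("f", [10.0, 11.0, 12.0])
high = array("f", [10.5, 11.5, 12.5])
x = multiply_arrays(close, high)
print(list(x), bytes_per_iteration(close))
```

With 4-byte floats this gives 12 bytes of traffic per element, which is the figure the bandwidth estimate is built on.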
> > On my (2-year-old) 2GHz Athlon 64 X2, a single iteration of this loop takes
> > 6 nanoseconds (see benchmark code below).
> > So, during 6 nanoseconds we have 8 bytes read and 4 bytes stored.
> > That's (8/(6e-9)) bytes per second = 1333 MB per second read
> > and 667 MB per second write simultaneously, i.e. 2GB/sec combined!
> >
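The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation: the 6 ns figure and the byte counts come from the message, while the function name `bandwidth_mb_per_s` and the choice of 1 MB = 10**6 bytes are assumptions made for the illustration.

```python
def bandwidth_mb_per_s(num_bytes, seconds):
    """Transfer rate in MB/s, taking 1 MB as 10**6 bytes (assumption)."""
    return num_bytes / seconds / 1e6


ITERATION_TIME_S = 6e-9  # time per loop iteration quoted in the message
READ_BYTES = 8           # two 4-byte loads per iteration
WRITE_BYTES = 4          # one 4-byte store per iteration

read_rate = bandwidth_mb_per_s(READ_BYTES, ITERATION_TIME_S)
write_rate = bandwidth_mb_per_s(WRITE_BYTES, ITERATION_TIME_S)
total_rate = read_rate + write_rate

print(f"read  {read_rate:.0f} MB/s")
print(f"write {write_rate:.0f} MB/s")
print(f"total {total_rate:.0f} MB/s")
```

Plugging in a different iteration time shows how quickly the required bandwidth grows as the loop gets faster.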
> > Now if you look at memory benchmarks:
> > http://community.compuserve.com/n/docs/docDownload.aspx?webtag=ws-pchardware&guid=6827f836-8c33-4063-aaf5-c93605dd1dc6
> > you will see that 2GB/s is THE LIMIT of system memory speed on Athlon 64 (DDR2 dual channel).
> > And that's considering the fact that the Athlon has a superior-to-Intel
> > on-die integrated memory controller (HyperTransport).
> >
> > // benchmark code - for accurate results run it on LARGE arrays
> > // (intraday database, 1-minute interval, 50K bars or more)
> > GetPerformanceCounter(1);
> > for(k = 0; k < 1000; k++ ) X = C * H;
> > "Time per single iteration [s]="+1e-3*GetPerformanceCounter()/(1000*BarCount);
> >
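The last line of the benchmark turns the elapsed time into seconds per array element. A small Python sketch of that conversion follows, assuming the counter reports milliseconds (which the 1e-3 factor implies). The function name `seconds_per_element` and the 300 ms input are hypothetical, chosen only so the result lands on the 6 ns mentioned earlier; they are not measured values.

```python
def seconds_per_element(elapsed_ms, repeats, bar_count):
    """Same formula as the benchmark: 1e-3 * elapsed_ms / (repeats * bar_count)."""
    return 1e-3 * elapsed_ms / (repeats * bar_count)


# Hypothetical run: 1000 repeats over 50,000 bars finishing in 300 ms.
t = seconds_per_element(elapsed_ms=300.0, repeats=1000, bar_count=50_000)
print(f"{t * 1e9:.1f} ns per element")
```

Dividing by both the repeat count and the bar count is what makes the figure comparable across databases of different lengths.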
> > Only really complex operations that use *lots* of FPU (floating point) cycles,
> > such as trigonometric (sin/cos/tan) functions, are slow enough for the memory
> > to keep up.
> >
> > Of course one may say that I am using an "old" processor and that new
> > computers have faster RAM, and that's true,
> > but processor speeds increase FASTER than bus speeds and the gap between
> > processor and RAM becomes larger and larger, so with newer CPUs the situation
> > will be worse, not better.
> >
> >
> > Best regards,
> > Tomasz Janeczko
> > amibroker.com
> > ----- Original Message -----
> > From: "dloyer123" <dloyer123@>
> > To: <amibroker@xxxxxxxxxxxxxxx>
> > Sent: Tuesday, May 13, 2008 5:02 PM
> > Subject: [amibroker] Re: Dual-core vs. quad-core
> >
> >
> > > All of the cores have to share the same front bus and northbridge.
> > > The northbridge connects the CPU to memory and has limited bandwidth.
> > >
> > > If several cores are running memory-hungry applications, the front
> > > bus will saturate.
> > >
> > > The L2 cache helps for most applications, but not if you are burning
> > > through a few G of quote data. The L2 cache is just 4-8MB.
> > >
> > > The newer multi-core systems have much faster front buses and that
> > > trend is likely to continue.
> > >
> > > So, it would be nice if AMI could support running on multiple cores, even
> > > if it was just running different optimization passes on different
> > > cores. That would saturate the front bus, but take advantage of all
> > > of the memory bandwidth you have. It would really help those multi-day
> > > walkforward runs.
> > >
> > >
> > >
> > > --- In amibroker@xxxxxxxxxxxxxxx, "markhoff" <markhoff@> wrote:
> > >>
> > >>
> > >> If you have a runtime penalty when running 2 independent AB jobs on a
> > >> Core Duo CPU, it might be caused by too little memory (swapping to disk)
> > >> or other tasks which are also running (e.g. a web browser, audio
> > >> streamer or whatever). You can check this with a process explorer
> > >> which shows each task's CPU utilisation. Similarly, 4 AB jobs on a Core
> > >> Quad should have nearly no penalty in runtime.
> > >>
> > >> Tomasz stated that multi-thread optimization does not scale well with
> > >> the number of CPUs, but it is not clear to me why this is the case. In my
> > >> understanding, AA optimization is a sequential process of running the
> > >> same AFL script with different parameters. If I have an AFL with a
> > >> significantly long runtime per optimization step (e.g. 1 minute), the
> > >> overhead for the multi-threading should become quite small and
> > >> independent tasks should scale nearly with the number of CPUs (as long
> > >> as there is sufficient memory; n threads might need n times more
> > >> memory than a single thread). For sure the situation is different if
> > >> my single optimization run takes only a few millisecs or seconds; then
> > >> the overhead for multi-thread management goes up ...
> > >>
> > >> Maybe Tomasz can give some detailed comments on that issue?
> > >>
> > >> Best regards,
> > >> Markus
> > >>
> > >
> > >
> > > ------------------------------------
> > >
> > > Please note that this group is for discussion between users only.
> > >
> > > To get support from AmiBroker please send an e-mail directly to
> > > SUPPORT {at} amibroker.com
> > >
> > > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
> > > http://www.amibroker.com/devlog/
> > >
> > > For other support material please check also:
> > > http://www.amibroker.com/support.html
> > > Yahoo! Groups Links
> > >
> > >
> > >
> >
>