Re: [amibroker] Re: Multi Core Optimization, L2 Cache & Optimization Run Times, AmiBroker Email List Archive

Overhead is not a constant ... It is a function of a variety of
things, not all of which I am even aware of, and some of which can be
fairly significant ...

--- In amibroker@xxxxxxxxxps.com, "Ton Sieverding"
<ton.sieverding@...> wrote:
>
> Of course not. You'll always keep the overhead as a constant. But
as a rule of thumb it works fine for me in situations where time is
the bottleneck ...
>
> Regards, Ton.
>
> ----- Original Message -----
> From: Fred Tonetti
> To: amibroker@xxxxxxxxxps.com
> Sent: Wednesday, June 18, 2008 2:25 PM
> Subject: RE: [amibroker] Multi Core Optimization, L2 Cache &
Optimization Run Times
>
>
>
> The relationship isn't quite that clear.
>
>
>
> I'm still playing with this feature for IO, but if you are using
AB's exhaustive search for a variety of things and have a multiple
CPU / core machine, try MCO on some of your optimization problems.
>
>
>
>
> --------------------------------------------------------------------
>
> From: amibroker@xxxxxxxxxps.com
[mailto:amibroker@xxxxxxxxxps.com] On Behalf Of Ton Sieverding
> Sent: Wednesday, June 18, 2008 4:29 AM
> To: amibroker@xxxxxxxxxps.com
> Subject: Re: [amibroker] Multi Core Optimization, L2 Cache &
Optimization Run Times
>
>
>
> Fred, does this show me that 'doubling the cores equals halving
the time'? :-)
>
>
>
> Regards, Ton.
>
>
>
> ----- Original Message -----
>
> From: Fred Tonetti
>
> To: amibroker@xxxxxxxxxps.com
>
> Sent: Wednesday, June 18, 2008 1:10 AM
>
> Subject: RE: [amibroker] Multi Core Optimization, L2 Cache &
Optimization Run Times
>
>
>
> Here are some results I got with my new toy .
>
> This is using a reasonably complex system on ~500 symbols over
10 years i.e. ~2500 bars ...
>
>     Cores   Time   Percent of 1-core time
>     1       218
>     2       114    52.29%
>     3        79    36.24%
>     4        62    28.44%
>     5        52    23.85%
>     6        46    21.10%
>     7        41    18.81%
>     8        37    16.97%
>
>
> As expected, the higher you go the more overhead there is, but
improvements like this are still well worth the effort, especially
on a single box.
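The growing overhead can be made visible by converting the run times in the table into speedup and per-core efficiency. The sketch below is an editorial illustration, not part of MCO: the function name `scaling_table` is hypothetical, and the "serial fraction" column is the Karp-Flatt metric, which assumes an Amdahl-style model that the original post does not claim.

```python
# Run times copied from the table above (units as reported in the post).
times = {1: 218, 2: 114, 3: 79, 4: 62, 5: 52, 6: 46, 7: 41, 8: 37}


def scaling_table(times):
    """Return (cores, speedup, efficiency, serial_fraction) rows.

    speedup         = 1-core time / n-core time
    efficiency      = speedup / n
    serial_fraction = Karp-Flatt metric, an estimate of the share of
                      work that does not parallelize (None for 1 core).
                      This model is an assumption, not from the post.
    """
    base = times[1]
    rows = []
    for n in sorted(times):
        speedup = base / times[n]
        efficiency = speedup / n
        if n == 1:
            serial = None
        else:
            serial = (1 / speedup - 1 / n) / (1 - 1 / n)
        rows.append((n, speedup, efficiency, serial))
    return rows


if __name__ == "__main__":
    print("cores speedup efficiency   serial")
    for n, s, e, f in scaling_table(times):
        f_txt = "-" if f is None else f"{f:.3f}"
        print(f"{n:>5} {s:>7.2f} {e:>10.1%} {f_txt:>8}")
```

Efficiency below 100% at every step is consistent with the remark that doubling the cores does not quite halve the time.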
>
>
> ------------------------------------------------------------------
>
> From: amibroker@xxxxxxxxxps.com
[mailto:amibroker@xxxxxxxxxps.com] On Behalf Of Steve Dugas
> Sent: Saturday, June 14, 2008 7:00 PM
> To: amibroker@xxxxxxxxxps.com
> Subject: Re: [amibroker] Multi Core Optimization, L2 Cache &
Optimization Run Times
>
> Very interesting Fred, thanks! This looks encouraging, at
least for us EOD guys.
>
> One thing I notice - at 32 tickers, it looks like the curve
has "recovered" to what you might expect to see even if there was no
dent at 16. And also, after 32 the curve seems to get a second wind,
i.e. it "inverts" and the time per symbol decreases *more* rapidly as
more tickers are added. What do you think might account for that? Is
it just due to the log nature of the chart? Thanks!
>
> Steve
>
> ----- Original Message -----
>
> From: Fred Tonetti
>
> To: amibroker@xxxxxxxxxps.com
>
> Sent: Saturday, June 14, 2008 5:49 PM
>
> Subject: [amibroker] Multi Core Optimization, L2 Cache &
Optimization Run Times
>
> Given TJ's comments about:
>
> - The amount of memory utilized in processing
symbols of data
>
> - Whether or not this would fit in the L2 cache
>
> - The effect it would have on optimizations when it
didn't
>
> I finally got around to running a little benchmark for Multi
Core Optimization using the program I wrote and posted (MCO), a new
version of which I'll be posting shortly.
>
> These tests were run under the following conditions:
>
> - A less than state of the art laptop with
>
> o Core 2 Duo 1.86 GHz processor
>
> o 2 MB of L2 cache
>
> - Watch lists of symbols, each of which
>
> o Contains twice as many symbols as the previous one, i.e. 1, 2,
4, 8, 16, 32, 64, 128, 256
>
> o Contains symbols with ~5,000 bars of data each
>
> Given the above:
>
> - Each symbol should require 160,000 bytes i.e.
~5,000 bars * 32 bytes per bar
>
> - Loading more than 13 symbols should cause L2 cache
misses to occur
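The arithmetic behind the 13-symbol threshold can be checked with a short sketch. The figures (32 bytes per bar, ~5,000 bars, 2 MB of L2) come from the post; the function names are hypothetical, and the calculation assumes 2 MB means 2 * 1024 * 1024 bytes and that the whole cache is available for price data, which in practice it is not.

```python
BYTES_PER_BAR = 32                  # figure quoted in the post
BARS_PER_SYMBOL = 5_000             # "~5,000 bars" per symbol
L2_CACHE_BYTES = 2 * 1024 * 1024    # 2 MB, assuming binary megabytes


def bytes_per_symbol(bars=BARS_PER_SYMBOL, bytes_per_bar=BYTES_PER_BAR):
    """Memory needed to hold one symbol's bars."""
    return bars * bytes_per_bar


def max_symbols_in_cache(cache_bytes=L2_CACHE_BYTES):
    """Largest whole number of symbols that fits in the cache.

    Ignores code and other data sharing the cache (an assumption),
    so the real threshold is likely somewhat lower.
    """
    return cache_bytes // bytes_per_symbol()


if __name__ == "__main__":
    print("bytes per symbol:", bytes_per_symbol())
    print("symbols that fit in L2:", max_symbols_in_cache())
    for size in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        total = size * bytes_per_symbol()
        fits = "fits" if total <= L2_CACHE_BYTES else "exceeds L2"
        print(f"{size:>4} symbols: {total:>10,} bytes  {fits}")
```

Under these assumptions the 8-symbol watch list fits and the 16-symbol one does not, which is where the two test sizes straddle the threshold.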
>
> Results:
>
> - See the attached data & chart
>
> There are several interesting things I find regarding the
results .
>
> - The "dent" in the curve looking left to right
occurs right where you'd think it would, between 8 symbols and 16
symbols i.e. from the point at which all data can be loaded to and
accessed from the L2 cache to the point where it no longer can .
>
> - The "dent" occurs in the same place running either
one or two instances of AB
>
> - The "dent" while clearly visible is hardly
traumatic in terms of run times
>
> - The relationship of run times between running one and two
instances of AB is consistent: two instances save about 40% of the
run time regardless of the number of symbols.
>
> - This is also in line with how much CPU is utilized when
running one instance of AB, which on the test machine is typically
in the 54 - 60% range.
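One way to see why the 40% saving lines up with the 54 - 60% CPU figure is sketched below. It rests on a simplifying assumption that is not in the post: two instances together can use at most 100% of the machine, and run time scales inversely with the CPU actually used. The function names are hypothetical.

```python
def best_case_saving(single_instance_cpu):
    """Upper bound on the run-time saving from a second instance.

    single_instance_cpu: fraction of total CPU used by one instance
    (e.g. 0.60). Assumes two instances can reach 100% utilization and
    that run time is inversely proportional to CPU used.
    """
    if not 0 < single_instance_cpu <= 1:
        raise ValueError("CPU fraction must be in (0, 1]")
    return 1.0 - single_instance_cpu


def speedup_from_saving(saving):
    """Convert a fractional run-time saving into a speedup factor."""
    if not 0 <= saving < 1:
        raise ValueError("saving must be in [0, 1)")
    return 1.0 / (1.0 - saving)


if __name__ == "__main__":
    for cpu in (0.54, 0.60):
        print(f"one instance at {cpu:.0%} CPU -> "
              f"best-case saving {best_case_saving(cpu):.0%}")
    print(f"observed 40% saving -> speedup x{speedup_from_saving(0.40):.2f}")
```

With one instance at 54 - 60% CPU, this bound gives a saving of 40 - 46%, so the observed 40% sits at the lower end of what the model allows.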
>
> I have a new toy that I'll be trying these benchmarks on
again shortly i.e. a dual core 2 duo quad 3.0 GHz.
>
>
>
>
>
>
>
>
>
>
>