Given TJ's comments about:
- The amount of memory utilized in processing symbols of data
- Whether or not this would fit in the L2 cache
- The effect it would have on optimizations when it didn't

I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), of which I'll be posting a new version shortly...
These tests were run under the following conditions:
- A less-than-state-of-the-art laptop with:
  o Core 2 Duo 1.86 GHz processor
  o 2 MB of L2 cache
- Watch lists of symbols, each of which:
  o Contains the next power of two number of symbols relative to the previous, i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256
  o Contains symbols each with ~5,000 bars of data
Given the above:
- Each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- Loading more than 13 symbols should cause L2 cache misses to occur, since 2 MB / 160,000 bytes per symbol is roughly 13 symbols
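The arithmetic above can be sanity-checked with a few lines (a sketch using the figures from this post: ~5,000 bars per symbol, 32 bytes per bar, 2 MB of L2):

```python
# Per-symbol memory footprint vs. L2 cache capacity,
# using the numbers quoted in the post.

BARS_PER_SYMBOL = 5_000
BYTES_PER_BAR = 32
L2_BYTES = 2 * 1024 * 1024  # 2 MB of L2 cache

# ~5,000 bars * 32 bytes/bar = 160,000 bytes per symbol
symbol_bytes = BARS_PER_SYMBOL * BYTES_PER_BAR

# Whole symbols that fit entirely in L2 before misses must occur
symbols_in_l2 = L2_BYTES // symbol_bytes

print(f"per-symbol footprint: {symbol_bytes:,} bytes")   # 160,000 bytes
print(f"symbols that fit in L2: {symbols_in_l2}")        # 13 symbols
```

Which is why the interesting region of the curve should sit between the 8-symbol and 16-symbol watch lists.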
Results:
- See the attached data & chart
There are several interesting things I find in the results...
- The "dent" in the curve, looking left to right, occurs right where you'd think it would: between 8 symbols and 16 symbols, i.e. from the point at which all data can be loaded into and accessed from the L2 cache to the point where it no longer can...
- The "dent" occurs in the same place running either one or two instances of AB.
- The "dent", while clearly visible, is hardly traumatic in terms of run times.
- The relationship of run times between running one and two instances of AB is consistent at a 40% savings in run time, regardless of the number of symbols.
- This is also in line with how much CPU is utilized when running one instance of AB, which on the test machine is typically in the 54-60% range.
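The last two points can be tied together with a back-of-the-envelope model. Assuming (and this assumption is mine, not stated in the post) that two concurrent instances together drive combined CPU utilization close to 100%, the expected run-time savings is roughly one minus the single-instance utilization fraction:

```python
# Rough consistency check: does a ~40% run-time saving from running two
# instances line up with one instance using ~54-60% of a dual-core CPU?
# Assumption (not from the post): two concurrent instances together
# reach combined utilization near 100%.

def expected_savings(single_util, combined_util=1.0):
    """Fractional run-time savings for two concurrent instances versus
    running the same two workloads back to back in one instance.

    Sequential time for 2 workloads: 2W / (single_util * capacity)
    Concurrent time for 2 workloads: 2W / (combined_util * capacity)
    """
    return 1.0 - single_util / combined_util

for u in (0.54, 0.57, 0.60):
    print(f"single-instance utilization {u:.0%} "
          f"-> expected savings {expected_savings(u):.0%}")
```

At 60% single-instance utilization this predicts a 40% saving, so the observed figure sits right at the top of the measured 54-60% utilization range, consistent with the post's observation.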
I have a new toy that I'll be trying these benchmarks on again shortly, i.e. a dual Core 2 Duo quad-core machine at 3.0 GHz...