Given TJ’s comments about:
- The amount of memory utilized in processing symbols of data
- Whether or not this would fit in the L2 cache
- The effect it would have on optimizations when it didn’t
I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), which I’ll be posting a new version of shortly …
These tests were run under the following conditions:
- A less-than-state-of-the-art laptop with:
  o A Core 2 Duo 1.86 GHz processor
  o 2 MB of L2 cache
- Watch lists of symbols, each of which:
  o Contains double the number of symbols of the previous one, i.e. powers of two: 1, 2, 4, 8, 16, 32, 64, 128, 256
  o Contains symbols with ~5,000 bars of data …
Given the above:
- Each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- Loading more than 13 symbols should cause L2 cache misses to occur, since 2 MB / 160,000 bytes ≈ 13 symbols (see the sketch below)
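
For anyone who wants to double-check that arithmetic, here’s a minimal Python sketch. The 32 bytes/bar and 2 MB L2 figures come straight from the setup above; the variable names and the integer-division cutoff are just my own back-of-the-envelope bookkeeping, not anything from MCO itself:

BYTES_PER_BAR = 32                 # per-bar footprint quoted above
BARS_PER_SYMBOL = 5000             # ~5,000 bars of data per symbol
L2_CACHE_BYTES = 2 * 1024 * 1024   # 2 MB L2 on the Core 2 Duo test machine

bytes_per_symbol = BARS_PER_SYMBOL * BYTES_PER_BAR
print("bytes per symbol:", bytes_per_symbol)    # 160,000

# How many whole symbols fit in L2 before data starts spilling out?
print("symbols that fit in L2:", L2_CACHE_BYTES // bytes_per_symbol)    # 13

# Which of the tested watch list sizes fit entirely in L2?
for n in [1, 2, 4, 8, 16, 32, 64, 128, 256]:
    status = "fits" if n * bytes_per_symbol <= L2_CACHE_BYTES else "spills"
    print(n, "symbols:", status)

Per this sketch, 8 symbols (1,280,000 bytes) fit in L2 and 16 symbols (2,560,000 bytes) spill out of it, which is exactly where the “dent” described below shows up.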
Results:
- See the attached data & chart
There are several interesting things I find regarding the results …
- The “dent” in the curve, looking left to right, occurs right where you’d think it would, between 8 symbols and 16 symbols, i.e. at the transition from the point at which all data can be loaded into and accessed from the L2 cache to the point where it no longer can …
- The “dent” occurs in the same place running either one or two instances of AB
- The “dent”, while clearly visible, is hardly traumatic in terms of run times
- The relationship of run times between running one and two instances of AB is consistent at a 40% savings in run times, regardless of the number of symbols.
- This is also in line with how much CPU is utilized when running one instance of AB, which on the test machine is typically in the 54 – 60% range (see the rough check after this list).
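
As a rough check on that last point, here’s a small Python sketch of the back-of-the-envelope model I have in mind. It’s my own simplification, not anything measured by MCO: it assumes a fixed optimization workload, that run time scales inversely with total CPU utilization, and that two instances together saturate both cores:

# CPU utilization of a single AB instance on the test machine (54 - 60%)
for single_util in (0.54, 0.60):
    dual_util = 1.0                        # assume two instances saturate the CPU
    time_ratio = single_util / dual_util   # two-instance run time relative to one
    savings = 1.0 - time_ratio
    print("single-instance CPU %.0f%% -> expected savings %.0f%%"
          % (single_util * 100, savings * 100))

# Prints expected savings of 46% and 40%, in line with the measured ~40%.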
I have a new toy that I’ll be trying these benchmarks on again shortly, i.e. a 3.0 GHz quad core (a dual Core 2 Duo) …