Given TJ's comments about:
- The amount of memory utilized in processing symbols of data
- Whether or not this would fit in the L2 cache
- The effect it would have on optimizations when it didn't
I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), which I'll be posting a new version of shortly...
These tests were run under the following conditions:
- A less-than-state-of-the-art laptop with:
  o Core 2 Duo 1.86 GHz processor
  o 2 MB of L2 cache
- Watch Lists of symbols, each of which:
  o Contains twice the number of symbols of the previous one, i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256
  o Contains symbols with ~5,000 bars of data...
Given the above:
- Each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- Loading more than 13 symbols should cause L2 cache misses to occur (see the sketch below)
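
In case anyone wants to check the arithmetic, here's a quick back-of-the-envelope sketch in Python. The 32-bytes-per-bar and 2 MB figures are the ones above; treating the 2 MB cache as 2 * 1024 * 1024 bytes is my assumption, and the MB-vs-MiB distinction barely moves the answer:

    # Back-of-the-envelope cache arithmetic using the figures above.
    BYTES_PER_BAR = 32                   # per-bar storage, per the thread
    BARS_PER_SYMBOL = 5000               # ~5,000 bars per symbol
    L2_CACHE = 2 * 1024 * 1024           # 2 MB L2 on the test laptop

    bytes_per_symbol = BYTES_PER_BAR * BARS_PER_SYMBOL
    print(bytes_per_symbol)              # 160000 bytes per symbol

    print(L2_CACHE // bytes_per_symbol)  # 13 symbols fit; misses beyond that

    # The Watch Lists double in size, so the crossover should show up
    # between the 8-symbol list (8 * 160 KB = 1.28 MB, fits) and the
    # 16-symbol list (16 * 160 KB = 2.56 MB, doesn't).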
Results:
- See the attached data & chart
There are several interesting things I find regarding the results...
- The "dent" in the curve, looking left to right, occurs right where you'd think it would, between 8 symbols and 16 symbols, i.e. from the point at which all the data can be loaded into and accessed from the L2 cache to the point where it no longer can...
- The "dent" occurs in the same place running either one or two instances of AB
- The "dent", while clearly visible, is hardly traumatic in terms of run times
- The relationship of run times between running one and two instances of AB is consistent: a 40% savings in run time regardless of the number of symbols.
- This is also in line with how much CPU is utilized when running one instance of AB, which on the test machine is typically in the 54% to 60% range (see the sketch just below).
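
That consistency is easy to sanity-check: if a single instance keeps the dual-core machine at roughly 54% to 60% of total CPU, and two instances together can drive it to roughly 100%, then wall-clock time should fall to about the single-instance utilization fraction, i.e. a savings of 40% to 46%. A quick sketch of that arithmetic in Python (the 100% figure for two instances is my assumption, not a measurement):

    # Relate single-instance CPU utilization to expected two-instance
    # run-time savings, assuming two instances saturate both cores.
    for cpu_one_instance in (0.54, 0.60):
        cpu_two_instances = 1.00         # assumption: both cores saturated
        time_fraction = cpu_one_instance / cpu_two_instances
        savings = 1.0 - time_fraction    # fraction of run time saved
        print("%.0f%% CPU -> %.0f%% savings"
              % (cpu_one_instance * 100, savings * 100))
    # Output:
    # 54% CPU -> 46% savings
    # 60% CPU -> 40% savings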
I have a new toy that I'll be trying these benchmarks on again shortly, i.e. a quad-core (dual Core 2 Duo) 3.0 GHz machine...