Given TJ's comments about:

- the amount of memory used in processing symbols of data
- whether or not that data would fit in the L2 cache
- the effect it would have on optimizations when it didn't

I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), a new version of which I'll be posting shortly …
These tests were run under the following conditions:

- A less-than-state-of-the-art laptop with:
  o a Core 2 Duo 1.86 GHz processor
  o 2 MB of L2 cache
- Watch lists of symbols, where each list:
  o contains the next power-of-two count of symbols relative to the previous list, i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256 symbols
  o contains symbols with ~5,000 bars of data each …
Given the above:

- each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- loading more than 13 symbols should cause L2 cache misses to occur, since 13 * 160,000 = 2,080,000 bytes is about all that fits in the 2 MB (2,097,152-byte) cache
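As a sanity check on those numbers, here is a minimal C sketch of the arithmetic. The bar struct shown is a hypothetical layout that happens to total 32 bytes; AB's actual record format may differ, but the post's 32-bytes-per-bar figure is what matters:

    /* Sanity check: per-symbol footprint and how many symbols
     * fit in a 2 MB L2 cache. */
    #include <stdio.h>

    struct bar {                          /* hypothetical 32-byte OHLCV bar */
        float  open, high, low, close;    /* 16 bytes */
        float  volume, open_interest;     /*  8 bytes */
        double timestamp;                 /*  8 bytes */
    };

    int main(void)
    {
        const size_t bars_per_symbol = 5000;
        const size_t l2_bytes        = 2u * 1024 * 1024;       /* 2 MB L2 */
        const size_t symbol_bytes    = bars_per_symbol * sizeof(struct bar);

        printf("bytes per bar    : %zu\n", sizeof(struct bar));     /* 32     */
        printf("bytes per symbol : %zu\n", symbol_bytes);           /* 160000 */
        printf("symbols in L2    : %zu\n", l2_bytes / symbol_bytes);/* 13     */
        return 0;
    }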
Results:

- See the attached data & chart

There are several interesting things I find regarding the results …
- The "dent" in the curve, reading left to right, occurs right where you'd think it would, between 8 symbols and 16 symbols, i.e. from the point at which all data can be loaded into and accessed from the L2 cache to the point where it no longer can (see the sketch after this list) …
- The "dent" occurs in the same place whether running one or two instances of AB
- The "dent", while clearly visible, is hardly traumatic in terms of run times
- The relationship of run times between running one and two instances of AB is consistent at a 40% savings, regardless of the number of symbols
- This is also in line with the CPU utilization of a single instance of AB, which on the test machine is typically in the 54-60% range: if one instance leaves ~40% of the CPU idle, splitting the work across two instances that together drive utilization toward 100% should cut run time to roughly 60% of the original, i.e. the observed ~40% savings
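To illustrate the mechanism behind the "dent", here is a minimal C sketch (not the MCO benchmark itself) that sweeps working sets sized like the watch lists above and reports the time per access. With a 2 MB L2, the cost per access should step up between 8 and 16 "symbols"; hardware prefetching on a sequential sweep softens the step but shouldn't hide it:

    /* Working-set sweep: same mechanism as the "dent", in isolation. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const size_t bytes_per_symbol = 160000;
        const size_t total_accesses   = 1u << 26;  /* same work at every size */

        for (size_t symbols = 1; symbols <= 256; symbols *= 2) {
            size_t n = symbols * bytes_per_symbol / sizeof(long);
            long *buf = malloc(n * sizeof(long));
            if (!buf) return 1;
            for (size_t i = 0; i < n; i++)
                buf[i] = (long)i;                  /* touch every element */

            volatile long sum = 0;                 /* keep the loop alive */
            clock_t t0 = clock();
            for (size_t i = 0; i < total_accesses; i++)
                sum += buf[i % n];                 /* wrap over working set */
            clock_t t1 = clock();

            printf("%3zu symbols (%8zu bytes): %.2f ns/access\n",
                   symbols, symbols * bytes_per_symbol,
                   (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / total_accesses);
            free(buf);
        }
        return 0;
    }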
I have a new toy that I'll be trying these benchmarks on again shortly, i.e. a 3.0 GHz Core 2 Quad (effectively a dual Core 2 Duo) …