Given TJ's comments about:
- The amount of memory utilized in processing symbols of data
- Whether or not this would fit in the L2 cache
- The effect it would have on optimizations when it didn't

I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), of which I'll be posting a new version shortly...
These tests were run under the following conditions:
- A less-than-state-of-the-art laptop with:
  o Core 2 Duo 1.86 GHz processor
  o 2 MB of L2 cache
- Watch lists of symbols, each of which:
  o Contains the next power of two number of symbols relative to the previous, i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256
  o Contains symbols each with ~5,000 bars of data
Given the above:
- Each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- Loading more than 13 symbols should cause L2 cache misses to occur, since 2 MB / 160,000 bytes per symbol is roughly 13 symbols
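The arithmetic above can be sanity-checked with a few lines (a sketch using the figures from this post: ~5,000 bars per symbol, 32 bytes per bar, 2 MB of L2):

```python
# Per-symbol memory footprint vs. L2 cache capacity,
# using the numbers quoted in the post.

BARS_PER_SYMBOL = 5_000
BYTES_PER_BAR = 32
L2_BYTES = 2 * 1024 * 1024  # 2 MB of L2 cache

# ~5,000 bars * 32 bytes/bar = 160,000 bytes per symbol
symbol_bytes = BARS_PER_SYMBOL * BYTES_PER_BAR

# Whole symbols that fit entirely in L2 before misses must occur
symbols_in_l2 = L2_BYTES // symbol_bytes

print(f"per-symbol footprint: {symbol_bytes:,} bytes")   # 160,000 bytes
print(f"symbols that fit in L2: {symbols_in_l2}")        # 13 symbols
```

Which is why the interesting region of the curve should sit between the 8-symbol and 16-symbol watch lists.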
Results:
- See the attached data & chart
There are several interesting things I find in the results...
- The "dent" in the curve, looking left to right, occurs right where you'd think it would: between 8 symbols and 16 symbols, i.e. from the point at which all data can be loaded into and accessed from the L2 cache to the point where it no longer can...
- The "dent" occurs in the same place running either one or two instances of AB.
- The "dent", while clearly visible, is hardly traumatic in terms of run times.
- The relationship of run times between running one and two instances of AB is consistent at a 40% savings in run time, regardless of the number of symbols.
- This is also in line with how much CPU is utilized when running one instance of AB, which on the test machine is typically in the 54-60% range.
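The last two points can be tied together with a back-of-the-envelope model. Assuming (and this assumption is mine, not stated in the post) that two concurrent instances together drive combined CPU utilization close to 100%, the expected run-time savings is roughly one minus the single-instance utilization fraction:

```python
# Rough consistency check: does a ~40% run-time saving from running two
# instances line up with one instance using ~54-60% of a dual-core CPU?
# Assumption (not from the post): two concurrent instances together
# reach combined utilization near 100%.

def expected_savings(single_util, combined_util=1.0):
    """Fractional run-time savings for two concurrent instances versus
    running the same two workloads back to back in one instance.

    Sequential time for 2 workloads: 2W / (single_util * capacity)
    Concurrent time for 2 workloads: 2W / (combined_util * capacity)
    """
    return 1.0 - single_util / combined_util

for u in (0.54, 0.57, 0.60):
    print(f"single-instance utilization {u:.0%} "
          f"-> expected savings {expected_savings(u):.0%}")
```

At 60% single-instance utilization this predicts a 40% saving, so the observed figure sits right at the top of the measured 54-60% utilization range, consistent with the post's observation.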
I have a new toy that I'll be trying these benchmarks on again shortly, i.e. a dual Core 2 Duo quad-core machine at 3.0 GHz...