Given TJ’s comments about:
- The amount of memory utilized in processing symbols of data
- Whether or not this would fit in the L2 cache
- The effect it would have on optimizations when it didn’t

I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), which I’ll be posting a new version of shortly …
These tests were run under the following conditions:
- A less-than-state-of-the-art laptop with
  o A Core 2 Duo 1.86 GHz processor
- Watch Lists of symbols, each of which
  o Contains the next power-of-two number of symbols after the previous one, i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256
  o Contains symbols with ~5,000 bars of data …
- Each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- Loading more than 13 symbols should cause L2 cache misses to occur
- See the attached data & chart
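A quick back-of-the-envelope check of those figures, assuming a 2 MB L2 cache (a common size for a 1.86 GHz Core 2 Duo; the exact cache size isn’t stated above, so treat it as an assumption):

```python
# Sanity-check the per-symbol memory figure and the ~13-symbol L2 limit.
BARS_PER_SYMBOL = 5_000
BYTES_PER_BAR = 32
L2_CACHE_BYTES = 2 * 1024 * 1024  # assumed 2 MB L2; not stated in the post

bytes_per_symbol = BARS_PER_SYMBOL * BYTES_PER_BAR
symbols_in_l2 = L2_CACHE_BYTES // bytes_per_symbol

print(bytes_per_symbol)  # 160000 bytes per symbol, as above
print(symbols_in_l2)     # 13 symbols fit before L2 misses should begin
```

With a 2 MB cache the arithmetic lands exactly on the 13-symbol threshold quoted above, which is also where the “dent” between 8 and 16 symbols would be expected.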
There are several interesting things I find regarding the results …
- The “dent” in the curve, looking left to right, occurs right where you’d think it would: between 8 symbols and 16 symbols, i.e. from the point at which all data can be loaded to and accessed from the L2 cache to the point where it no longer can …
- The “dent” occurs in the same place running either one or two instances of AB.
- The “dent”, while clearly visible, is hardly traumatic in terms of run times.
- The relationship of run times between running one and two instances of AB is consistent at a 40% savings regardless of the number of symbols.
- This is also in line with how much CPU is utilized when running one instance of AB, which on the test machine is typically in the 54 - 60% range.
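The 40% figure and the 54 - 60% utilization figure are roughly consistent under a simple model (an assumption on my part, not something measured in the tests): if run time scales inversely with total CPU utilization, and two instances saturate both cores, the expected savings is 1 - u1/u2.

```python
# Expected run-time savings from a second instance, assuming run time
# scales inversely with total CPU utilization and two instances reach ~100%.
for u_one in (0.54, 0.60):  # one-instance utilization range from the tests
    u_two = 1.0             # assumed utilization with two instances running
    savings = 1.0 - u_one / u_two
    print(f"{u_one:.0%} utilized -> {savings:.0%} expected savings")
```

The model predicts savings of roughly 40 - 46%, which brackets the observed 40%.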
I have a new toy that I’ll be trying these benchmarks on again shortly, i.e. a dual Core 2 Quad 3.0 GHz machine …