Hello,

I just ran the same code on my relatively new notebook
(Core 2 Duo 2GHz (T7250)) and the loop takes less than 2ns per
iteration (3x speedup). So it looks like the data sits entirely inside
the cache. This Core 2 has 2MB of cache and that's 4 times more than on
the Athlon x2 I have.
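To spell out that reasoning, here is a rough sketch in Python (the helper name is made up for illustration). It applies the arithmetic from the quoted message below: each loop iteration touches 12 bytes (two 4-byte reads and one 4-byte write), so the time per iteration implies a memory throughput.

```python
# Illustrative sketch only: implied_bandwidth_mb_s is a hypothetical helper,
# not part of AmiBroker or AFL.
BYTES_PER_ITERATION = 12  # two 4-byte reads + one 4-byte write


def implied_bandwidth_mb_s(ns_per_iteration, bytes_per_iteration=BYTES_PER_ITERATION):
    """Return the throughput in MB/s (1 MB = 1e6 bytes) implied by the loop time."""
    bytes_per_second = bytes_per_iteration / (ns_per_iteration * 1e-9)
    return bytes_per_second / 1e6


athlon = implied_bandwidth_mb_s(6.0)  # 6 ns per iteration on the Athlon x2
core2 = implied_bandwidth_mb_s(2.0)   # 2 ns per iteration on the Core 2 Duo
print(round(athlon), round(core2))
```

At 2 ns per iteration the loop would move about 6000 MB/s, well above the roughly 2 GB/s quoted below as the memory limit on the Athlon system, which is consistent with the data being served from cache.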
> If what you say is true, and one core alone fills the memory
> bandwidth, then there should be a net loss of performance while
> running two copies of ami.
It depends on the complexity of the formula and the amount of data per
symbol you are using. As each array element has 4 bytes, to fill 4 MB
of cache you would need 1 million array elements, or 100 arrays each
having 10000 elements, or 10 arrays each having 100K elements.
Generally speaking, people testing on EOD data, where 10 years is just
2600 bars, should see a speed-up. People using very long intraday data
sets may see degradation, but a rather unnoticeable one.
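The sizing argument above can be sketched in Python. This is only an illustration: the function names are made up, the 4 MB figure is the example value from the text, and the check ignores everything else that competes for cache space.

```python
# Illustrative sketch only: hypothetical helpers, not part of AmiBroker or AFL.
BYTES_PER_ELEMENT = 4  # each array element has 4 bytes


def elements_that_fit(cache_bytes, bytes_per_element=BYTES_PER_ELEMENT):
    """Number of array elements needed to fill a cache of the given size."""
    return cache_bytes // bytes_per_element


def fits_in_cache(num_arrays, bars, cache_bytes, bytes_per_element=BYTES_PER_ELEMENT):
    """True if num_arrays arrays of `bars` elements each fit in the cache."""
    return num_arrays * bars * bytes_per_element <= cache_bytes


CACHE_4MB = 4 * 1024 * 1024

print(elements_that_fit(CACHE_4MB))           # about 1 million elements
print(fits_in_cache(10, 2600, CACHE_4MB))     # 10 arrays of 10 years of EOD bars
print(fits_in_cache(10, 500_000, CACHE_4MB))  # 10 arrays of a long intraday set
```

With these example numbers the EOD case fits comfortably, while the long intraday case does not.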

Best regards,
Tomasz Janeczko
amibroker.com
----- Original Message -----
From: "dloyer123" <dloyer123@xxxxxxcom>
To: <amibroker@xxxxxxxxxps.com>
Sent: Tuesday, May 13, 2008 8:12 PM
Subject: [amibroker] Re: Dual-core vs. quad-core
> Nice, tight loop. It is good to see someone who has made the effort
> to make the most out of every cycle, and the result shows.
> 
> My new E8400 (45nm 3GHz, dual core) system should arrive tomorrow.
> The first thing I will do will be to benchmark it running ami. I run
> portfolio backtests over a few years of 5 minute data over a thousand
> or so symbols. Plenty of data to overflow the cache, but still fit
> in memory. No trig.
>
> I'll post what I find.
> 
> If what you say is true, and one core alone fills the memory
> bandwidth, then there should be a net loss of performance while
> running two copies of ami.
> 
> 
> 
> --- In amibroker@xxxxxxxxxps.com, "Tomasz Janeczko" <groups@xxx>
> wrote:
>>
>> Hello,
>> 
>> FYI: a SINGLE processor core running an AFL formula is able to
>> saturate memory bandwidth in the majority of the most common
>> operations/functions if the total size of the arrays used in a given
>> formula exceeds the DATA cache size.
>> 
>> You need to understand that AFL runs at native assembly speed
>> when using array operations.
>> A simple array multiplication like this
>>
>> X = Close * H; // array multiplication
>>
>> gets compiled to just 8 assembly instructions:
>> 
>> loop:    8B 54 24 58  mov  edx,dword ptr [esp+58h]
>> 00465068 46           inc  esi                       ; increase counters
>> 00465069 83 C0 04     add  eax,4
>> 0046506C 3B F7        cmp  esi,edi
>> 0046506E D9 44 B2 FC  fld  dword ptr [edx+esi*4-4]   ; get element of close array
>> 00465072 D8 4C 08 FC  fmul dword ptr [eax+ecx-4]     ; multiply by element of high array
>> 00465076 D9 58 FC     fstp dword ptr [eax-4]         ; store result
>> 00465079 7C E9        jl   loop                      ; continue until all elements are processed
>> 
>> As you can see, there are three 4-byte memory accesses per loop
>> iteration (2 reads, each 4 bytes long, and 1 write, 4 bytes long).
>>
>> On my (2 year old) 2GHz Athlon x2 64 a single iteration of this loop
>> takes 6 nanoseconds (see benchmark code below).
>> So, during 6 nanoseconds we have 8 bytes read and 4 bytes stored.
>> That's (8/(6e-9)) bytes per second = 1333 MB per second read
>> and 667 MB per second write simultaneously, i.e. 2GB/sec combined!
>> 
      
>> Now if you look at memory benchmarks:
>> http://community.compuserve.com/n/docs/docDownload.aspx?webtag=ws-pchardware&guid=6827f836-8c33-4063-aaf5-c93605dd1dc6
>> you will see that 2GB/s is THE LIMIT of system memory speed on
>> Athlon x64 (DDR2 dual channel).
>> And that's considering the fact that Athlon has a superior-to-Intel
>> on-die integrated memory controller (HyperTransport).
>> 
>> // benchmark code - for accurate results run it on LARGE arrays
>> // (intraday database, 1-minute interval, 50K bars or more)
>> GetPerformanceCounter(1); // reset the timer
>> for(k = 0; k < 1000; k++ ) X = C * H;
>> "Time per single iteration [s]="+1e-3*GetPerformanceCounter()/(1000*BarCount);
>> 
>> Only really complex operations that use *lots* of FPU (floating
>> point) cycles, such as trigonometric (sin/cos/tan) functions, are
>> slow enough for the memory to keep up.
>> 
      
>> Of course one may say that I am using an "old" processor, and new
>> computers have faster RAM, and that's true,
>> but processor speeds increase FASTER than bus speeds and the gap
>> between processor and RAM becomes larger and larger, so with
>> newer CPUs the situation will be worse, not better.
>> 
      
>> 
>> Best regards,
>> Tomasz Janeczko
>> amibroker.com
>> ----- Original Message -----
>> From: "dloyer123" <dloyer123@x..>
>> To: <amibroker@xxxxxxxxxps.com>
>> Sent: Tuesday, May 13, 2008 5:02 PM
>> Subject: [amibroker] Re: Dual-core vs. quad-core
>> 
>> 
>> > All of the cores have to share the same front bus and northbridge.
>> > The northbridge connects the cpu to memory and has limited
>> > bandwidth.
>> >
>> > If several cores are running memory hungry applications, the front
>> > bus will saturate.
>> >
>> > The L2 cache helps for most applications, but not if you are burning
>> > through a few G of quote data. The L2 cache is just 4-8MB.
>> >
>> > The newer multi core systems have much faster front buses and that
>> > trend is likely to continue.
>> >
>> > So, it would be nice if AMI could support running multi cores, even
>> > if it was just running different optimization passes on different
>> > cores. That would saturate the front bus, but take advantage of all
>> > of the memory bandwidth you have. It would really help those multi
>> > day walkforward runs.
>> > 
>> > 
>> > 
      
>> > --- In amibroker@xxxxxxxxxps.com, "markhoff" <markhoff@> wrote:
>> >>
>> 
      >> 
>> >> If you have a runtime penalty when running 2 independent AB jobs on
>> >> a Core Duo CPU it might be caused by too little memory (swapping to
>> >> disk) or other tasks which are also running (e.g. a web browser,
>> >> audio streamer or whatever). You can check this with a process
>> >> explorer which shows each task's CPU utilisation. Similarly, 4 AB
>> >> jobs on a Core Quad should have nearly no penalty in runtime.
>> >>
>> >> Tomasz stated that multi-thread optimization does not scale well
>> >> with the number of CPUs, but it is not clear to me why this is the
>> >> case. In my understanding, AA optimization is a sequential process
>> >> of running the same AFL script with different parameters. If I have
>> >> an AFL with a significantly long runtime per optimization step (e.g.
>> >> 1 minute) the overhead for the multi-threading should become quite
>> >> small and independent tasks should scale nearly with the number of
>> >> CPUs (as long as there is sufficient memory; n threads might need
>> >> n times more memory than a single thread). For sure the situation is
>> >> different if my single optimization run takes only a few millisecs
>> >> or seconds; then the overhead for multi-thread management goes up ...
>> >>
>> >> Maybe Tomasz can give some detailed comments on that issue?
>> >>
>> >> Best regards,
>> >> Markus
>> >> 
      
>> > 
>> > 
>> > ------------------------------------
>> >
>> > Please note that this group is for discussion between users only.
>> >
>> > To get support from AmiBroker please send an e-mail directly to
>> > SUPPORT {at} amibroker.com
>> >
>> > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
>> > http://www.amibroker.com/devlog/
>> >
>> > For other support material please check also:
>> > http://www.amibroker.com/support.html
>> >
>> > Yahoo! Groups Links
>>