[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [amibroker] Timing test example between Arrays and Loops



PureBytes Links

Trading Reference Links

Tomasz,

Thank you for your complete explanation of the two situations -- if()  
vs Max() and time testing.

The timer situation makes more sense if the virtualization software  
periodically preempts the code.  The longer timed operations stand a  
bigger chance of being preempted and the timer will have that time  
added in more often, giving the bad results.  It also explains why the  
results seemed so jumpy which prompted me to add all the extra  
averaging of passes.  Now that I know the cause, I can try to fix it.

I rewrote the timing function.  Now it does a bunch of time tests and  
lets you choose a Min, Max, Average, or Median time from each, then  
averages a bunch of sets together in one pass.  I figured that if the  
preempting routines hit, they will always have the effect of adding a  
long time delay in the middle of one test.  However, the minimum time  
will never be one that is preempted.  Min and max will show the  
extremes in the data set.  Median will reject an outlier and do a bit  
of averaging.  Averaging will show the overall effect including  
Virtualization overhead.  By doing 10 tests and taking the median, I  
get a very consistent number.  Then averaging a bunch of those  
together (about 300ms of total test time), I get a very stable result.

I have snipped off the old AFL and appended the new routine to the end  
of this message again.  This program assumes that AB loads 200 more  
bars than specified.  If your setup loads a different number, then  
that parameter will have to be changed.  It issues a warning if the  
SBR and actual array lengths do not match each other due to the one  
way nature of SBR.

My new results show that the loop overhead time and looped operator  
time is now the same for 1,000 or 10,000 bars:
   1000 bars: 120us of overhead / 1000 bars in the example.
   1000 bars:    70us operator

The Array operator time changes from 1,000 to 20,000 bars but not as  
expected:
    1000 bars:    7us / 1000 bars median
*  2000 bars: 12us / 1000 bars -- stable but not understood, tried  
anti-alias delay to no effect, lowest number is also 12us.  Time  
should be 6us.
    3000 bars:    5us / 1000 bars
    4000 bars:    5us / 1000 bars
    5000 bars:    4us / 1000 bars
    8000 bars:    4us / 1000 bars -- this is 32us total operation time.

   -- Beyond this length, it think the virtualization may be always  
preempting the array operation once.  I also switched from 10 to 100  
filter cycles to make the filter more effective.

*  9000 bars:    15-20us / 1000 bars median
*  9000 bars:    14-18us / 1000 bars lowest
*  9000 bars:    21-41us / 1000 bars highest
*  9000 bars:    20us / 1000 bars average

Tested up to 20,000 bars is same result as 9000 bars.

I am running a 2.8GHz Quad core Mac Pro (2008 model) with XP running  
under Parallels and OS X and my broker charting and trading software  
written in Java, plus a Web Browser and bunch of other programs that  
don't tax the CPU much.  2 cores are dedicated to the XP machine and 2  
cores to the OS X machine.  However, which hardware core is assigned  
to which task is always changing.  This may indicate that the cache is  
getting dumped at an inopportune time.  Snow Leopard OS X is due out  
shortly with many multi-core improvements.  I will have to see how  
much that helps.

The AFL execution time at the bottom of the screen agrees with these  
high speed timer numbers as it would affect overall time.

Running virtualization software does cost me some speed compared to  
booting in XP, but I gain a lot of productivity and safety.  The claim  
is that I lose 15-20%.  These time tests would indicate that the  
number is highly dependent on the operation.  Very little time lost in  
looping operators as a percentage, but long array operations seem to  
get hit hard (5x).  I usually run 7000 bars for realtime trading,  
which is the sweet spot -- whew!

Best regards,
Dennis


On Jul 6, 2009, at 4:26 AM, Tomasz Janeczko wrote:

> Hello,
>
> Also, as to why if() statement inside loop is faster than calling Max:
> Well, people coming from "C" language tend to think that Max is a  
> macro.
> But it is not in AFL because it is array function and can't be  
> implemented
> as macro. Actually Max has different execution paths depending
> on arguments, so it is actually a virtual function that chooses  
> different
> code paths depending on arguments provided. If both are scalars,
> then Max is computed as scalar, if both are arrays then Max is  
> computed
> as array operator and if one is array and second is scalar then
> result is array but scalar has to be applied separately to each  
> element
> of the other argument.
>
> Now, your code calls Max() funciton the number of bars times
> and each call involves above parameter checking and call overhead
> regardless if new maximum is here or not.
>
> if() - version does the assignment only when new max is found.
> Since all arithmetic, logic and comparison operators are hand
> optimized the check alone (<) is very fast
> and assuming that new maximum occurs only rarely doing comparison
> alone means a lot less work than virtual function call.
>
> Best regards,
> Tomasz Janeczko
> amibroker.com
> ----- Original Message -----
> From: "Tomasz Janeczko" <groups@xxxxxxxxxxxxx>
> To: <amibroker@xxxxxxxxxxxxxxx>
> Sent: Monday, July 06, 2009 10:10 AM
> Subject: Re: [amibroker] Timing test example between Arrays and Loops
>
>
>> Hello,
>>
>> Yes, if you are using virtualization and run Windows XP on Mac,
>> you may not get correct measurements of time using Windows
>> High resolution timers.
>> They depend on hardware counter found on the CPU
>> and RTDSC CPU instruciton. If hardware is virtualized, then
>> measurements may be off.
>> Also if technologies like Cool&Quiet (AMD), SpeedStep (Intel) are
>> enabled that modify CPU clock in real-time the accurracy may be  
>> affected.
>> http://support.microsoft.com/?id=896256
>>
>>
>> I was trying your code on my end and initially it does not work
>> (it displays constantly "Warning -- ..... Edit AFL oto reset SBR or  
>> increase length parameter).
>> Of course neither worked.
>> So I changed that if statement that caused problem to
>> if ( ParamTrigger("Reset Averages", "Click to Reset" ) ) {
>>
>> On 127us for 10000 bars len and 125us for 1000 bars.
>> So the results are identical (this is on AMD 64x2 @2GHz).
>> I repeated the same test on my notebook (Intel Core 2) and again  
>> results are the same.
>>
>> Also array operator was much faster on my end (10us) for 12200  
>> loaded bars.
>>
>> So I can only say that Parallels is doing tricks with your timings,
>> and I would not trust any measurements done under virtualization  
>> software.
>>
>> I guess that Mac OS is doing preemption of virtual machine so it  
>> looses control of CPU
>> for given timeslice and that leads to higher timings than they  
>> really are
>> because client Windows does not "know" that it runs under  
>> virtualized hardware and thinks
>> that numbers reported by RTDSC are reliable while they are not  
>> because some time is spent
>> on host OS (MacOS) doing its job. And this is somewhat random  
>> because Windows client OS does
>> not know when it is preemptied.
>>
>> Best regards,
>> Tomasz Janeczko
>> amibroker.com
>> ----- Original Message -----
>> From: "Dennis Brown" <see3d@xxxxxxxxxxx>
>> To: <amibroker@xxxxxxxxxxxxxxx>
>> Sent: Sunday, July 05, 2009 5:46 AM
>> Subject: Re: [amibroker] Timing test example between Arrays and Loops
>>
>>
>>> Tomasz,
>>>
>>> I had a typo in the previous results.  The 1,000 and 10,000 cases  
>>> were
>>> swapped.
>>>
>>> I still can not explain the differences I am seeing.  I have tried
>>> rearranging my code in many ways and the differences remain about  
>>> the
>>> same, 2x more time for 10x shorter loop.  There might be some  
>>> overhead
>>> associated with the timer function itself in my system since I am
>>> running XP in a Parallels virtualization environment on a Mac.   
>>> There
>>> might be an emulated component that is making these numbers come out
>>> different than I expect and your results also.
>>>
>>> However, even though I am not certain about the absolute results,  
>>> on a
>>> relative basis I have already found an interesting speed  
>>> difference in
>>> 3 different approaches to the example problem, which I will explain
>>> for the benefit of others.
>>>
>>> I tried three different ways to do a Max() function in a loop (times
>>> are per 1000 bars over a 10,000 bar length):
>>>
>>> 1.  MaxPrice = Max( MaxPrice, C[i] ); // 585us
>>> 2.  MaxPrice = IIf( MaxPrice < C[i], C[i], MaxPrice ); // 860us
>>> 3.  if ( MaxPrice < C[i] ) { MaxPrice = C[i]; } // 100us
>>>
>>> formula 2 was 1.5 times slower than formula 1, which I would expect.
>>> formula 3 was 6 times faster than formula 1, which was a big  
>>> surprise.
>>>
>>> I was sure that a Max() function would be faster than an if control
>>> statement.  Obviously, my intuition was not as good as the facts
>>> proved.  On further thought though it makes sense that it should be
>>> faster because, just one comparison has to take place and no
>>> assignment most of the time.  So, it only has to do half the work --
>>> but 6 times faster?
>>>
>>> This example shows why it is important to have a good intuitive
>>> understanding of which AFL operations are fast or slow to code the
>>> fastest formula.
>>>
>>> 4. MaxPrice = LastValue( HHV( C, Length ) ); // 23us
>>>
>>> The array version of this solution in formula 4, blows away the  
>>> others
>>> by a factor of 2 to 20 depending on the overhead details -- as  
>>> long as
>>> all the bars are used in both cases.  You certainly know how to
>>> optimize the array operators.  Fantastic job!
>>>
>>> However, when the array has to use all the bars, and a loop only has
>>> to use 10% of the bars, then the loop could end up blowing away the
>>> array operations.  This is not the most common case though.  It  
>>> would
>>> be good if the problem was simplified to the extent that the array
>>> operations were always faster,  Then no thought would have to be  
>>> given
>>> to loops at all.  If the number of bars back can optionally be set  
>>> to
>>> a smaller number, arrays will always be the fastest solution by a
>>> mile.  Then loops would only be used when the logic is too complex  
>>> for
>>> built-in array operations. -- just thinking out loud.
>>>
>>> I made some minor improvements to my example code, so I am attaching
>>> it again at the end of this thread and snipping off the old one.
>>>
>>> Best regards,
>>> Dennis
>>>
>>>
>>> On Jul 4, 2009, at 3:31 PM, Dennis Brown wrote:
>>>
>>>> Tomasz,
>>>>
>>>> Thank you for looking at it.
>>>>
>>>> On my machine, I get the following results:
>>>>
>>>> 1,000 Base cycle length:
>>>> Loop Overhead = 181us / 1000 bars
>>>> Loop Operator = 576us / 1000 bars
>>>>
>>>> 10,000 Base cycle length:
>>>> Loop Overhead = 240us / 1000 bars
>>>> Loop Operator = 1024us / 1000 bars
>>>>
>>>> Best regards,
>>>> Dennis
>>>>
>>>> On Jul 4, 2009, at 3:05 PM, Tomasz Janeczko wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have run your code and results are pretty much the same in both
>>>>> settings.
>>>>>
>>>>> Best regards,
>>>>> Tomasz Janeczko
>>>>> amibroker.com
>>>>>
>>>>> Dennis Brown wrote:
>>>>>> Hello Tomasz,
>>>>>>
>>>>>> Below is some AFL I wrote to explore the speed difference between
>>>>>> Array and Loop operations.  I wanted to explore a number of
>>>>>> different
>>>>>> array operations to see if I could speed up my indicators.
>>>>>>
>>>>>> It shows the difference in speed for finding the highest value  
>>>>>> in a
>>>>>> whole array.
>>>>>> I am not sure my math is correct though, because one of the  
>>>>>> results
>>>>>> seem counter intuitive.
>>>>>>
>>>>>> For instance:
>>>>>>
>>>>>> I test a for() loop that does nothing in order to get the loop
>>>>>> overhead.  I subtract that number from the loop that is doing the
>>>>>> real
>>>>>> operation to get just the time to do the one operation.  I
>>>>>> subtracted
>>>>>> off the loop overhead because a loop will usually have a lot of
>>>>>> operations that spread out the overhead.  I do this for 1,000  
>>>>>> cycles
>>>>>> and for 10,000 cycles.
>>>>>>
>>>>>> I would expect the time to execute a single AFL line after
>>>>>> subtracting
>>>>>> off the overhead would be the same regardless of the number of
>>>>>> cycles.  However, the results say that it takes almost twice as  
>>>>>> long
>>>>>> to execute that operation in a 1,000 cycle loop than a 10,000  
>>>>>> cycle
>>>>>> loop.  Since this is not what I would expect, so I have to  
>>>>>> assume I
>>>>>> did something wrong.  I figured I should ask for a review before
>>>>>> making statements about how much faster arrays are.
>>>>>>
>>>>>> On the other hand, the array operation shows almost one third  
>>>>>> of the
>>>>>> time per bar when going from 1,000 to 10,000 bars, which is
>>>>>> reasonable.
>>>>>>
>>>>>> Am I missing something in my understanding, or am I just a  
>>>>>> lousy AFL
>>>>>> coder? :)
>>>>>>
>>>>>> Best regards,
>>>>>> Dennis
>>>>>>
>>>

////////////////////////////////////////////////////////////////////////////////////////
// Simple test code to show speed difference between loop and array  
operations -- 7/7/2009 Dennis Brown
// Edit this AFL for each comparison of equivelent operations
// Check both 1000 to 20,000 bar operations to derive the overhead
// Timing on one pass is jumpy on some machines, so it will filter  
over a number of tries.
// Watch out for differences in operation lengths due to one way  
nature of SetBarsRequired()
// Do the short cycle length first, then the long one, then edit this  
AFL with short one selected to reset SBR -- what a pain
// If I could switch to lower SBR numbers on the fly, I would display  
a complete curve with overheads
// Number of bars loaded seems to always be 200 more than the  
requested number with SBR
// If your extra bars is a different number, then edit
//
LoopText = ArrayText = ""; // init and make global

function LoopOperation( Length ) { /* Edit this function with the loop  
operator to be tested */
	GetPerformanceCounter( 1 ); /* reset the timer */
	MaxPrice = 0;
	MinPrice = 0;
	for ( i=0; i<Length; i++ ) {
		/* insert the loop operation(s) to be tested below */
		if ( MaxPrice < C[i] ) { MaxPrice = C[i]; }
//		MaxPrice = Max( MaxPrice, C[i] );
//		MaxPrice = IIf(  MaxPrice < C[i], C[i], MaxPrice );
	}
	time = GetPerformanceCounter();
	LoopText = "if ( MaxPrice < C[i] ) { MaxPrice = C[i];" ; /* duplicate  
operator here for display output */
	return round( (1000000/Length)*time ); /* returns time in  
microseconds per 1000 */
}

function ArrayOperation( Length ) { /* Edit this function with the  
array operator to be tested */
	GetPerformanceCounter( 1 );
	/* insert the array operation(s) to be tested below */
	MaxPrice = LastValue( HHV( C, Length ) );
	time = GetPerformanceCounter();
	ArrayText = "MaxPrice = LastValue( HHV( C, Length ) );" ; /*  
duplicate operator here for display output */
	return round( (1000000/Length)*time );
}

function LoopOverhead( Length ) {
	GetPerformanceCounter( True );
	for ( i=0; i<Length; i++ ) { ; }
	time = GetPerformanceCounter();
	return round( (1000000/Length)*time );
}

// main routine
Method = ParamList( "Time Filter", "Median|Lowest|Highest|Average", 0 );
InnerCycles = Param( ".  Time Filter Cycles ", 10, 3, 100, 1 );
SBRLength = Param( "Timed cycle length", 1000, 1000, 20000, 1000 );
extraBars = Param( ".  Extra bars loaded", 200, 0, 2000, 100 );
Length = BarCount - extraBars;

TimeSum0 = 0;
TimeSum1 = 0;
TimeSum2 = 0;
//InnerCycles = 10;
TimeCycles = round( (1000000/InnerCycles) / Length );

for ( j = 0; j < TimeCycles; j++ ) {
	for ( i = 1; I<=InnerCycles; i++ ) { // gather several time tests
		Time0A[BarCount-i] = LoopOverhead( Length ); // test loop overhead
		Time1A[BarCount-i] = LoopOperation( Length ); // test loop operator
		Time2A[BarCount-i] = ArrayOperation( Length ); // test array operator
	}
	if ( Method=="Median" ) {
		Time0 = LastValue( Median( Time0A, InnerCycles ) ); // ignore outliers
		Time1 = LastValue( Median( Time1A, InnerCycles ) );
		Time2 = LastValue( Median( Time2A, InnerCycles ) );
	}
	if ( Method=="Lowest" ) {
		Time0 = LastValue( LLV( Time0A, InnerCycles ) ); // ignore outliers
		Time1 = LastValue( LLV( Time1A, InnerCycles ) );
		Time2 = LastValue( LLV( Time2A, InnerCycles ) );
	}
	if ( Method=="Highest" ) {
		Time0 = LastValue( HHV( Time0A, InnerCycles ) ); // ignore outliers
		Time1 = LastValue( HHV( Time1A, InnerCycles ) );
		Time2 = LastValue( HHV( Time2A, InnerCycles ) );
	}
	if ( Method=="Average" ) {
		Time0 = LastValue( MA( Time0A, InnerCycles ) ); // ignore outliers
		Time1 = LastValue( MA( Time1A, InnerCycles ) );
		Time2 = LastValue( MA( Time2A, InnerCycles ) );
	}
	TimeSum0 += Time0 ;  // sum numbers
	TimeSum1 += Time1 ;
	TimeSum2 += Time2 ;
}
Time0 = round( TimeSum0 / TimeCycles );  // average numbers
Time1 = round( TimeSum1 / TimeCycles );
Time2 = round( TimeSum2 / TimeCycles );

OpTime =  Time1-Time0; // subtract off the constant loop overhead
speedup = NumToStr( OpTime /Time2, 0.1 );
speedup2 = NumToStr( Time1 /Time2, 0.1 );
if ( Length != SBRLength ) { warning = " !!! Warning Length  
Mismatch !!!\n\n" +
	" -- Make sure Extra bars loaded parameter matches you preferences\n" +
	" -- SetBarsRequired() can not reduce bars --  Edit & Save formula to  
reset Length parameter to lower number\n\n";
}
else { warning = ""; }

Title = "Per 1000 bars (cycles) Timing Test\n" +
	"Bars Loaded = " + BarCount + "\n" +
	"Base Cycle Length = " + Length + warning + "\n\n" +
	"Looped Operation:  " + LoopText +
	"\n\n" +
	"Loop Total Time  = " + time1  + "us\n" +
	"Loop Overhead    = " + time0  + "us\n" +
	"Looped Operator = " + OpTime + "us\n\n" +
	"Array Operation:   " + ArrayText +
	"\n\n" +
	"Array Operator          = " + time2 + "us\n" +
	"Total Speedup         = " + speedup2 + "\n" +
	"Operator Speedup  = " + speedup;

RequestTimedRefresh( 1 );
//Override all bar requirements for our test
SetBarsRequired( SBRLength, sbrAll );


------------------------------------

**** IMPORTANT PLEASE READ ****
This group is for the discussion between users only.
This is *NOT* technical support channel.

TO GET TECHNICAL SUPPORT send an e-mail directly to 
SUPPORT {at} amibroker.com

TO SUBMIT SUGGESTIONS please use FEEDBACK CENTER at
http://www.amibroker.com/feedback/
(submissions sent via other channels won't be considered)

For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
http://www.amibroker.com/devlog/

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/amibroker/

<*> Your email settings:
    Individual Email | Traditional

<*> To change settings online go to:
    http://groups.yahoo.com/group/amibroker/join
    (Yahoo! ID required)

<*> To change settings via email:
    mailto:amibroker-digest@xxxxxxxxxxxxxxx 
    mailto:amibroker-fullfeatured@xxxxxxxxxxxxxxx

<*> To unsubscribe from this group, send an email to:
    amibroker-unsubscribe@xxxxxxxxxxxxxxx

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/