[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How to calculate STDev?



PureBytes Links

Trading Reference Links

Hopefully one *last* post on this topic...

The SumX/SumX2 non-array calculation of SD that I posted works, 
but you should be aware of a limitation with it.

The final SD calculation is
  SD = sqrt( (SumX2 - SumX^2/Length) / (Length-1) )

Note that this includes the term "SumX2 - SumX^2/Length".  This 
is usually a subtraction of two very close large numbers.

So what?  Remember that TS uses single-precision floats, which 
have a precision of roughly 7 decimal digits.  Anything beyond 
about 7 digits is fantasy resulting from the translation of the 
internal binary representation to the printed decimal value.  See 
http://www.purebytes.com/archives/omega/2003/msg00597.html for a 
more complete explanation.

I tried computing SD with my algorithm, with the TS StdDevS() 
function (which divides by Length-1 instead of Length, like my 
algorithm does and like Excel does), and with the Excel STDEV() 
function.  My algorithm is close to Excel's STDEV() and TS's 
StdDevS, but not exactly the same.  Why?  Because of the 
precision problem.

Example:  I calculated the SD of the Close of a day's worth of 
5min bars in the S&P.  That's 81 bars in the sample.  My SD's 
results were close to Excel's, but not identical:  3.67 vs. 
3.6797, 3.71 vs. 3.6966, 2.83 vs. 2.8236, etc.

Then I calculated the SumX2 and SumX^2/Length terms in Excel.  
For the first example I looked at, SumX2 = 56102168.640, and 
SumX^2/Length = 56101431.563.  The (correct) difference is 
737.077.  However, the first digit that differs in that sum is 
the 6th digit of the large numbers -- dangerously close to TS's 
limit of precision.  The "same" values, as calculated by TS, were 
SumX2 = 56102172.00, SumX^2/Length = 56101428.00, Diff =      
744.000.

Notice that TS's values for the sums differ from Excel's, and 
thus the Diff term is wrong.  TS calculated an SD of 
sqrt(744.000/(Length-1)), instead of sqrt(737.077/(Length-1)).  
That's why TS comes up with a slightly different answer.

Can you solve this?  Sometimes, yes, you can.  As I described in 
the archive posting referenced above, you can accumulate the 
"difference of two big numbers" term as you go along, avoiding 
the subtraction of two big numbers.

But I don't think you can do that with the SD calculation.  You 
could add in the SumX2 term, but the square(SumX)/Ntrades term 
presents a problem.  You might know Ntrades ahead of time -- 
let's say you always want to look at the SD of the last 20 
samples -- but how do you add in one piece of the square(SumX) 
term, without knowing the sum of all X's?  square(SumX)/Ntrades 
is eqivalent to SumX*Avg, so you could subtract the SumX term, 
but you don't know Avg until the end of the sample period.  I may 
be missing something, but I don't think there's any way to 
calculate the SD accurately, in a sample-by-sample sum-it-up-as-
you-go approach, given TS's precision limitations.  I think you 
have to do it after-the-fact, the way TS's functions do.

The built-in TS function, BTW, *is* accurate.  StdDevS divides by 
Length-1 so its calculation matches Excel's, AND it doesn't 
suffer from the precision problems that TS's coeffR function has. 
Instead of subtracting large numbers, is uses the "root mean 
square" algorithm, subtracting the average from each X value:

For Counter = 0 To Length - 1 Begin
  SumSqr = SumSqr+(Price[Counter]-Avg)*(Price[Counter]-Avg);
End;

The subtractions happen with small numbers -- Price-Avg -- so 
there is very little loss of precision.

So:  if you want to calculate your own SD, and precision is 
important to you, you shouldn't use my algorithm.  Use TS's 
StdDevS if you want to calculate the SD of a value on the last N 
bars, or StdDevS_a if you want to capture N widely-spaced events 
(like trade results) in an array.  Either one should match 
Excel's results very closely.

Gary