Re: TS calculation precision problem

To: Leslie Walko <l.walko@xxxxxxxxxxx>
Subject: Re: TS calculation precision problem
From: "Gary Fritz" <fritz@xxxxxxxx>
Date: Sun, 2 Feb 2003 21:50:23 -0800

> Two questions:  did you have that stupid 'enhanced precision'
> feature turned ON in TS?  If so it actually REDUCES precision. 
> That could have been the problem.

Not necessary.  TS's limited precision, and the abysmally poor 
code in the library function, will do it all by itself.

Quick lesson in numerical precision follows, sorry if your eyes 
glaze over...

TS uses single-precision (32-bit) floats, which have a 24-bit 
mantissa (23 bits stored plus an implied leading 1; the sign gets 
its own bit).  24 bits works out to only about 7 decimal digits of 
precision.  Anything beyond ~7 significant digits is absolute 
fantasy produced by the decimal -> binary -> decimal rounding 
effects.
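
A quick way to see the ~7-digit limit for yourself.  This is a 
minimal Python sketch, not TS code: it assumes that round-tripping a 
value through the 4-byte "f" format of Python's struct module is a 
fair stand-in for storing it in a 32-bit float.  The helper name 
to_f32 is made up for illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Every whole number up to 2**24 = 16,777,216 fits in a 24-bit mantissa...
exact_limit = to_f32(16_777_216.0)      # 16777216.0
# ...but the very next one does not, and silently rounds back down.
past_limit = to_f32(16_777_217.0)       # 16777216.0
# Around 14,000,000 neighbouring 32-bit floats are 1.0 apart, so fractions vanish.
big_total = to_f32(14_000_000.25)       # 14000000.0
# A 4-digit value keeps only about 7 significant digits in total.
small_value = to_f32(1234.5678)

print(exact_limit, past_limit, big_total, small_value)
```

The point of the sketch is the spacing: once a value gets into the 
millions, a 32-bit float has nothing left over for the decimals.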

RSquared calls coeffR, which contains several "differences of 
large numbers":

UpEQ = Summation(X * Y, Length) 
         - Length * Average(X, Length) * Average(Y, Length);
LowEQ1 = Summation(Square(X), Length) 
         - Length * Square(Average(X, Length));
LowEQ2 = Summation(Square(Y), Length) 
         - Length * Square(Average(Y, Length));

So let's say you're calculating a 20-bar RSquared.  RSquared 
calculates the Pearson's R (coeffR) of X and Y, where X = 
CurrentBar and Y = Close.  Say you're looking at S&P prices, and 
you have a 30-min chart that covers 60 days, most of one front 
contract.

At the end of that chart CurrentBar is around 700-800 or so.  S&P 
prices are also around 800-900 or so these days.  So each of 
those calculations above is roughly
  20*7xx*9xx - 20*7xx*9xx

Each of those terms is in the ballpark of 14,000,000 -- which, 
you might notice, is an 8-digit number.  That means that the last 
one or two digits are absolute garbage, caused by the loss of 
precision from representing an 8-digit decimal number in a 32-bit 
binary format.  If the *correct* values of the two large values 
are the same in the first 7-8 digits (which often happens when 
calculating Pearson's R), then each of the resulting TS 
calculations is also absolute garbage.  If the correct values of 
the two large values are the same in the first 5-6 digits, the 
resulting TS calculation has only 2-3 valid digits.
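
To put rough numbers on that, here is a sketch of the UpEQ formula 
with every intermediate result rounded to 32 bits.  It is Python, 
not EasyLanguage, and it rests on assumptions: the bar numbers and 
closes are made up, the f32 helper only emulates single-precision 
storage, and the order in which TS actually evaluates the 
expression is not known to me.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round to the nearest 32-bit float (stand-in for a single-precision store)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Made-up inputs: 20 bars, X = bar number near 800, Y = closes near 900.
xs = [float(781 + i) for i in range(20)]
ys = [895.25, 896.00, 897.50, 896.75, 898.25, 899.00, 898.50, 900.25,
      901.00, 900.50, 902.25, 901.75, 903.00, 904.50, 903.75, 905.25,
      906.00, 905.50, 907.25, 908.00]

def naive_up_eq_f32(xs, ys):
    """Summation(X*Y) - Length*Average(X)*Average(Y), rounded to float32 at every step."""
    n = float(len(xs))
    sum_xy = sum_x = sum_y = 0.0
    for x, y in zip(xs, ys):
        sum_xy = f32(sum_xy + f32(x * y))
        sum_x = f32(sum_x + x)
        sum_y = f32(sum_y + y)
    second_term = f32(f32(n * f32(sum_x / n)) * f32(sum_y / n))
    return sum_xy, second_term, f32(sum_xy - second_term)

def exact_up_eq(xs, ys):
    """The same formula in exact rational arithmetic, for reference."""
    fx = [Fraction(x) for x in xs]
    fy = [Fraction(y) for y in ys]
    return sum(a * b for a, b in zip(fx, fy)) - sum(fx) * sum(fy) / len(fx)

sum_xy, second_term, naive = naive_up_eq_f32(xs, ys)
exact = exact_up_eq(xs, ys)

print("two large terms:", sum_xy, second_term)
print("float32 result :", naive)
print("exact result   :", float(exact))   # 426.625
```

Both large terms land between 2^23 and 2^24, where neighbouring 
32-bit floats are 1.0 apart, so their difference can only come out 
as a whole number.  The exact answer for this data is 426.625, so 
the fractional part is lost no matter what.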

For example, if the correct values of the UpEq calculation were
  UpEq = 12345678.654321 - 12345678.123456
then the *correct* value of UpEq is 0.530865.  But the value 
calculated by TS will be total gibberish.  At that magnitude a 
32-bit float can only hold whole numbers, so (in the binary 
representation of each term, before the subtraction happens) 
12345678.654321 gets rounded to 12345679 and 12345678.123456 gets 
rounded to 12345678.  TS then comes up with 1 instead of 0.530865.
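
The same two numbers can be pushed through a 32-bit round trip to 
see what survives.  Again a Python sketch rather than TS code, 
assuming struct's 4-byte "f" format behaves like single-precision 
storage; to_f32 is an illustrative helper, not a TS function.

```python
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = to_f32(12345678.654321)   # 12345679.0
b = to_f32(12345678.123456)   # 12345678.0
single = to_f32(a - b)        # 1.0

# The same subtraction in 64-bit arithmetic, for comparison (about 0.530865).
double = 12345678.654321 - 12345678.123456

print(a, b, single, double)
```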

If you take a longer RSquared, or if your chart has more bars on 
it, or if your price values are larger, the error gets worse.

The coeffR function was carelessly implemented by someone who 
obviously had no understanding of proper numerical methods.  They 
wrote it as if computer floating-point values had infinite 
precision; if they did, then that coeffR code would work fine.  
But in the real world of limited-precision variables, that code 
is WRONG.  It cannot possibly be accurate with 32-bit floats, and 
it wouldn't be that great with 64-bit floats.  It's just very bad 
code.

I only had one numerical-methods class 28 years ago, and I didn't 
pay much attention then. :-)  But even I know it's possible to 
calculate those values accurately, even with 32-bit floats, if 
you're a bit more careful.  E.g. instead of calculating 
Summation(X*Y,Length) [a large number] and subtracting Length * 
Average(X,Length)*Average(Y,Length) [another large number], you 
could do something like this:

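  { Add up small per-bar differences instead of two large totals }
  UpEq = 0;
  avgX_avgY = Average(X, Length) * Average(Y, Length);
  for i = 0 to Length - 1 begin
    { form the difference first, then add it to the running sum }
    UpEq = UpEq + (X[i] * Y[i] - avgX_avgY);
  end;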

That way the running sum stays far smaller than 14,000,000, you 
never subtract two huge totals from each other, and so you lose 
much less precision.  You can do even better by running X from 1 
to Length instead of from CurrentBar-Length+1 to CurrentBar, thus 
making the X*Y and avgX_avgY terms small no matter how many bars 
are on your chart.  The result is the same, because shifting every 
X by the same constant doesn't change the correlation.  You can do 
this in plain old 32-bit TS code, without having to bother with a 
DLL.
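
Here is a sketch of that loop with X re-based to 1..Length, again 
in Python with every step rounded to 32 bits.  Assumptions: the 
closes are made up, f32 only emulates single-precision storage, and 
the per-bar difference is formed before it is added to the running 
sum.  It is meant as an illustration of the idea, not as a 
replacement for coeffR.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round to the nearest 32-bit float (stand-in for a single-precision store)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Made-up closes near 900; X is re-based to 1..Length instead of the bar number.
ys = [895.25, 896.00, 897.50, 896.75, 898.25, 899.00, 898.50, 900.25,
      901.00, 900.50, 902.25, 901.75, 903.00, 904.50, 903.75, 905.25,
      906.00, 905.50, 907.25, 908.00]
xs = [float(i + 1) for i in range(len(ys))]

def looped_up_eq_f32(xs, ys):
    """Sum of (X*Y - avgX*avgY) over the bars, rounded to float32 at every step."""
    n = float(len(xs))
    sum_x = sum_y = 0.0
    for x, y in zip(xs, ys):
        sum_x = f32(sum_x + x)
        sum_y = f32(sum_y + y)
    avg_x_avg_y = f32(f32(sum_x / n) * f32(sum_y / n))
    up_eq = 0.0
    for x, y in zip(xs, ys):
        # Form the per-bar difference first, then accumulate it.
        up_eq = f32(up_eq + f32(f32(x * y) - avg_x_avg_y))
    return up_eq

def exact_up_eq(xs, ys):
    """Summation(X*Y) - Length*Average(X)*Average(Y) in exact rational arithmetic."""
    fx = [Fraction(x) for x in xs]
    fy = [Fraction(y) for y in ys]
    return sum(a * b for a, b in zip(fx, fy)) - sum(fx) * sum(fy) / len(fx)

looped = looped_up_eq_f32(xs, ys)
exact = exact_up_eq(xs, ys)

print("float32 loop:", looped)
print("exact       :", float(exact))   # 426.625
```

With this data the exact value is 426.625, and the 32-bit loop 
stays within 0.1 of it.  The decimals survive because nothing in 
the loop gets anywhere near the millions.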

But the folks who wrote the TS functions obviously had no 
understanding of (or care for) little details like this.

Moral:  don't trust TradeStation numerical/statistical accuracy.

Gary
