Re: TS calculation precision problem



Gary:

Thank you for the wonderful explanation.  Your analysis and
examples are splendid!  

The few TS numerical algorithms I have looked at made me avoid
them as much as possible.  However, I had no idea what the
overall quality was.  Thank you for opening my eyes to the
horrible details.

Leslie



Gary Fritz wrote:
> 
> > Two questions:  did you have that stupid 'enhanced precision'
> > feature turned ON in TS?  If so it actually REDUCES precision.
> > That could have been the problem.
> 
> Not necessary.  TS's limited precision, and the abysmally poor
> code in the library function, will do it all by itself.
> 
> Quick lesson in numerical precision follows, sorry if your eyes
> glaze over...
> 
> TS uses single-precision (32-bit) floats, which have a 24-bit
> mantissa (23 stored bits plus an implied leading 1; the sign
> gets its own bit).  2^24 is about 16.7 million, so that means
> there are only about 7 digits of precision.  Anything beyond ~7
> decimal digits is absolute fantasy produced by the decimal ->
> binary -> decimal rounding effects.
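
The 7-digit limit is easy to check outside TS.  The sketch below is
Python rather than TS code, and it assumes IEEE 754 single precision,
which is what the description above implies.  The helper name to_f32
is made up for illustration; it rounds an ordinary 64-bit value to
the nearest 32-bit float.

```python
import struct


def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Near 14 million, neighbouring 32-bit floats are one whole unit apart,
# so any fractional part is rounded away.
print(to_f32(14000000.4))  # 14000000.0
print(to_f32(14000000.6))  # 14000001.0

# A price-sized number keeps its fraction to roughly four decimals.
print(abs(to_f32(875.3) - 875.3) < 1e-4)  # True
```

Between 2^23 (about 8.4 million) and 2^24 (about 16.8 million) a
32-bit float can hold only whole numbers, which is the range the
sums discussed next fall into.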
> 
> RSquared calls coeffR, which contains several "differences of
> large numbers:"
> 
> UpEQ = Summation(X * Y, Length)
>          - Length * Average(X, Length) * Average(Y, Length);
> LowEQ1 = Summation(Square(X), Length)
>          - Length * Square(Average(X, Length));
> LowEQ2 = Summation(Square(Y), Length)
>          - Length * Square(Average(Y, Length));
> 
> So let's say you're calculating a 20-bar RSquared.  RSquared
> calculates the Pearson's R (coeffR) of X and Y, where X =
> CurrentBar and Y = Close.  Say you're looking at S&P prices, and
> you have a 30-min chart that covers 60 days, most of one front
> contract.
> 
> At the end of that chart CurrentBar is around 700-800 or so.  S&P
> prices are also around 800-900 or so these days.  So each of
> those calculations above is roughly
>   20*7xx*9xx - 20*7xx*9xx
> 
> Each of those terms is in the ballpark of 14,000,000 -- which,
> you might notice, is an 8-digit number.  That means that the last
> one or two digits are absolute garbage, caused by the loss of
> precision from representing an 8-digit decimal number in a 32-bit
> binary format.  If the *correct* values of the two large terms
> agree in the first 7-8 digits (which often happens when
> calculating Pearson's R), then each of the resulting TS
> calculations is also absolute garbage.  If the correct values
> agree in the first 5-6 digits, the resulting TS calculation has
> only 2-3 valid digits.
> 
> For example, if the correct values of the UpEq calculation were
>   UpEq = 12345678.654321 - 12345678.123456
> then the *correct* value of UpEq is 0.530865.  But the value
> calculated by TS will be badly wrong: near 12 million a 32-bit
> float can only hold whole numbers, so 12345678.654321 gets
> rounded (in the binary representation of the term before the
> subtraction happens) to 12345679, and 12345678.123456 gets
> rounded to 12345678.  The 32-bit result is 1, not 0.530865.
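
The two numbers in this example can be pushed through 32-bit
storage directly.  Again this is a Python sketch under the
assumption of IEEE 754 single precision, not TS code, and to_f32 is
a made-up helper name.

```python
import struct


def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = to_f32(12345678.654321)  # what a 32-bit variable would hold
b = to_f32(12345678.123456)

# Same subtraction carried out in 64-bit arithmetic, for comparison.
wide = 12345678.654321 - 12345678.123456

print("32-bit operands:", a, b)
print("32-bit difference:", a - b)
print("64-bit difference:", round(wide, 6))
```

Every digit of the 32-bit difference comes from rounding the
operands; none of it comes from the data.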
> 
> If you take a longer RSquared, or if your chart has more bars on
> it, or if your price values are larger, the error gets worse.
> 
> The coeffR function was carelessly implemented by someone who
> obviously had no understanding of proper numerical methods.  They
> wrote it as if computer floating-point values had infinite
> precision; if they did, then that coeffR code would work fine.
> But in the real world of limited-precision variables, that code
> is WRONG.  It cannot possibly be accurate with 32-bit floats, and
> it wouldn't be that great with 64-bit floats.  It's just very bad
> code.
> 
> I only had one numerical-methods class 28 years ago, and I didn't
> pay much attention then. :-)  But even I know it's possible to
> calculate those values accurately, even with 32-bit floats, if
> you're a bit more careful.  E.g. instead of calculating
> Summation(X*Y,Length) [a large number] and subtracting Length *
> Average(X,Length)*Average(Y,Length) [another large number], you
> could do something like this:
> 
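>   { Accumulate per-bar differences so the running sum stays small }
>   UpEq = 0;
>   avgX_avgY = Average(X, Length) * Average(Y, Length);
>   for i = 0 to Length - 1 begin
>     { subtract first, then add the small difference to the sum }
>     UpEq = UpEq + ((X[i] * Y[i]) - avgX_avgY);
>   end;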
> 
> That way the running total stays small and you never subtract
> two 8-digit totals from each other, so you lose far less
> precision.  You can do even better by running X from 1 to Length
> instead of from CurrentBar-Length+1 to CurrentBar, thus making
> the X*Y and avgX_avgY terms small no matter how many bars are on
> your chart.  The result is the same.  You can do this in plain
> old 32-bit TS code, without having to bother with a DLL.
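
Here is a rough way to compare the approaches outside TS.  It is a
Python sketch, not TS code: it assumes IEEE 754 single precision and
imitates it by rounding to 32 bits after every operation.  The price
list is made up, and the function names are hypothetical.  The third
variant, summing (X - avgX) * (Y - avgY), is the usual textbook
two-pass form and is not taken from the post.

```python
import struct


def to_f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum32(values):
    """Add left to right, rounding to 32 bits after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total


def avg32(values):
    return to_f32(sum32(values) / len(values))


def naive_upeq(xs, ys):
    """Summation(X*Y) - Length*Average(X)*Average(Y), as in coeffR."""
    sum_xy = sum32(to_f32(x * y) for x, y in zip(xs, ys))
    big = to_f32(to_f32(len(xs) * avg32(xs)) * avg32(ys))
    return to_f32(sum_xy - big)


def per_bar_upeq(xs, ys):
    """The loop from the post: subtract avgX*avgY bar by bar."""
    avg_xy = to_f32(avg32(xs) * avg32(ys))
    return sum32(to_f32(to_f32(x * y) - avg_xy) for x, y in zip(xs, ys))


def centered_upeq(xs, ys):
    """Textbook two-pass form: sum of (X - avgX) * (Y - avgY)."""
    ax, ay = avg32(xs), avg32(ys)
    return sum32(to_f32(to_f32(x - ax) * to_f32(y - ay))
                 for x, y in zip(xs, ys))


# Made-up data: 20 prices in quarter points, drifting up from 870.
prices = [870.25, 871.0, 870.5, 871.75, 872.25, 871.5, 872.75, 873.0,
          872.5, 873.75, 874.25, 873.5, 874.0, 875.25, 874.75, 875.5,
          876.0, 875.75, 876.5, 877.25]
chart_bars = list(range(781, 801))  # CurrentBar values late in a chart
local_bars = list(range(1, 21))     # the same bars renumbered 1..Length

# 64-bit reference value for the same quantity.
mx = sum(chart_bars) / 20
my = sum(prices) / 20
reference = sum((x - mx) * (y - my) for x, y in zip(chart_bars, prices))

err_naive = abs(naive_upeq(chart_bars, prices) - reference)
err_per_bar = abs(per_bar_upeq(local_bars, prices) - reference)
err_centered = abs(centered_upeq(chart_bars, prices) - reference)

print("reference (64-bit):", reference)
print("error, coeffR-style formula:", err_naive)
print("error, per-bar loop with X = 1..Length:", err_per_bar)
print("error, centered two-pass form:", err_centered)
```

On this data the coeffR-style result can only be a whole number,
because both of its large terms sit above 2^23, so it cannot match
a reference value that has a fractional part.  The per-bar loop is
run with X renumbered 1..Length, as suggested above; the centered
form does not need the renumbering because subtracting avgX has the
same effect.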
> 
> But the folks who wrote the TS functions obviously had no
> understanding of (or care for) little details like this.
> 
> Moral:  don't trust Tradestation numerical/statistical accuracy.
> 
> Gary

-- 
Regards,
Leslie Walko
610-688-2442
--
 "Life is a tragedy for those who feel, a comedy for those who
think"
	Horace Walpole, 4th earl of Orford, in a letter dated about 1770