
RE: Precision Errors




> > Note:  **CurrentBar** > 1000, not MaxBarsBack.  **90 bar**
> > correlation.  Both of which are quite within the bounds of reasonable
> > use.  (Though I'm not sure why CurrentBar is relevant, since the TS
> > Correlation function doesn't accumulate anything.)
> 
> Yes, I know.
> But if you look at the correlation code posted, you will see this
> in the comments: 

Neal was not speaking about Brickey's code.  He was comparing a 
correlation calculation on TS, Excel, and C++ -- presumably using 
TS's bogus Correlation function, which DOES NOT accumulate errors.  
So the CurrentBar was irrelevant.

But regarding Brickey's code:  I realized after I posted that message 
that I should have mentioned the cumulative error in Brickey's code.  
But by then I'd already gone to bed.  :-)

So, to address that:  I *WOULD NOT* use Brickey's function, 
especially in a single-precision environment like TS.  Over many bars 
it will generate cumulative errors that will significantly impact the 
result.
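
To show what kind of cumulative error is meant, here is a minimal C++ sketch. It is NOT Brickey's code, which isn't reproduced in this message. It keeps a window sum in single precision by adding the newest value and subtracting the oldest, next to a version that re-sums the window every time. The function names and the data are made up for illustration, and it assumes a platform that rounds each float operation to single precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of the `len` values ending at index `last`,
// recomputed from scratch in single precision (no state carried over).
float windowSumFresh(const std::vector<float>& data, std::size_t last, std::size_t len) {
    assert(len > 0 && last + 1 >= len && last < data.size());
    float sum = 0.0f;
    for (std::size_t i = last + 1 - len; i <= last; ++i) {
        sum += data[i];
    }
    return sum;
}

// Hypothetical helper: the same window sum maintained incrementally
// ("sliding window").  Any rounding error made along the way stays in
// `sum` for every later bar.
float windowSumSliding(const std::vector<float>& data, std::size_t last, std::size_t len) {
    assert(len > 0 && last + 1 >= len && last < data.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < len; ++i) {
        sum += data[i];              // prime the first window
    }
    for (std::size_t i = len; i <= last; ++i) {
        sum += data[i];              // newest value enters
        sum -= data[i - len];        // oldest value leaves
    }
    return sum;
}

// Made-up series: one huge value passes through a window of small ones.
std::vector<float> spikeSeries() {
    return {1.0f, 1.0e8f, 1.0f, 1.0f, 1.0f, 1.0f};
}
```

With a window of 3 ending at index 4, the window holds 1, 1, 1.  The re-summed version returns 3.  The sliding version does not, because the 1s that were added while 1e8 was in the sum were rounded away and never come back once 1e8 is subtracted out.  Real price data is less extreme, but the same loss builds up a little on every bar.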

In fact Brickey's code seems to have an error in it somewhere.  Even 
ignoring the cumulative error problems, it returns bogus values:   
values far outside the proper -1 to +1 range, long stretches of 
zeroes, etc.  I didn't take the time to debug it, but I modified it 
to use a slower but more correct "no cumulative error" approach. 

This code (attached) matches the Excel PEARSON() function pretty 
closely, at least most of the time.  Occasionally it will go berserk 
for a few bars and return values far off the Excel value, but that 
seems to be caused by gaps in the data.  TS and Excel handle them 
differently.  I believe that if you take care to avoid gaps in your 
data, this function will match the Excel PEARSON() function as 
closely as single-vs-double precision allows.  Note that that is NOT 
very close, maybe 2-3 decimal places at best, given the large 
intermediate values that are calculated in the correlation function.
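
The effect of those large values can be seen with a small C++ sketch that evaluates the product-moment formula once in float and once in double.  This is an illustration under my own assumptions, not a port that has been checked against TS: the names pearsonRaw and priceLikeSeries and the data are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Product-moment formula, evaluated entirely in type T (float or double).
// Returns 0 when the squared denominator is not positive, mirroring the
// guard in the attached function.
template <typename T>
T pearsonRaw(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    T sumX = 0, sumY = 0, sumX2 = 0, sumY2 = 0, sumXY = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const T xi = static_cast<T>(x[i]);
        const T yi = static_cast<T>(y[i]);
        sumX += xi;
        sumY += yi;
        sumX2 += xi * xi;
        sumY2 += yi * yi;
        sumXY += xi * yi;
    }
    const T n = static_cast<T>(x.size());
    const T d = (n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY);
    if (d > 0) {
        return (n * sumXY - sumX * sumY) / std::sqrt(d);
    }
    return 0;
}

// Made-up price-like series: large level, small bar-to-bar variation.
std::vector<double> priceLikeSeries() {
    return {10000.0, 10001.0, 10002.0, 10003.0};
}
```

Passing priceLikeSeries() as both x and y, the true correlation is exactly 1.  pearsonRaw<double> returns 1.  pearsonRaw<float> does not come out anywhere near 1 on hardware that rounds each step to single precision: N*SUMX2 and SUMX*SUMX are both about 1.6e9 and differ by only 20, which is less than the spacing between adjacent floats at that size (128), so the subtraction has nothing meaningful left in it.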

> > Better yet, test Brickey's EL code against a single-
> > precision C++ implementation of the same algorithm, so you know
> > you're comparing apples to apples.  If THAT test shows radical
> > differences, then there is a legitimate problem.
> 
> It will probably not match due to the accumulation of errors
> using the sliding window summation after large CB values! 

It had BETTER match, if you implement the **same algorithm** in C++.  
The C++ implementation would have the same cumulative errors.  If it 
doesn't match, that indicates TS does indeed have a math problem.

Gary


{****************************************************************

	PROGRAM : User Function: CorrelationP  (Pearson's R Correlation)

	LAST EDIT: 7/19/2001

	PURPOSE	: Returns the Coefficient of Correlation.

                  For a dependent variable YDEP
                  (usually Data1) it measures how
                  closely related YDEP is to XIND (usually data2)
                  the independent variable. If YDEP increases 
                  directly as XIND increases, YDEP is positively
                  correlated with value 1.0. If YDEP
                  decreases directly as XIND increases then
                  YDEP is negatively correlated at value -1.0
                  If there's no linear relation between the
                  two, the correlation is 0.0.

                  This function was derived from a faster version
                  provided by Bob Brickey.  That version had a bug
                  and accumulated an error due to the "sliding window"
                  approach it used.  

    METHOD      : PRODUCT MOMENT CORRELATION FORMULA
                  From "Statistics", Murray R. Spiegel, 
                  Schaum Publ. Co. NY, 1961, p245.
                  Also known as Pearson's r (Pearson's
                  product-moment correlation coefficient).
 
                  r = (N*SUMXY - SUMX*SUMY) / SquareRoot(D), where
                  D = (N*SUMX2 - SUMX*SUMX) * (N*SUMY2 - SUMY*SUMY)
                  and N = Len.
}

Inputs: 
     YDEP(NumericSeries), {Data1 series (Y), dependent variable,}
     XIND(NumericSeries), {Data2 series (X), independent variable
                           turns quicker than YDEP (Y), Data1}
     XLEAD(Numeric),      {Periods that Data2 (X) leads Data1(Y) at turns}
     Len(Numeric);        {# data periods to compare}
 
Vars: SUMX(0),SUMY(0),SUMX2(0),SUMY2(0),SUMXY(0),K(0),Temp(0),Temp2(0);
 
SUMX = 0;
SUMY = 0;
SUMX2= 0;
SUMY2= 0;
SUMXY= 0;

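{Recompute every sum from scratch over the last Len bars on each call,
 so no rounding error is carried from bar to bar}
For K = 0 to Len - 1 begin
    Temp = YDEP[K];                 {Y value K bars ago}
    SUMY = SUMY  + Temp;
    SUMY2= SUMY2 + Temp  * Temp;

    Temp2= XIND[K + XLEAD];         {X value K + XLEAD bars ago}
    SUMX = SUMX  + Temp2;
    SUMX2= SUMX2 + Temp2 * Temp2;

    SUMXY= SUMXY + Temp2 * Temp;
End;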
 
{Temp is the squared denominator; its square root is the divisor below}
Temp = (Len * SUMX2 - SUMX * SUMX) * (Len * SUMY2 - SUMY * SUMY);

If Temp > 0 then  {Avoids dividing by 0; rounding can also push Temp to 0 or below}
   CorrelationP = (Len * SUMXY - SUMX*SUMY)/SquareRoot(Temp)
else
   CorrelationP = 0;
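
For comparison only, here is a C++ sketch of a different but algebraically equivalent formulation: subtract the means first, then sum the products of the deviations.  This is NOT what the function above does.  It is a standard two-pass, mean-centered variant, shown because it keeps the intermediate values small, which matters in single precision.  The name pearsonCentered is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-pass, mean-centered Pearson r in single precision.
// Pass 1 finds the means; pass 2 sums products of deviations from them.
float pearsonCentered(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size() && !x.empty());
    const float n = static_cast<float>(x.size());
    float meanX = 0.0f, meanY = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    float sxx = 0.0f, syy = 0.0f, sxy = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float dx = x[i] - meanX;   // deviations are small even when
        const float dy = y[i] - meanY;   // the price level is large
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    const float d = sxx * syy;
    // Same guard as the function above: no divide by 0.
    return d > 0.0f ? sxy / std::sqrt(d) : 0.0f;
}
```

For a series such as 10000, 10001, 10002, 10003 correlated with itself, the deviations are -1.5, -0.5, 0.5, 1.5 and every step is exact in single precision, so this returns 1.  The cost is a second pass over the window on every bar.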