> > Note: **CurrentBar** > 1000, not MaxBarsBack. **90 bar**
> > correlation. Both of which are quite within the bounds of reasonable
> > use. (Though I'm not sure why CurrentBar is relevant, since the TS
> > Correlation function doesn't accumulate anything.)
>
> Yes, I know.
> But if you look at the correlation code posted, you will see this
> in the comments:
Neal was not speaking about Brickey's code. He was comparing a
correlation calculation on TS, Excel, and C++ -- presumably using
TS's bogus Correlation function, which DOES NOT accumulate errors.
So the CurrentBar was irrelevant.
But regarding Brickey's code: I realized after I posted that message
that I should have mentioned the cumulative error in Brickey's code.
But by then I'd already gone to bed. :-)
So, to address that: I *WOULD NOT* use Brickey's function,
especially in a single-precision environment like TS. Over many bars
it will generate cumulative errors that will significantly impact the
result.
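
To make the cumulative-error point concrete, here is a minimal sketch in Python rather than EasyLanguage. It is NOT Brickey's code; it only illustrates the general "sliding window" idea of adding the newest bar and subtracting the oldest. The helper names (f32, sliding_sums, fresh_sums) are made up for this illustration, and single precision is emulated by rounding every intermediate result to a 4-byte float, which is an assumption about how a single-precision environment behaves.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sliding_sums(values, length):
    """Running sum updated incrementally: add the newest bar, drop the oldest."""
    running = 0.0
    for v in values[:length]:
        running = f32(running + v)
    out = [running]
    for i in range(length, len(values)):
        running = f32(running + values[i])           # newest bar enters
        running = f32(running - values[i - length])  # oldest bar leaves
        out.append(running)
    return out


def fresh_sums(values, length):
    """Sum recomputed from scratch for every window (slower, no carry-over)."""
    out = []
    for i in range(length, len(values) + 1):
        total = 0.0
        for v in values[i - length:i]:
            total = f32(total + v)
        out.append(total)
    return out


# Deliberately extreme data: one huge value followed by small ones.
values = [1e8, 1.0, 1.0, 1.0]
print(sliding_sums(values, 2))  # running sum, error carried forward
print(fresh_sums(values, 2))    # recomputed sum, error limited to one window
```

With this deliberately extreme input, the small values are swallowed while the large value is in the running sum, and they never come back once it leaves: the sliding sum stays at 0 for the later windows while the recomputed sum is 2. Real price data loses far less per bar, but the loss is carried forward in the same way, bar after bar.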
In fact Brickey's code seems to have an error in it somewhere. Even
ignoring the cumulative error problems, it returns bogus values:
values far outside the proper -1 to +1 range, long stretches of
zeroes, etc. I didn't take the time to debug it, but I modified it
to use a slower but more correct "no cumulative error" approach.
This code (attached) matches the Excel PEARSON() function pretty
closely, at least most of the time. Occasionally it will go berserk
for a few bars and return values far off the Excel value, but that
seems to be caused by gaps in the data. TS and Excel handle them
differently. I believe that if you take care to avoid gaps in your
data, this function will match the Excel PEARSON() function as
closely as single-vs-double precision allows. Note that this is NOT
very close, maybe 2-3 decimal places at best, given the large values
that are calculated in the correlation function.
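
The "large values" problem can be seen in just one term of the formula, N*SUMX2 - SUMX*SUMX. The sketch below is Python, not EasyLanguage, and the names f32 and variance_term are hypothetical. It assumes every intermediate result is rounded to single precision, which may not be exactly what TS does internally, so treat it as an illustration of the effect rather than a reproduction of TS arithmetic.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def variance_term(xs, rnd=float):
    """Compute N*SUMX2 - SUMX*SUMX, rounding every intermediate with rnd."""
    n = len(xs)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in xs:
        x = rnd(x)
        sum_x = rnd(sum_x + x)
        sum_x2 = rnd(sum_x2 + rnd(x * x))
    return rnd(rnd(n * sum_x2) - rnd(sum_x * sum_x))


prices = [10000.0, 10001.0, 10002.0]   # index-like price levels
term64 = variance_term(prices)         # double precision, as Excel uses
term32 = variance_term(prices, f32)    # emulated single precision
print(term64, term32)
```

The exact answer for these three values is 6. Double precision returns that, but the emulated single-precision result comes out negative, which is impossible for the true quantity. The two products being subtracted are around 900 million, which needs about nine significant digits, while single precision carries only about seven.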
> > Better yet, test Brickey's EL code against a single-
> > precision C++ implementation of the same algorithm, so you know
> > you're comparing apples to apples. If THAT test shows radical
> > differences, then there is a legitimate problem.
>
> It will probably not match due to the accumulation of errors
> using the sliding window summation after large CB values!
It had BETTER match, if you implement the **same algorithm** in C++.
The C++ implementation would have the same cumulative errors. If it
doesn't match, that indicates TS does indeed have a math problem.
Gary
{****************************************************************
PROGRAM : User Function: CorrelationP (Pearson's R Correlation)
LAST EDIT: 7/19/2001
PURPOSE : Returns the Coefficient of Correlation.
For a dependent variable YDEP
(usually Data1) it measures how
closely related YDEP is to XIND (usually data2)
the independent variable. If YDEP increases
directly as XIND increases, YDEP is positively
correlated with value 1.0. If YDEP
decreases directly as XIND increases then
YDEP is negatively correlated at value -1.0
If there's no linear relation between the
two, the correlation is 0.0.
This function was derived from a faster version
provided by Bob Brickey. That version had a bug
and accumulated an error due to the "sliding window"
approach it used.
METHOD : PRODUCT MOMENT CORRELATION FORMULA
From "Statistics", Murray R. Spiegel,
Schaum Publ. Co. NY, 1961, p245.
Also known as Pearson's correlation?
r =              N*SUMXY - SUMX*SUMY
    ---------------------------------------------------------
    SquareRoot((N*SUMX2 - SUMX*SUMX) * (N*SUMY2 - SUMY*SUMY))
}
Inputs:
YDEP(NumericSeries), {Data1 series (Y), dependent variable,}
XIND(NumericSeries), {Data2 series (X), independent variable
turns quicker than YDEP (Y), Data1}
XLEAD(Numeric), {Periods that Data2 (X) leads Data1(Y) at turns}
Len(Numeric); {# data periods to compare}
Vars: SUMX(0),SUMY(0),SUMX2(0),SUMY2(0),SUMXY(0),K(0),Temp(0),Temp2(0);
SUMX = 0;
SUMY = 0;
SUMX2= 0;
SUMY2= 0;
SUMXY= 0;
For K = 0 to Len - 1 begin
    Temp = YDEP[K];
    SUMY = SUMY + Temp;
    SUMY2= SUMY2 + Temp * Temp;
    Temp2= XIND[K + XLEAD];
    SUMX = SUMX + Temp2;
    SUMX2= SUMX2 + Temp2 * Temp2;
    SUMXY= SUMXY + Temp2 * Temp;
End;
{Temp below will be a positive divisor later}
Temp = (Len * SUMX2 - SUMX * SUMX) * (Len * SUMY2 - SUMY * SUMY);
If Temp > 0 then {So don't divide by 0}
    CorrelationP = (Len * SUMXY - SUMX*SUMY)/SquareRoot(Temp)
else
    CorrelationP = 0;
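
For anyone who wants to cross-check the function outside TS, here is a rough Python port of the same "recompute every bar" logic. It is a sketch under assumptions, not a verified equivalent: the name correlation_p and the extra bar argument are invented for this example, the lists are assumed to be in chronological order (oldest first), and the arithmetic is double precision, so it should track Excel more closely than TS can.

```python
import math


def correlation_p(ydep, xind, xlead, length, bar):
    """Hypothetical port of CorrelationP, evaluated at list index `bar`.

    EasyLanguage's SERIES[K] ("K bars ago") becomes series[bar - K].
    """
    oldest = bar - (length - 1) - xlead
    if (xlead < 0 or length < 1 or oldest < 0
            or bar >= len(ydep) or bar - xlead >= len(xind)):
        raise ValueError("not enough history for this bar/length/lead")

    sum_x = sum_y = sum_x2 = sum_y2 = sum_xy = 0.0
    for k in range(length):
        y = ydep[bar - k]           # YDEP[K]
        x = xind[bar - k - xlead]   # XIND[K + XLEAD]
        sum_y += y
        sum_y2 += y * y
        sum_x += x
        sum_x2 += x * x
        sum_xy += x * y

    denom_sq = ((length * sum_x2 - sum_x * sum_x)
                * (length * sum_y2 - sum_y * sum_y))
    if denom_sq > 0:                # same divide-by-zero guard as the EL code
        return (length * sum_xy - sum_x * sum_y) / math.sqrt(denom_sq)
    return 0.0


# X leads Y by one bar: y[i] = 2 * x[i - 1]
x = [1, 2, 3, 4, 5, 6]
y = [0, 2, 4, 6, 8, 10]
print(correlation_p(y, x, xlead=1, length=4, bar=5))
```

The explicit history check matters in Python because a negative list index silently wraps around to the end of the list instead of failing.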