
Re: Data Statistical Significance




At 07:57 PM 2/25/98, you wrote:

>A lot of vendors only use 5 years EOD and 1 year daytrade data for system
>backtesting, hardly adequate for any statistical validation.  I agree with
>you and Pierre when you both demand large data sets for system backtesting.
> But I disagree with your premise that real time performance results lack
>statistical relevance.  
>
>One of my points in an earlier post is that I require a minimum of 3 years
>real time account performance for EOD systems and 1 year for daytrade
>systems.  Why did I pick these numbers?  Because a lot of system vendors
>use only 5 and 1 years, respectively, for backtesting EOD and daytrade
>systems. 

Tony:

I agree with you 100%.  Let's not debate further what you have written
above.  But can we go on to another point: as a backtest extends further
back in time, how can we be confident that the characteristics of the
data are representative of the environment in which the system will be
trading in real time?

Believe me, I very much want to be assured that a system that performed
well in 1988 will perform well in 1998 - however, does 1988 market data
really bear any relationship to what I am likely to experience this year?
I'm not saying it doesn't - I honestly don't know - but it seems plausible
that the markets are evolving, perhaps at a rate that would invalidate
backtest results from more than X years ago.

Again, I want to emphasize that I am not debating the desirability of
consistent system performance over a significant backtest period.  The
legal disclaimers all say that "past results do not indicate future
performance", but we want to believe that past results are the best
indicator that we have.  Two ideas:

 - Might future results be better predicted by backtesting a system
on recent data from multiple, similar markets, rather than a long
history from a single market?  I.e., look at system results for the last
two years of T-Bond, T-Bill, and T-Note data, rather than the last
six years of T-Bond data alone, even if your plan is to trade only
the T-Bond market.

 - I believe that some guru (Tushar Chande?) proposed a way of
characterizing market data by statistical parameters (i.e., standard
deviation of daily returns, average range, etc.) and then creating
"synthetic data" that exhibits similar characteristics.  By taking
the actual characteristics of recent data, and using them to create
a significant set of synthetic data, extensive backtesting could be
done against an environment that might better match what would be
encountered in the immediate future.  Does anyone accept this approach?
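
The synthetic-data idea in the second point could be sketched roughly
as below.  This is only a minimal illustration under stated assumptions:
it fits just the mean and standard deviation of daily log returns and
draws independent, normally distributed returns, which is a
simplification and not necessarily the method the guru proposed.  The
function names and the sample closing prices are hypothetical.

```python
import math
import random
import statistics


def daily_log_returns(closes):
    """Log returns between consecutive closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def synthetic_closes(recent_closes, n_days, seed=None):
    """Generate n_days of synthetic closes from recent data.

    ASSUMPTION: daily log returns are independent and normally
    distributed, with mean and standard deviation taken from the
    recent data. Real markets have fat tails and volatility
    clustering that this sketch ignores.
    """
    returns = daily_log_returns(recent_closes)
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)
    rng = random.Random(seed)      # seeded for repeatable test runs
    price = recent_closes[-1]      # continue from the last real close
    series = []
    for _ in range(n_days):
        price *= math.exp(rng.gauss(mu, sigma))
        series.append(price)
    return series


# Hypothetical recent closes; substitute real data for the market traded.
recent = [100.0, 100.5, 99.8, 100.9, 101.4, 100.7, 101.9, 102.3]
synthetic = synthetic_closes(recent, n_days=5000, seed=1)
```

A system would then be backtested on many such series (different
seeds) rather than on one.  Whether the results mean anything depends
on how well the chosen parameters capture what the system actually
exploits; a trend-following system, for example, will find nothing to
follow in returns that are generated independently of one another.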

Thanks

Jay Mackro