Re: [amibroker] Data mining bias vs number of observations, AmiBroker Email List Archive

On Wed, Apr 23, 2008 at 7:52 AM, Louis Préfontaine <rockprog80@xxxxxxxxx> wrote:

Hi,

Are you sure about this? Having only 15 observations does not rule out luck. To be reliable, my guess is that the sample must be far larger, say AT LEAST 30 trades (many people have suggested this in the past). With fewer than 30 trades, the "best result" chosen by the walk-forward IS and then OOS test could be the result of luck. At least, that was my understanding of the data-mining bias as explained in Aronson's book.
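
To put a rough number on the luck factor, here is a minimal sketch. It assumes trades are independent and that a system with no edge wins each trade with probability 0.5; both are simplifying assumptions, and the function name `prob_at_least` is made up for this illustration.

```python
from math import comb

def prob_at_least(wins: int, trades: int, p_win: float = 0.5) -> float:
    """Probability of at least `wins` winners in `trades` independent trades
    when every trade wins with probability `p_win` (0.5 means no edge)."""
    return sum(
        comb(trades, k) * p_win**k * (1 - p_win) ** (trades - k)
        for k in range(wins, trades + 1)
    )

# Chance that a no-edge system wins at least two thirds of its trades.
for n in (15, 30, 100):
    wins_needed = (2 * n + 2) // 3  # smallest integer >= 2n/3
    print(n, wins_needed, prob_at_least(wins_needed, n))
```

Under these assumptions, winning 10 or more of 15 trades happens by chance about 15% of the time (4944 out of 32768 equally likely outcomes), while winning 20 or more of 30 happens a little under 5% of the time. The threshold of two thirds is only an example.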

Louis

2008/4/23, Howard B <howardbandy@xxxxxxxxx>:

Hi Louis --

The walk forward process solves those problems.

Thanks,
Howard

On Wed, Apr 23, 2008 at 6:47 AM, Louis Préfontaine <rockprog80@xxxxxxxxx> wrote:

Hi Howard,

The problem is this: the market is ever changing, as you say in your book. If my system reacts strongly to what the market as a whole is doing, then I would surely need a shorter time frame! One month would probably be too long, if we look at what happened in the first two weeks of January... You say the number of trades does not matter, but how can I judge the value of an OOS result with only 10-15 trades? It could be luck!

I agree that it could be tested and re-tested, but testing until I get a correct time-frame seems to me like another original way of doing curve-fitting, don't you think?

That's the whole problem I see with walk-forward: it is good for knowing whether the system is ready, but it does not help much to make the system better, because I only get the best result of each optimization, and with a limited number of trades the absolute best may be the best only because of luck...

Louis

2008/4/23, Howard B <howardbandy@xxxxxxxxx>:

Hi Louis, and all --

Select the period of time for the in-sample period that works for the system you are using.
Select the period of time for the out-of-sample period and reoptimization period that is sufficient for the system and the market to stay in sync and to give you several walk forward steps.
Perform the walk forward analysis.
Look at the out-of-sample results from the combined walk forward steps.
Decide from there whether to trade or go back to the drawing board.
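
The steps above can be sketched roughly as follows. This is an illustration only, not AmiBroker's walk-forward implementation: the names `walk_forward_windows` and `walk_forward` are hypothetical, and `optimize` and `backtest` are placeholders for whatever the system developer supplies.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

def walk_forward_windows(
    n_bars: int, is_len: int, oos_len: int
) -> Iterator[Tuple[range, range]]:
    """Yield (in_sample, out_of_sample) index ranges.

    Each step optimizes on `is_len` bars and tests on the next `oos_len`
    bars; the window then slides forward by `oos_len` bars.
    """
    start = 0
    while start + is_len + oos_len <= n_bars:
        yield (
            range(start, start + is_len),
            range(start + is_len, start + is_len + oos_len),
        )
        start += oos_len

def walk_forward(
    bars: Sequence, is_len: int, oos_len: int,
    optimize: Callable, backtest: Callable,
) -> List:
    """Re-optimize on each in-sample window, then collect only the
    out-of-sample trades from every step into one combined list."""
    oos_trades: List = []
    for is_idx, oos_idx in walk_forward_windows(len(bars), is_len, oos_len):
        params = optimize(bars[is_idx.start:is_idx.stop])
        oos_trades.extend(backtest(bars[oos_idx.start:oos_idx.stop], params))
    return oos_trades

# Example: 1000 bars, 500 in-sample, 100 out-of-sample per step.
steps = list(walk_forward_windows(n_bars=1000, is_len=500, oos_len=100))
```

In this sketch the out-of-sample ranges are contiguous and never overlap, so the combined trade list covers each out-of-sample bar exactly once. The decision to trade or not is then based on that combined list alone.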

To make sure I have been clear on this ----
It does not matter At All how many trades or what length of time the in-sample period covers. Results from the in-sample runs have no value in estimating the future performance.

Thanks for listening,
Howard

On Tue, Apr 22, 2008 at 8:22 PM, Louis Préfontaine <rockprog80@xxxxxxxxx> wrote:

Hi Howard,

What would you consider to be a sufficiently large sample for IS and then for OOS? If I develop a system that makes 250 trades a year and I select IS and OOS periods of 2-3 weeks, then each period contains no more than 10-15 trades. Is this enough?

Regards,

Louis

2008/4/22, Howard B <howardbandy@xxxxxxxxx>:

Hi Simon --

From your description, the system was developed on a set of data, but not tested on any data that was not used during development. The data used during development is called the in-sample data. Data used for testing that was not used during development is called the out-of-sample data.

The in-sample results always look good -- we do not stop playing with the system until they look good. The in-sample results have no value in estimating the future out-of-sample results. In order to estimate what the likely profitability will be when traded with real money, out-of-sample testing is necessary.

I have documented systems that have over 1,300,000 closed trades and reasonable looking results for the in-sample period, but were not profitable out-of-sample.

There is no substitute for out-of-sample testing.

Thanks for listening,
Howard
www.quantitativetradingsystems.com

On Thu, Apr 17, 2008 at 2:29 AM, si00si00 <si00si00@xxxxxxxxx> wrote:

Hi all,

I have a friend who has developed a trading system. It is an intraday
system that makes on average around 5 futures trades per day. We were
discussing it the other day and a point of disagreement arose between
us. He claims that there is no necessity for him to test the strategy
on out of sample data because he has back tested it using over 8 years
of historical intraday data, and the patterns the strategy predicts
occur 70% of the time or more.

My question is, does anyone know if the data-mining bias can be
considered irrelevant when the sample size is so large? (in this case,
the sample size is roughly 8400 trades). Put another way, with so many
observations, how many different rules would have to be back tested in
order for data-mining bias to creep in?
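
One way to frame the question is a back-of-the-envelope sketch under strong assumptions: every rule tested has no real edge (true win rate 0.5), the rules are independent, trades are independent, and the normal approximation to the binomial applies. It uses the standard bound that the expected maximum of k standard normal draws is at most sqrt(2 ln k). The function name is hypothetical.

```python
from math import log, sqrt

def best_of_k_win_rate_bound(
    n_trades: int, k_rules: int, p_win: float = 0.5
) -> float:
    """Rough upper bound on the expected best in-sample win rate among
    `k_rules` independent rules that all have true win rate `p_win`."""
    std_err = sqrt(p_win * (1 - p_win) / n_trades)  # std. error of one win rate
    return p_win + std_err * sqrt(2 * log(k_rules))

# How much selection luck inflates the best rule at different sample sizes.
for n in (15, 30, 8400):
    print(n, best_of_k_win_rate_bound(n, k_rules=1000))
```

Under these assumptions the luck component from picking the best of 1000 rules is about two percentage points at 8400 trades, and far larger at 15 or 30 trades. The sketch covers selection luck only. It says nothing about rules tuned to the same data or about a market that changes after the test period.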

Thanks in advance for any thoughts you might have!

Simon