
Re: [amibroker] Re: Data mining bias vs number of observations




Dear Dennis, brian_z and Randy,

Thanks very much for your answers. They were very helpful.

Basically what I have understood (and half-expected) is that:
- Data mining bias is impossible to eliminate but the degree can be controlled.
- There is nothing to lose by testing the strategy on OOS data and it can only add to understanding the system.
- It's very important to analyse which rules in your strategy are contributing and when they are acting. Maybe a few rules are getting lucky during short periods and skewing the performance of the whole system.

> Possibly I can ride my motorbike, at 200mph, going the wrong way up a
> 6 lane highway but what is the point if I just want to get from A to
> B - am I going somewhere or thrill seeking?

I laughed a lot at this metaphor! The friend in question is a very complicated character and is very set in his ways. I just hope for his sake that he decides to try testing on OOS data, even if in the end it just confirms his strategy.

> Data mining, per se, is not the only thing on the list of 'rocks that
> traders dash their ships on' - there's more on the same list (most of
> them receive a lot less publicity).

I suppose bad money management is probably one of the other large "rocks that traders dash their ships on".

Great help from you all.
Thanks again,

Simon.

brian_z111 wrote:
Hello Simon,

Great question.

I have an interest in Single Sample Testing (SST) and pushing the 
boundaries there. It is a big no-no to the 'defenders of the faith'.

I also have a strong bias towards simple systems: no, or few, indicators 
with lookback periods etc. (I don't use many rules, so I don't lose many 
degrees of freedom), hence my interest in the subject.

My gut feeling tells me I can do it but I haven't got far with the 
proof (however that doesn't mean much since there are terabytes of 
books and academic research, out there, that I am totally unaware of).

Personally, I think SST only has academic interest.
I am following it because I am curious, I learn from the enquiry and 
I love to confound my critics.

So, possibly your friend is correct but if s/he is absolutely certain 
about it s/he would be capable of writing a book on evaluation - in 
fact if that is the case, I wish s/he would, thereby saving me a lot 
of time and trouble.

Anyway, over to the here and now.

  
> My question is, does anyone know if the data-mining bias can be
> considered irrelevant when the sample size is so large? (in this case,
> the sample size is roughly 8400 trades).
    

Possibly I can ride my motorbike, at 200mph, going the wrong way up a 
6 lane highway but what is the point if I just want to get from A to 
B - am I going somewhere or thrill seeking?

Here are some rules from my notebook:

- good data, relevant to current conditions, is scarce. Why waste it?
- sample error is real
- around 300 to 400 trades is the minimum, with no substantial 
further reduction in sample error beyond around 10,000
- there is a sweet spot around 1,000 - 5,000 trades
- if data is short then work with no fewer than 300-400 trades
- if data is in plentiful supply (intraday?) then use more
- one sample might be good enough (in exceptional circumstances/for 
exceptional traders) but why not reduce risk and use more (if you 
have the data)
- 1 IS and 1 OOS is better than 1 IS alone
- even though I am interested in SST, and more likely than most to 
succeed with it, I am actually using several OOS, of optimum length, 
whenever I can.
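
To put rough numbers on the sample-error points in the list above, here is a minimal Python sketch. It assumes every trade is an independent win/lose event with a fixed 70% win rate (the figure from the original question), which real trades rarely are. The function name is only illustrative.

```python
import math

def win_rate_standard_error(win_rate: float, n_trades: int) -> float:
    """Standard error of an observed win rate.

    Assumption: trades are independent win/lose events with the same
    underlying win probability (a simplification of real trading).
    """
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)

# Approximate 95% interval half-width (1.96 standard errors) at several sample sizes.
for n in (300, 1000, 5000, 8400, 10000):
    se = win_rate_standard_error(0.70, n)
    print(f"{n:>6} trades: 70% +/- {1.96 * se * 100:.2f} points")
```

Under these assumptions the interval narrows quickly up to a few thousand trades and only slowly after that. This measures sampling error only; it says nothing about how the rules were chosen.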

No, 8400 trades, in a single IS test, does not guarantee success. It 
is very easy to find rare cases on a computer, because we can work 
our way through such large datasets in a relatively short space of 
time (a 1 in a million chance in real life == a 1 in a backtest chance 
on a computer).
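
A quick way to see this is to simulate rules that have no edge at all and keep only the best one. The sketch below is an illustration under stated assumptions, not anyone's actual system: each hypothetical rule wins each trade with probability 0.5, trades are independent, and `best_win_rate_of_useless_rules` is a made-up helper name.

```python
import random

def best_win_rate_of_useless_rules(n_rules: int, n_trades: int, seed: int = 0) -> float:
    """Backtest n_rules coin-flip rules and return the best win rate found.

    Assumption: every rule has zero edge, so any win rate above 50%
    is pure luck that survived the selection step.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_rules):
        # Each random bit is one trade: 1 = win, 0 = loss, both equally likely.
        wins = bin(rng.getrandbits(n_trades)).count("1")
        best = max(best, wins / n_trades)
    return best

for n_rules in (1, 10, 100, 1000):
    best = best_win_rate_of_useless_rules(n_rules, n_trades=8400)
    print(f"best of {n_rules:>4} useless rules: {best:.4f}")
```

The best-looking rule tends to improve as more rules are tried, even though none of them has an edge. With 8400 trades per rule the improvement is small, so a large sample limits the size of the bias, but it does not remove it.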

We can't rely on stats alone - they never give a definitive answer.

Different story if your friend has observed a persistent, and 
predictable, market inefficiency and the stats are just confirming 
and quantifying that.

  
> Put another way, with so many observations, how many different rules
> would have to be back tested in order for data-mining bias to creep in?
    

I am still mulling over this point.

What is the least number of rules that a useful system could be 
described in? Perhaps three rules would be the least that anyone is 
successfully using (I don't know - I am wondering how many is the 
least possible).

Say I have a system with only three rules - if I test it IS and 
change 1 rule a little bit I am still tuning the system to that data, 
aren't I?

If I have a system with only three rules, test it IS, and it is 
successful, then test it OOS and it is successful, all I am doing is 
confirming that the system is tuned to those two particular datasets, 
aren't I?

Based on those observations I would say that, since we can't avoid 
data mining, even with simplistic methods, then we are always 'data 
mining' when we use historical data.

The only time we are not data mining is when we are live trading.

OOS testing is the historical surrogate for live trading, in that at 
least the data is unknown, to the system, prior to walkforward or OOS.
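
One way to keep the OOS data unknown to the system is to split the trade history chronologically and only ever tune on the earlier part. The following Python sketch is one possible walk-forward scheme, with assumed window lengths and a hypothetical helper name; it is not taken from AmiBroker.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int]  # half-open [start, end) range of trade indices

def walk_forward_windows(n_trades: int, is_len: int, oos_len: int) -> Iterator[Tuple[Window, Window]]:
    """Yield (in-sample, out-of-sample) index ranges in chronological order.

    Each OOS window starts where its IS window ends, so the rules are
    never tuned on data they are later judged on.
    """
    start = 0
    while start + is_len + oos_len <= n_trades:
        in_sample = (start, start + is_len)
        out_of_sample = (start + is_len, start + is_len + oos_len)
        yield in_sample, out_of_sample
        start += oos_len  # roll forward by one OOS window

# Assumed sizes, purely for illustration: 8400 trades, tune on 4200, check on the next 1050.
for in_sample, out_of_sample in walk_forward_windows(8400, 4200, 1050):
    print("tune on trades", in_sample, "then check untouched trades", out_of_sample)
```

The window lengths here are arbitrary; choosing them well is a separate question.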

The only thing about data mining that varies, when we are using 
historical data, is the degree.

The more rules + the greater the range of adjustable parameters within 
the rules == the more likely we are to be 'fooled by randomness'.
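
A back-of-envelope version of that statement, as a hedged Python sketch: count how many variants a parameter grid produces, then ask how likely it is that at least one worthless variant passes a test by luck. It assumes each variant is an independent test with a 5% false-positive rate. Variants of the same system are correlated, so the real figure would be lower, but the direction is the same. Both function names are hypothetical.

```python
import math
from typing import Sequence

def count_variants(settings_per_parameter: Sequence[int]) -> int:
    """Number of system variants when every parameter combination is tried."""
    return math.prod(settings_per_parameter)

def chance_of_lucky_pass(n_variants: int, alpha: float = 0.05) -> float:
    """Probability that at least one no-edge variant passes by chance.

    Assumption: variants are independent tests, each with false-positive rate alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_variants

# Example: three rules, each with one parameter tried at 10 settings.
n = count_variants([10, 10, 10])
print(n, "variants ->", f"{chance_of_lucky_pass(n):.4f}")

for k in (1, 5, 20, 100):
    print(f"{k:>3} variants -> {chance_of_lucky_pass(k):.3f}")
```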

In short - no matter what we do we can never achieve 100% certainty 
but OOS and live paper trading will minimize the risk compared to SST 
alone.

Some food for thought:

Data mining, per se, is not the only thing on the list of 'rocks that 
traders dash their ships on' - there's more on the same list (most of 
them receive a lot less publicity).

brian_z


--- In amibroker@xxxxxxxxxxxxxxx, "si00si00" <si00si00@xxx> wrote:
  
> Hi all,
>
> I have a friend who has developed a trading system. It is an intraday
> system that makes on average around 5 futures trades per day. We were
> discussing it the other day and a point of disagreement arose between
> us. He claims that there is no necessity for him to test the strategy
> on out of sample data because he has back tested it using over 8 years
> of historical intraday data, and the patterns the strategy predicts
> occur 70% of the time or more.
>
> My question is, does anyone know if the data-mining bias can be
> considered irrelevant when the sample size is so large? (in this case,
> the sample size is roughly 8400 trades). Put another way, with so many
> observations, how many different rules would have to be back tested in
> order for data-mining bias to creep in?
>
> Thanks in advance for any thoughts you might have!
>
> Simon

    



  