[amibroker] Re: Data mining bias vs number of observations

To: amibroker@xxxxxxxxxxxxxxx
Subject: [amibroker] Re: Data mining bias vs number of observations
From: "brian_z111" <brian_z111@xxxxxxxxx>
Date: Tue, 22 Apr 2008 03:10:27 -0000

Now that you have got me thinking about the subject I have decided to 
pencil in some new rules to my 'little green book':

- datamining is a fancy name for 'tuning our system to a dataset'
- anytime we change one system rule, by any amount, based on data 
feedback, we are tuning, even if that dataset is produced in live 
trading
- the best test of a robust system is to submit it, without any 
changes to the rules, to a dataset that is unknown to the system; the 
system passes if the outcomes vary little from those of the previous 
test (provided each test sample has at least 300-400 trades).
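
A minimal sketch of that last rule, assuming each test is available as a 
list of per-trade results (profit or percent return). The function name, 
the dictionary keys and the min_trades default are illustrative, not 
anything from AmiBroker; the 300 floor simply follows the figure above.

```python
import math
import statistics

def compare_tests(prev_trades, new_trades, min_trades=300):
    """Compare per-trade results of a previous test and an unseen-data test."""
    if min(len(prev_trades), len(new_trades)) < min_trades:
        raise ValueError("each sample needs at least min_trades trades")
    mean_prev = statistics.mean(prev_trades)
    mean_new = statistics.mean(new_trades)
    sd_prev = statistics.stdev(prev_trades)
    sd_new = statistics.stdev(new_trades)
    # Standard error of the difference between the two means (Welch form).
    se_diff = math.sqrt(sd_prev**2 / len(prev_trades) + sd_new**2 / len(new_trades))
    return {
        "mean_diff": mean_new - mean_prev,
        "stdev_ratio": sd_new / sd_prev,
        "z_score": (mean_new - mean_prev) / se_diff,
    }
```

A z_score near zero and a stdev_ratio near 1 would suggest the unseen 
data behaves much like the previous test. Where to draw the line is a 
judgment call that this sketch does not make.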

Hope that clarifies it for you.

brian_z


--- In amibroker@xxxxxxxxxxxxxxx, "brian_z111" <brian_z111@xxx> wrote:
>
> Hello Simon,
> 
> Great question.
> 
> I have an interest in Single Sample Testing (SST) and pushing the 
> boundaries there. It is a big NO, NO to the 'defenders of the faith'.
> 
> I also have a strong bias to simple systems: no, or few, indicators 
> with lookback periods etc. (I don't use many rules, so I don't lose 
> many degrees of freedom), hence my interest in the subject.
> 
> My gut feeling tells me I can do it but I haven't got far with the 
> proof (however that doesn't mean much since there are terabytes of 
> books and academic research, out there, that I am totally unaware of).
> 
> Personally, I think SST only has academic interest.
> I am following it because I am curious, I learn from the enquiry and 
> I love to confound my critics.
> 
> So, possibly your friend is correct but if s/he is absolutely certain 
> about it s/he would be capable of writing a book on evaluation - in 
> fact if that is the case, I wish s/he would, thereby saving me a lot 
> of time and trouble.
> 
> Anyway, over to the here and now.
> 
> > My question is, does anyone know if the data-mining bias can be
> > considered irrelevant when the sample size is so large? (in this
> > case, the sample size is roughly 8400 trades).
> 
> Possibly I can ride my motorbike, at 200mph, going the wrong way up a 
> 6 lane highway but what is the point if I just want to get from A to 
> B - am I going somewhere or thrill seeking?
> 
> Here are some rules from my notebook:
> 
> - good data, relevant to current conditions, is scarce. Why waste it?
> - sample error is real
> - around 300 to 400 trades is the minimum, and there is no substantial 
> further reduction in sample error beyond around 10,000
> - there is a sweet spot around 1,000 - 5,000 trades
> - if data is short then work with no fewer than 300-400
> - if data is in plentiful supply (intraday?) then use more
> - one sample might be good enough (in exceptional circumstances/for 
> exceptional traders) but why not reduce risk and use more (if you 
> have the data)
> - 1 IS and 1 OOS is better than 1 IS alone
> - even though I am interested in SST, and more likely than most to 
> succeed with it, I am actually using several OOS, of optimum length, 
> whenever I can.
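
One way to read the sample-size figures in the list above is through the 
standard error of a win rate, sqrt(p * (1 - p) / n). The sketch below 
assumes independent trades and borrows the 70% hit rate from Simon's 
message. The function name and the normal-approximation 95% interval 
are choices made for illustration only.

```python
import math

def win_rate_standard_error(p, n):
    """Standard error of an observed win rate p over n independent trades."""
    return math.sqrt(p * (1.0 - p) / n)

for n in (300, 1000, 5000, 10000):
    half_width = 1.96 * win_rate_standard_error(0.7, n)
    print(f"{n:>6} trades: 70% +/- {half_width * 100:.1f} points")
```

The interval narrows with the square root of n, so each extra block of 
trades buys less precision than the one before it. This only measures 
sample error and says nothing about data-mining bias.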
> 
> No, 8400 trades, in a single IS test, does not guarantee success (it 
> is very easy to find rare cases, on a computer, because we can work 
> our way through such large datasets in a relatively short space of 
> time - a 1 in a million chance in real life can turn up in a single 
> backtest on a computer).
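
The point about finding rare cases can be illustrated with a small 
simulation, offered as a sketch under simplifying assumptions: every 
'rule' is a coin flip with no real edge, trades are independent, and we 
keep the best win rate found. The function name and the seed are 
hypothetical.

```python
import random

def best_random_rule(n_trades, n_rules, seed=0):
    """Best win rate among n_rules coin-flip 'rules' with no real edge."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_rules):
        # Each random bit is one trade: 1 = win, 0 = loss (true odds 50/50).
        wins = bin(rng.getrandbits(n_trades)).count("1")
        best = max(best, wins / n_trades)
    return best

for n_trades in (100, 400, 8400):
    print(n_trades, round(best_random_rule(n_trades, n_rules=1000), 3))
```

The best of many worthless rules always looks better than 50%, and the 
gap is widest on short samples. The sketch does not model parameter 
tuning on real price data, where trades are not independent coin flips.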
> 
> We can't rely on stats alone - they never give a definitive answer.
> 
> Different story if your friend has observed a persistent, and 
> predictable, market inefficiency and the stats are just confirming 
> and quantifying that.
> 
> > Put another way, with so many observations, how many different
> > rules would have to be back tested in order for data-mining bias
> > to creep in?
> 
> I am still mulling over this point.
> 
> What is the least number of rules that a useful system could be 
> described in? Perhaps three rules would be the least that anyone is 
> successfully using (I don't know - I am wondering how many is the 
> least possible).
> 
> Say I have a system with only three rules - if I test it IS and 
> change 1 rule a little bit I am still tuning the system to that data, 
> aren't I?
> 
> If I have a system with only three rules, test it IS, and it is 
> successful, then test it OOS and it is successful, all I am doing is 
> confirming that the system is tuned to those two particular datasets, 
> aren't I?
> 
> Based on those observations I would say that, since we can't avoid 
> data mining, even with simplistic methods, then we are always 'data 
> mining' when we use historical data.
> 
> The only time we are not datamining is when we are live trading.
> 
> OOS testing is the historical surrogate for live trading, in that at 
> least the data is unknown, to the system, prior to walkforward or OOS.
> 
> The only thing about datamining that varies, when we are using 
> historical data, is the degree.
> 
> The more rules + the greater the range of adjustable parameters within 
> the rules == the more likely we are to be 'fooled by randomness'.
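
A rough way to put numbers on that relationship, assuming each tested 
variant is an independent trial with a fixed chance alpha of looking 
good by luck alone. Both function names, the 5% default and the 
one-parameter-per-rule simplification are assumptions for illustration.

```python
def variants_tested(n_rules, values_per_parameter):
    """Number of combinations when each rule has one adjustable parameter."""
    return values_per_parameter ** n_rules

def chance_of_lucky_winner(n_variants, alpha=0.05):
    """Probability that at least one edgeless variant passes by chance."""
    return 1.0 - (1.0 - alpha) ** n_variants

for rules in (1, 2, 3):
    k = variants_tested(rules, values_per_parameter=10)
    print(rules, "rules:", k, "variants,", round(chance_of_lucky_winner(k), 3))
```

Under these assumptions the chance of at least one lucky winner passes 
50% at 14 variants. Real optimisation runs are correlated rather than 
independent, so treat the figures as indicative only.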
> 
> In short - no matter what we do we can never achieve 100% certainty 
> but OOS and live paper trading will minimize the risk compared to SST 
> alone.
> 
> Some food for thought:
> 
> Data mining, per se, is not the only thing on the list of 'rocks that 
> traders dash their ships on' - there's more on the same list (most of 
> them receive a lot less publicity).
> 
> brian_z
> 
> 
> --- In amibroker@xxxxxxxxxxxxxxx, "si00si00" <si00si00@> wrote:
> >
> > Hi all,
> > 
> > I have a friend who has developed a trading system. It is an intraday
> > system that makes on average around 5 futures trades per day. We were
> > discussing it the other day and a point of disagreement arose between
> > us. He claims that there is no necessity for him to test the strategy
> > on out of sample data because he has back tested it using over 8 years
> > of historical intraday data, and the patterns the strategy predicts
> > occur 70% of the time or more.
> > 
> > My question is, does anyone know if the data-mining bias can be
> > considered irrelevant when the sample size is so large? (in this case,
> > the sample size is roughly 8400 trades). Put another way, with so many
> > observations, how many different rules would have to be back tested in
> > order for data-mining bias to creep in?
> > 
> > Thanks in advance for any thoughts you might have!
> > 
> > Simon
> >
>


