
[amibroker] Re: Is the Walk forward study useful?




Howard,

"Does anyone have truly out-of-sample results where the t-test statistic is embarrassingly large?"

Embarrassing? No. But, maybe a little self conscious ;) I have run WFA optimizing on SQN.

After fifteen IS iterations (producing fourteen OOS values):

- Only one OOS value is below 2 (it came in at 1.77).
- Two others are between 2 and 3.
- The remaining 11 range from 3.09 to 5.69.

The above are calculated using Tharp's formula and are based on OOS samples ranging in size from as few as 70 trades (SQN = 3.75) to as many as 466 trades (SQN = 5.04) over 6-month periods across all US stocks. The SQN = 5.69 was achieved with 189 OOS trades.

For reference, the IS calculations span a 2 year period and ranged from SQN = 3.2 to SQN = 6.17, each at a little over 1000 trades.
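
For readers who want to reproduce this kind of number, here is a minimal Python sketch of the SQN calculation as it is described in this thread: sqrt(N) times expectancy divided by the standard deviation of R. The function names, the `max_n` parameter (which caps only the sqrt(N) term, the limit debated in the quoted messages) and the use of the sample standard deviation are assumptions made for illustration, not a statement of how the figures above were produced.

```python
import math
import statistics


def sqn(r_values, max_n=None):
    """SQN as discussed in the thread: sqrt(N) * mean(R) / stdev(R).

    max_n is a hypothetical parameter: when given, only the sqrt(N)
    term is capped; the mean and standard deviation use every trade.
    """
    n = len(r_values)
    if n < 2:
        raise ValueError("need at least two trades")
    mean_r = statistics.mean(r_values)   # expectancy
    std_r = statistics.stdev(r_values)   # sample std dev (an assumption)
    if std_r == 0:
        raise ValueError("standard deviation of R is zero")
    n_eff = n if max_n is None else min(n, max_n)
    return math.sqrt(n_eff) * mean_r / std_r


def implied_ratio(score, n):
    """Per-trade expectancy / stdev(R) implied by a score and a trade count."""
    return score / math.sqrt(n)


if __name__ == "__main__":
    trades = [1, -1, 2, 0.5]             # made-up R values
    print(sqn(trades))
    print(implied_ratio(5.04, 466))      # the 466-trade OOS period above
```

Dividing a reported score by sqrt(N) gives the per-trade ratio behind it: `implied_ratio(5.04, 466)` is roughly 0.23, and `implied_ratio(7, 5000)` is roughly 0.1, the figure Bing quotes further down the thread.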

However, it is worth noting that when I ran my WFA, I separated strategy from position size. During the WFA process, I hard coded position size to $1,000 based on a $500,000 initial equity and allowed for a (never attained) maximum of 500 concurrent positions such that all trades were taken.

My intent was to optimize based on the consistency of the trade results. Once the WFA was complete, I updated the position sizing per the optimal percent risk for the resultant SQN, limited by the suitable portfolio heat for the same. I then ran by hand the backtest for each OOS to see what the actual equity and other summary statistics would be using a more realistic position sizing.

The nature of my strategy is such that when the market takes a dive, I enter many concurrent positions, sometimes reaching the portfolio heat limit. As such, the realistic position sizing could cause some trades found during the non sized OOS WFA to not be taken, thereby affecting the realistic SQN.

Missing out on some trades is consistent with Tharp's examples. But, I personally do not agree with that approach. In my view it can materially change the trade distribution. Sometimes significantly.

To counter this, I calculate my R values as the natural logarithm of the exit price relative to the entry price, as per Ralph Vince in "Handbook of Portfolio Mathematics", rather than as absolute dollar gains.
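
A small Python sketch of that idea, assuming long trades and positive prices; the function names are hypothetical. It shows why a log-price R value does not change with the number of shares, whereas a dollar gain does.

```python
import math


def log_r(entry_price, exit_price):
    """R value as ln(exit / entry); independent of position size."""
    if entry_price <= 0 or exit_price <= 0:
        raise ValueError("prices must be positive")
    return math.log(exit_price / entry_price)


def dollar_gain(entry_price, exit_price, shares):
    """Absolute dollar result of a long trade; scales with position size."""
    return (exit_price - entry_price) * shares


if __name__ == "__main__":
    # Same trade at two different sizes: the dollar gain doubles,
    # the log-based R value stays the same.
    for shares in (10, 20):
        print(shares, dollar_gain(100.0, 110.0, shares), log_r(100.0, 110.0))
```

Because the measure depends only on the two prices, resizing a position leaves each trade's R value, and therefore the trade distribution, untouched.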

In so doing, I am able to add dynamic position sizing such that size is reduced below optimal (when necessary) in order to take *every* trade. This results in a sized SQN identical to that of the non-sized WFA OOS results, albeit at non-optimal leverage.
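
One hypothetical way to express this kind of sizing rule in Python (not necessarily the rule used for the results above): each new trade gets the optimal percent risk unless that would push total portfolio heat past its limit, in which case the remaining heat is split evenly among the new trades. All names and the even-split rule are assumptions for illustration.

```python
def risk_per_new_trade(optimal_risk, heat_limit, open_risk, new_trades):
    """Risk fraction to assign to each of `new_trades` new positions.

    optimal_risk: desired fraction of equity risked per trade
    heat_limit:   maximum total fraction of equity at risk
    open_risk:    fraction of equity already at risk in open positions
    new_trades:   number of entry signals to be filled now
    """
    if new_trades <= 0:
        return 0.0
    remaining = max(heat_limit - open_risk, 0.0)
    # Shrink below optimal only when the heat limit would be exceeded.
    return min(optimal_risk, remaining / new_trades)


if __name__ == "__main__":
    # Plenty of room: every trade gets the optimal risk.
    print(risk_per_new_trade(0.02, 0.20, 0.10, 2))
    # Crowded portfolio: risk is cut so that all four trades still fit.
    print(risk_per_new_trade(0.02, 0.20, 0.16, 4))
```

Note that this sketch returns zero once the heat limit is already fully used, so a real implementation that must take every trade would also need to resize or budget for the positions already open.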

As a result of the non optimal leverage, the absolute returns are not as immense as what the high SQN would suggest. However, in test after test, it has been consistently shown that peak performance of my strategy is highly dependent upon taking all trades.

Mike

--- In amibroker@xxxxxxxxxxxxxxx, Howard B <howardbandy@xxx> wrote:
>
> Hi Bing, and all --
> 
> It really is Not a personal choice to decide whether to limit the number of
> data points counted, if you want the statistic to be a t-test.  Mr Gosset
> (the actual name of the person who developed and published the t-test under
> the name "student") went to great lengths in detailing the behavior of
> empirical data and its agreement to statistical distributions.  The t-test
> Always uses square root of number of observations.
> 
> Of course, you are always permitted to develop your own test statistic.  But
> unless you also do the research to determine critical values for various
> situations, you have no way of knowing the probability of the result that
> was observed.
> 
> If you limit the number of observations that are considered, Do Not use the
> t-test tables to determine significance.  At least, do not use them
> expecting that they accurately reflect your new test statistic -- I have no
> idea what modification would need to be made, and the statistical world
> would not publish my paper if I did the research and tried to get it
> accepted.
> 
> I sincerely wish Van Tharp had not published that suggestion.  It is just
> plain bad science.  Do not use it.
> 
> ....................................
> 
> But back to the reality check as the t-test is being applied to metrics for
> trading systems.
> 
> Does anyone have truly out-of-sample results where the t-test statistic is
> embarrassingly large?
> 
> If so, I want your kind of problem.  Contact me and we will deal with the
> t-test issue from the comfort of our yacht in the Bahamas.  If not, what is
> the issue?
> 
> Thanks,
> Howard
> 
> 
> 
> On Sat, Oct 17, 2009 at 11:52 PM, bingk66 <bing.kwok@xxx> wrote:
> 
> >
> >
> > Hi Mike,
> >
> > I understand what you are saying. I agree that the more data points in the
> > calculations the more confidence you are entitled to have in the results of
> > the system, and hence the usage of the sqrt(N) portion of the equation to
> > reflect that. The concept of it all is fine.
> >
> > However, where I am not entirely comfortable with the equation is the
> > sqrt(N) part, as sqrt(N) can have way too much weighting in the overall
> > t-test score once N gets too large. Perhaps using the cube root of N might
> > be better as a means of allowing N to be fully factored into the equation (
> > as opposed to Van Tharp's proposal of limiting N to 100) without overbearing
> > the other parts of the equation. It boils down to a personal choice, I
> > guess, not unlike the calculation of the objective function that Howard
> > describes in his books whereby you wish to calculate a single number, and
> > that single number could be derived from a number of sources, each with
> > their own weighting, and you would like some form of partitioning across these
> > different sources so that no one source overbears the others and a balanced
> > calculation can be obtained.
> >
> > Finally, the last part of your post would seem to indicate that as N
> > increases, expectancy/StdDevofR would tend to decrease. I can't see or
> > understand why that would be the case. I would appreciate it if you could
> > provide a brief explanation as to why that might be the case or at least
> > provide a link whereby I can read up a little more on this.
> >
> > Bing
> >
> >
> > --- In amibroker@xxxxxxxxxxxxxxx <amibroker%40yahoogroups.com>, "Mike"
> > <sfclimbers@> wrote:
> > >
> > > Bing,
> > >
> > > In this example, the t-test is calculated to give us a level of
> > confidence that the average of the sample is different than zero.
> > >
> > > If a trade strategy had no predictive power, then its results would be
> > purely random, producing a net gain (over the long run) of zero with an
> > equal number of winners and losers.
> > >
> > > Actually, it would be a net gain of zero *over the prevailing trend*,
> > where the trend itself might be greater than zero, as per Aronson. But, that
> > is another conversation.
> > >
> > > The more trades taken, the more likely the true average would show. For
> > example; Flip a coin 4 times. You might get 3 heads, 1 tail for an average
> > of 0.75 heads. Flip a coin 1000 times and the average number of heads will
> > be much much closer to 0.5.
> > >
> > > Going back to your trade example, if we are getting a non zero average
> > after thousands of trades, then we are more and more confident that in fact
> > the average is not zero. Thus, the larger t-test score is justified, and is
> > in fact built into the equation.
> > >
> > > In other words, you don't have to worry about getting a SQN score of 7
> > after 5000 trades, because you will likely never find a trade strategy that
> > is capable of producing an expectancy of 0.1 after that many trades!
> > >
> > > Mike
> > >
> > > --- In amibroker@xxxxxxxxxxxxxxx <amibroker%40yahoogroups.com>,
> > "bingk66" <bing.kwok@> wrote:
> > > >
> > > > Hi Howard,
> > > >
> > > > If there are no means to limit the number of transactions in the calcs,
> > then one seriously runs the risk of challenging the mystical t-test score of
> > 7 that you spoke about previously.
> > > >
> > > > As an example, if the OOS test was run over a 5 year period with 5000
transactions (a mere 1000 transactions/year, which is not excessive,
especially for very short term trades), sqrt(5000) alone would yield in
excess of 70 for the multiplier. This would leave expectancy/StdDev of R
with just a target of 0.1 to reach the t-test score of 7.
> > > >
> > > > Now, if you had 1,000,000 transactions in your OOS test....
> > > >
> > > > The concept of limiting the trade count does make sense to me. Maybe
> > 100 is too low, and should be set higher. There does come a point whereby
> > the sqrt(N) part of the equation will render the rest of the equation
> > irrelevant once N gets too large.
> > > >
> > > > $0.02
> > > >
> > > > Bing
> > > >
> > > >
> > > >
> > > > --- In amibroker@xxxxxxxxxxxxxxx <amibroker%40yahoogroups.com>, Howard
> > B <howardbandy@> wrote:
> > > > >
> > > > > Hi Zozu --
> > > > >
> > > > > I must disagree with Van Tharp on this.
> > > > >
> > > > > If the runs are truly out-of-sample, then each and every one
> > contributes to
> > > > > the computation. It makes no sense to limit the count to 100. It is
> > poor
> > > > > procedure to limit the count. It is bad science to limit the count.
> > Do not
> > > > > limit the count.
> > > > >
> > > > > If the runs are in-sample, then the test has no meaning anyway.
> > Computing
> > > > > the t-test statistic using any N will be misleading. Do not even do
> > the
> > > > > computation. If a decision to trade a system is made after computing
> > the
> > > > > t-test statistic on trades that came solely from in-sample results,
> > there is
> > > > > an extremely high probability that a Type I error will be committed.
> > That
> > > > > is, the trader will believe that his system is better than random,
> > when it
> > > > > is in fact not better than random. Type I errors result in loss of
> > money.
> > > > >
> > > > > Thanks,
> > > > > Howard
> > > > >
> > > > >
> > > > > On Tue, Oct 13, 2009 at 10:54 AM, zozuzoza <zozuka@> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > Hi Howard,
> > > > > >
> > > > > > Limiting the number of N doesn't mean that you are not using all
> > trades for
> > > > > > the calculation of SQN. Only the sqrt(N) part of the formula is
> > limited in
> > > > > > order not to distort the results if there are many trades. It makes
> > sense.
> > > > > > The other part of the formula does count on all the trades.
> > > > > >
> > > > > > Zozu
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >  
> >
>



