
Re: [amibroker] Re: Data mining bias vs number of observations and UKB




Brian,

Well said and thought out.  Thank you.  Your conclusion also matches
my own experience: I prefer to have 300 to 500, and even up to 1000,
trades before I feel comfortable that the random variances in the
results are under control.  Though I came up with that number based
more on observation and an intuitive feel for it, I appreciate the
rigorous approach you are taking.

As for the UKB postings, I have thought about this also.  My
approach will be to generate my material as a standalone PDF
document and then provide a link to the PDF or other files that are
uploaded.  The PDF will be formatted for printability.  Then some
screenshots or text, as a summary only, will be put on the UKB for
direct reading.  I did some of that with my first post.  I don't know
how pervasive it has become, but my web browser (Safari) displays PDF
files as easily as HTML files.  I am sure that this will increasingly
become a standard capability over time.  If the vehicle of the UKB
changes in the future, the PDF files will not be affected.  I also
have a lot more freedom in the presentation layout and can work
WYSIWYG with my Pages word processor locally (like I did with my
Flexible Parameters document that was emailed to this list a short
time back).

While I would have preferred to have something a bit easier to work  
with, like the easy way AFLs can be uploaded to the AFL Library (zero  
learning curve), I believe the organizational structure of the UKB is  
the best place we have right now to place pointers to information so  
that people can find it or refer to it.

Best regards,
Dennis Brown


On Apr 22, 2008, at 6:28 PM, brian_z111 wrote:
> Thanks for your question.
>
> It is a good, and necessary, thing to question new ideas.
>
> No, you haven't misunderstood the implications of what I am saying.
>
> First, to put it in context:
>
> I am not commenting on Walk-Forward since I am not comfortable with
> it and I don't have the experience anyway (possibly the reason I am
> not comfortable with it).
>
> I am referencing Fred and Howard to gain some insight into that area
> myself (to me training our systems 'on the fly' seems like a separate
> trading style to my own).
>
> I am looking in another direction.
>
> I am specifically 'researching' the grounds for deciding what metric
> to use when we get to the point of choosing our 'Objective Function',
> or as Fred calls it setting our 'Fitness, Goals and Constraints'; a
> decision that we have to make whenever we backtest, irrespective of
> the particular method we use (OOS, multiple OOS, Walk-Forward etc).
>
> My comments are based on observations that I have made, using Excel
> spreadsheets to simulate the null hypothesis i.e. that the markets
> are a random walk and therefore all trading systems will revert to 0
> mathematical expectancy over time.
>
> Luckily for me, those investigations uncovered a lot more than I
> originally bargained for - and yes it does have wider implications
> (if I am correct).
>
> Some could argue that 'synthetic' equity curves, based on
> RandomlyGeneratedNumbers (RGN's) are not real, with regards to
> trading.
>
> Well IMO they are a real simulation of the null hypothesis (give or
> take a bit of inaccuracy) and we can learn a lot about evaluation
> from them, simply because we can 'stress test' known evaluation
> tools (concepts, equations, metrics etc) against data with known
> W/L, Payoff and ProfitFactor ratios etc.
>
> My argument is that the above metrics (binomial factors) are the key
> inputs that drive equity outcomes and therefore how accurately we can
> predict their values, when using known 0 expectancy data, reflects
> how functional/pragmatic our evaluation techniques are.
>
> The observations I have made allow me to gain faith in some methods,
> lose faith in others and develop a few new ones of my own (in the
> very slippery world of evaluation, faith is a priceless commodity to
> me).
>
> Yes, it is difficult for me to take it all on board, let alone anyone
> else who hasn't had the benefit of working through all of the 'bench
> tests'.
>
> So, the implications are wider, but to answer your question I will
> focus on one aspect of my investigations i.e. sample error.
> I will also limit the discussion on sample error to the basics
> (sample error is rather pervasive and has one or two surprising
> twists in the tail but I won't go into all of the nuances in this
> post).
>
> Keep in mind, that my intention is basically to 'share' my work by
> asking people to think about it.
>
> I am satisfied that a few are finding it interesting and stimulating.
>
> Applications are entirely up to the individual.
>
> Re sample error:
>
> I have added a graph to the K-Ratio_v2.xls file that is in the file
> section of this group.
>
> I have plotted the progressive W/L ratio for 1000 trades (W/L plots
> are one place where sample error is made blatantly obvious).
>
> F9 will force a recalc of the plot.
>
> (Some people might be uncomfortable with the fact that I have used
> the uniform distribution format of the underlying RGN's to produce
> the 'synthetic' data but I can assure them I have done my homework
> with various distributions and the answer is the same).
>
> Note that the W/L ratio, for the null hypothesis, is known to us in
> advance i.e. it is equal to 1/1 (this is with the default setting of
> Bias == 0.5, Volatility == 1 and the % factor as either 10 or 100 -
> DO NOT CHANGE THE %FACTOR TO 1)
>
> The first thing you will notice is that the beginning of the plot
> is 'wild' and deviates a long way from the known value for the first
> approx 100 datapoints (this is predicted by the sample error equation
> == 10% for N == 100).
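The "sample error equation" is not spelled out in the post. The figures quoted (10% at N == 100, 3.2% at N == 1000, 1.0% at N == 10,000) are consistent with error = 1/sqrt(N), so the sketch below assumes that form. The function name `sample_error_pct` is made up for illustration.

```python
import math

def sample_error_pct(n: int) -> float:
    """Approximate sample error in percent, ASSUMING error = 1/sqrt(N).

    The 1/sqrt(N) form is inferred from the figures quoted in the post,
    not stated there explicitly.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of datapoints")
    return 100.0 / math.sqrt(n)

# Tabulate the error for a few sample sizes mentioned in the thread.
for n in (100, 400, 1000, 10000):
    print(f"N = {n:>6}: sample error ~ {sample_error_pct(n):.1f}%")
```

Under this assumption the error halves only when N quadruples, which is why the early part of the plot settles quickly and the later part settles slowly.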
>
> From observations I have made in other Excel benchtests I predict
> that the arithmetic mean of the final W/L ratio over a number of
> trials (equity curves) will be very close to 1.0 and that the StDev
> of the final W/L ratio, in successive trials, will be 2 * the sample
> error% == 2 * 3.2% (the test uses 1000 datapoints in total).
>
> So, as F9 is repeatedly pressed, new plots will be created.
> From N == 1 to around 300/400 the W/L ratio will be 'wild' then it
> will start to smooth out (statistical smoothing takes effect) and
> around 60% of the time the final W/L ratio will be within +- 1 StDev
> but around 1 in 100 times it will exceed 3 StDevs either way.
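For anyone without Excel, the F9 experiment can be roughly approximated in Python. This is only a sketch under simplifying assumptions: each trade is a fair coin flip (win probability 0.5), which stands in for the spreadsheet's uniform RGNs with Bias == 0.5 and is not a copy of the K-Ratio_v2.xls logic. The names `progressive_wl_ratio`, `N_TRADES` and `N_TRIALS` are hypothetical, and the seed is fixed only so that reruns are repeatable.

```python
import random
import statistics

def progressive_wl_ratio(n_trades, rng, p_win=0.5):
    """Return the running wins/losses ratio after each simulated trade.

    Assumption: every trade is an independent win/loss with probability
    p_win, i.e. the null hypothesis of zero edge when p_win == 0.5.
    """
    wins = losses = 0
    ratios = []
    for _ in range(n_trades):
        if rng.random() < p_win:
            wins += 1
        else:
            losses += 1
        # The ratio is undefined until the first loss has occurred.
        ratios.append(wins / losses if losses else float("nan"))
    return ratios

N_TRADES = 1000   # datapoints per equity curve, as in the post
N_TRIALS = 500    # number of simulated "F9 presses"

rng = random.Random(42)
finals = [progressive_wl_ratio(N_TRADES, rng)[-1] for _ in range(N_TRIALS)]

mean_final = statistics.mean(finals)
stdev_final = statistics.stdev(finals)
predicted_stdev = 2 / N_TRADES ** 0.5   # 2 * sample error, assuming 1/sqrt(N)

print(f"mean of final W/L ratio : {mean_final:.3f}")
print(f"StDev of final W/L ratio: {stdev_final:.3f}")
print(f"2 * sample error        : {predicted_stdev:.3f}")
```

Plotting one of the lists returned by `progressive_wl_ratio` gives the same kind of picture as the spreadsheet graph: erratic at first, then gradually settling toward 1.0.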
>
> This is an inescapable fact.
>
> Individually, we have to decide whether to ignore this or figure when
> and where to use it.
>
> If we look at the plot, and also consider sample error for all N
> datapoints, we can easily see we have to choose a value for N
> somewhere above 100 (too wild below that) and somewhere below 1400
> because, beyond that, the gain of lower %error is outweighed by the
> consumption of valuable data (I am assuming here that we are all
> data challenged).
>
> The choice we make is always a trade off between accuracy
> (statistical validity) and data consumption.
>
> Note that above approx 1400 we are only decreasing sample error by
> the 4th decimal place for every extra datapoint we use OR to put that
> another way, error% is around 2.7 at 1400N and 1.0 at 10,000N, so we
> haven't gained that much accuracy for the additional 8600N consumed.
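The diminishing return can be checked numerically. The sketch below again assumes sample error = 1/sqrt(N), which is inferred from the quoted figures rather than stated in the post; `marginal_gain_pct` is a hypothetical helper that measures how much the error% falls when one more datapoint is added.

```python
import math

def sample_error_pct(n: int) -> float:
    """Sample error in percent, ASSUMING error = 1/sqrt(N)."""
    return 100.0 / math.sqrt(n)

def marginal_gain_pct(n: int) -> float:
    """Drop in error% obtained by going from n to n + 1 datapoints."""
    return sample_error_pct(n) - sample_error_pct(n + 1)

# Gain per extra datapoint at several sample sizes.
for n in (100, 400, 1400, 10000):
    print(f"N = {n:>6}: error {sample_error_pct(n):.2f}%, "
          f"one more datapoint gains {marginal_gain_pct(n):.5f}%")

# Total gain from spending the extra 8600 datapoints.
total_gain = sample_error_pct(1400) - sample_error_pct(10000)
print(f"1400 -> 10000 datapoints lowers error% by {total_gain:.2f} points")
```

Under this assumption the per-datapoint gain at N == 1400 is a little under 0.001 percentage points, and the whole stretch from 1400 to 10,000 buys well under two percentage points.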
>
> For utility purposes (pragmatic application) - if we are 100%
> objective traders then we accept Fred's and Howard's opinion
> that "there is no substitute for OOS testing" so we need at least two
> samples that generate enough trades to pass our personal optimumN.
>
> If we are EOD traders and use indicators with long lookback periods
> coupled with relatively rare signals then we might need a very large
> number of bars to generate our minimum number of trades (*2 for IS
> and OOS samples).
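The data requirement is simple arithmetic, sketched below. `years_of_data_needed` is a hypothetical helper, and it assumes trades arrive at a steady average rate, which real systems will not do exactly. The example figures (1000 trades, 100 trades per year) are the ones from Thomas's question further down the thread.

```python
def years_of_data_needed(min_trades: int,
                         trades_per_year: float,
                         n_samples: int = 2) -> float:
    """Years of history needed for n_samples samples of min_trades each.

    n_samples defaults to 2: one in-sample (IS) and one out-of-sample
    (OOS) period. Assumes a constant average trade frequency.
    """
    if trades_per_year <= 0:
        raise ValueError("trades_per_year must be positive")
    return n_samples * min_trades / trades_per_year

# An EOD system averaging 100 trades a year, with a 1000-trade minimum:
print(years_of_data_needed(1000, 100))               # IS + OOS
print(years_of_data_needed(1000, 100, n_samples=1))  # IS only

# The same system with a 300-trade minimum per sample:
print(years_of_data_needed(300, 100))
```

Lowering the minimum N, or trading a basket of symbols to raise the trade rate, are the two levers that bring the required history down.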
>
> I don't know how others respond to the 'N facts of life' but it
> definitely influenced the way I trade, especially the frequency with
> which I trade.
>
> Yes, every situation is unique based on the number of data bars
> available/average time in trade/average time waiting for a new signal
> etc (tick traders, the kings of data affluence, have at least
> ticks/minute*60*6 more 'bars' to play with than EOD traders).
>
> IMO data is scarce for long term traders, and it is soon consumed.
>
> That is why we 'instinctively' tend to compromise by lowering our
> minimal N requirements.
>
> I think that answers your question.
>
> Naturally I went past a lot of interesting side trails, in the
> interests of brevity (I can't write it and you can't read it all in
> one big bite).
>
> cheers,
>
> brian_z
>
> PS - to explain this stuff does require the use of Excel examples.
>
> The K-ratio file is on this site because the UKB was offline when I
> first posted it - it is not a political statement.
>
> If I do post more, on this and related subjects, I am now unlikely to
> use the UKB as the vehicle - that isn't a political statement either.
>
> I am just considering my own creative well being and like all artists
> I prefer having control of my canvas/workspace.
>
> I am likely to continue occasional posting to the
> Data/DatabaseManagement categories at the UKB and move my original
> stats work elsewhere (I haven't made a final decision yet).
>
>
> --- In amibroker@xxxxxxxxxxxxxxx, Thomas Ludwig
> <Thomas.Ludwig@xxx> wrote:
>>
>> Brian,
>>
>> your post is very interesting (as always) - but I'm puzzled! Perhaps I
>> simply misunderstood.
>>
>> E.g., you wrote:
>>
>>> Here are some rules from my notebook:
>>>
>>> - good data, relevant to current conditions, is scarce. Why waste it?
>>> - sample error is real
>>> - around 300 to 400 trades is the minimum, with no further
>>> substantial minimization of sample error beyond, around 10,000
>>> - there is a sweet spot around 1,000 - 5,000 trades
>>> - if data is short then work with no less than 3-400
>>> - if data is in plentiful supply (intraday?) then use more
>>
>> Quite frankly, I'm not getting it. You say that the sweet spot is around
>> 1,000 - 5,000 trades (I assume for the IS period). So let's say for
>> simplicity, 1,000 trades minimum are desirable if you have enough data.
>> But what is enough data? As I haven't traded intraday so far I can't
>> answer this question for that style of trading. I'm trading daily
>> systems. Now let's assume that I have 10 years of daily data (would you
>> call that plentiful?). 1,000 trades mean 100 trades per year on average
>> or (if we assume 200 trading days by rule of thumb) one trade every
>> second day. Do your rules mean that an EOD system that doesn't produce
>> a trade at least every second day isn't testable/tradeable? And I'm
>> only talking about the IS period. What about OOS and walk-forward -
>> would I need, say, 20 years of data in your opinion to have enough data
>> for them?
>>
>> Again, I assume that I simply misunderstood. Perhaps you were talking
>> about a system that trades a large basket of stocks in order to achieve
>> this large number of trades?
>>
>> I'm really interested in your answer since your posts are always full of
>> hints worth thinking about.
>>
>> Best regards,
>>
>> Thomas
>>
>
>
>
> ------------------------------------
>
> Please note that this group is for discussion between users only.
>
> To get support from AmiBroker please send an e-mail directly to
> SUPPORT {at} amibroker.com
>
> For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
> http://www.amibroker.com/devlog/
>
> For other support material please check also:
> http://www.amibroker.com/support.html
> Yahoo! Groups Links
>
>
>

