Decision trees

Jim Michael said
> I may have overshot the mark by using the phrase 'complex math' when I 
> meant to imply that you would be unlikely to reach a limitation using them.
> 
> Anyone doing anything with decision trees?

In the past, yes.

We had some success, but have since moved on to other things.
Nonetheless, maybe I can give something back to the list.

Random thoughts:

As with any machine learning approach, the quality of the
input attributes is critical.

Decision tree learning systems induce readable classifiers such as:

if attribute A > 1.4:
	if attribute B = "likely trend":	Class = "buy"
	if attribute B = "no trend": 		Class = "do nothing"
if attribute A <= 1.4:
	if attribute C > 1023:
		if attribute D = "oversold":	Class = "exit"
[and so on]
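
For concreteness, the example tree can be written as an ordinary Python
function. This is only a sketch: the parameters a, b, c, d stand in for
attributes A to D, and because the example stops at "[and so on]", every
branch it leaves unspecified falls back to a hypothetical default of
"do nothing".

```python
def classify(a, b, c, d):
    """Hand-coded version of the example tree (attributes A..D -> a..d)."""
    if a > 1.4:
        if b == "likely trend":
            return "buy"
        if b == "no trend":
            return "do nothing"
    else:  # a <= 1.4
        if c > 1023:
            if d == "oversold":
                return "exit"
    # Hypothetical default for the branches the example leaves open.
    return "do nothing"


print(classify(1.5, "likely trend", 0, "neutral"))
print(classify(1.0, "no trend", 2000, "oversold"))
```

Each root-to-leaf path is one readable rule, which is what makes the
induced classifier easy to inspect.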

They were invented as a response to the difficulties with building
expert systems manually by interviewing experts. Getting
experts to state their knowledge in a form that was both 
comprehensible and correct turns out to be very difficult. 
Expert knowledge, as we well know in trading, is often 
sub-cognitive, so the expert will be able to say that XX is 
likely to occur, but not provide a 'full' explanation as to
why. This is known as the knowledge acquisition bottleneck.
The alternative approach was to learn a set of
rules from expert-classified examples.

Decision trees are built by recursively dividing
the training set on the attribute that provides the greatest
improvement in [one of a range of criteria, including information
gain, purity, and significance (chi2)].
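
A minimal sketch of that recursive procedure, assuming categorical
attributes and using information gain (one of the criteria listed) as the
split measure; purity or chi2 would plug in at the same point. Numeric
attributes such as A > 1.4 would also need a threshold search, which is
omitted. The attribute names and the max_depth cap are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Label entropy minus the weighted entropy after splitting on attr."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs, max_depth=3):
    """Recursively split on the attribute with the greatest information gain.

    Returns a class label (leaf) or an (attribute, {value: subtree}) tuple.
    max_depth is an illustrative cap, one crude way to keep trees small.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or not attrs or len(set(labels)) == 1:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= 0:
        return majority  # no split improves on a single leaf
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
            max_depth - 1,
        )
    return (best, branches)


# Hypothetical toy data: "trend" separates the classes, "rsi" does not.
rows = [{"trend": "up", "rsi": "oversold"}, {"trend": "up", "rsi": "neutral"},
        {"trend": "flat", "rsi": "oversold"}, {"trend": "flat", "rsi": "neutral"}]
labels = ["buy", "buy", "wait", "wait"]
print(build_tree(rows, labels, ["trend", "rsi"]))
```

Here "trend" has a gain of 1 bit and "rsi" a gain of 0, so the tree splits
once on "trend" and stops.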

Decision tree learners can be characterised as serial classifiers;
that is, they partition the examples on the basis of a sequence of
single-attribute tests. This can be contrasted with methods such as
instance-based learning and neural networks, which combine attributes
in a more "parallel" fashion, with decision boundaries based on
combinations of attributes.
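
A small illustration of the serial/parallel distinction, using hand-picked
thresholds and weights (an assumption; nothing here is learned). The
tree-style rule tests one attribute at a time, so its boundaries are
axis-parallel; the weighted rule, standing in for a single neuron, combines
both attributes at once.

```python
def tree_rule(x1, x2):
    """Serial: one attribute per test, so boundaries are axis-parallel."""
    if x1 > 0.5:
        return "buy" if x2 > 0.5 else "do nothing"
    return "do nothing"


def weighted_rule(x1, x2, w1=1.0, w2=1.0, bias=-1.0):
    """Parallel: both attributes combined in one weighted sum.

    With these hand-picked defaults the boundary is the diagonal
    x1 + x2 = 1, which a tree can only approximate with a staircase
    of single-attribute splits.
    """
    return "buy" if w1 * x1 + w2 * x2 + bias > 0 else "do nothing"


for point in [(0.9, 0.3), (0.3, 0.9), (0.6, 0.6)]:
    print(point, tree_rule(*point), weighted_rule(*point))
```

For (0.9, 0.3) the two disagree: the tree says "do nothing" because x2
fails its threshold, while the weighted sum 0.9 + 0.3 - 1 > 0 says "buy".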

Decision tree and rule learning systems provide a readable description
of the resulting system, which is very useful if you'd like to audit
the beastie before deployment.
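
As a sketch of why this helps auditing: a tree stored as nested
(attribute, {value: subtree}) tuples, a representation assumed here rather
than taken from any particular tool, can be flattened into one IF-THEN rule
per leaf. The attribute names are hypothetical.

```python
def to_rules(tree, conditions=()):
    """Flatten a nested (attribute, {value: subtree}) tree into IF-THEN rules."""
    if not isinstance(tree, tuple):  # a leaf holds the class label
        lhs = " AND ".join(f"{a} = {v!r}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN class = {tree!r}"]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attr, value),)))
    return rules


tree = ("trend", {"likely trend": "buy",
                  "no trend": ("rsi", {"oversold": "exit",
                                       "neutral": "do nothing"})})
for rule in to_rules(tree):
    print(rule)
# IF trend = 'likely trend' THEN class = 'buy'
# IF trend = 'no trend' AND rsi = 'oversold' THEN class = 'exit'
# IF trend = 'no trend' AND rsi = 'neutral' THEN class = 'do nothing'
```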

The readability of decision trees is not guaranteed, as they can get
very large. My experience, however, is that large decision trees make
poor traders. A basic heuristic, Occam's razor, suggests that
trees should be kept as simple as possible.
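
One standard way to keep trees small is reduced-error pruning, sketched
here as an assumption (the post doesn't say which pruning method was used):
working bottom-up, replace a subtree with its majority-class leaf unless the
subtree makes strictly fewer errors on a held-out pruning set. Each internal
node is assumed to carry its majority training label as a third tuple
element; the tree and data below are hypothetical.

```python
def predict(tree, row):
    """Walk the tree; internal nodes are (attr, {value: subtree}, majority)."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row.get(attr) not in branches:
            return majority  # unseen attribute value: use the node's majority
        tree = branches[row[attr]]
    return tree  # a leaf is a bare class label


def errors(tree, rows, labels):
    """Number of held-out examples the tree misclassifies."""
    return sum(predict(tree, r) != y for r, y in zip(rows, labels))


def prune(tree, rows, labels):
    """Reduced-error pruning, bottom-up, against held-out rows and labels."""
    if not isinstance(tree, tuple):
        return tree
    attr, branches, majority = tree
    pruned = {}
    for value, subtree in branches.items():
        idx = [i for i, r in enumerate(rows) if r.get(attr) == value]
        pruned[value] = prune(subtree, [rows[i] for i in idx],
                              [labels[i] for i in idx])
    tree = (attr, pruned, majority)
    # Keep the subtree only if it beats a single majority leaf on held-out data.
    if errors(tree, rows, labels) < errors(majority, rows, labels):
        return tree
    return majority


tree = ("trend",
        {"up": ("rsi", {"oversold": "buy", "neutral": "wait"}, "buy"),
         "flat": "wait"},
        "wait")
val_rows = [{"trend": "up", "rsi": "oversold"},
            {"trend": "up", "rsi": "neutral"},
            {"trend": "flat", "rsi": "neutral"}]
val_labels = ["buy", "buy", "wait"]
print(prune(tree, val_rows, val_labels))
# ('trend', {'up': 'buy', 'flat': 'wait'}, 'wait')
```

The "rsi" test is removed because, on the held-out rows, it does worse than
simply predicting "buy" whenever the trend is up.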

Decision trees are very fast to train, are reasonably accurate,
and provide a readable decision function.
If you're using them:
1.	Use a small set of 'good' attributes.
2.	As with any learning approach, you must avoid overfitting.
	Use an out-of-sample test set. This is CRITICAL.
	Even better, if you're repeatedly training and testing over the
	same data, use a secondary test set that is only applied
	once you think the rules are good.
	Why? It's very easy to fit the data rather than find useful
	rules. This applies to pretty much 'any' machine learning
	system, although there are some approaches I'm aiming to
	pursue [post thesis] that might alleviate some of these
	problems (such as minimum message length).
3.	Make sure that any tree you induce is small (i.e. heavy pruning
	is called for). One reason is that financial data is often
	serially correlated, so 'most' machine learning systems, which
	assume independent examples, will overstate the importance of
	some decisions.
4.	A heuristic we found useful was to split the system into separate
	learning tasks, such as buy, sell, exitlong, etc.
5.	Another approach is to use machine learning to filter signals
	from a system that is almost good enough.
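
Item 2 can be sketched as a chronological three-way split: train on the
oldest data, iterate against the test set, and touch the final holdout only
once the rules look good. The rows are assumed to be sorted oldest-first,
and the 60/20/20 proportions are arbitrary defaults, not a recommendation
from this post. The split is deliberately not shuffled: with serially
correlated data, a random split would put near-duplicate neighbours of
test examples into the training set.

```python
def chronological_split(rows, train_frac=0.6, test_frac=0.2):
    """Split time-ordered rows into (train, test, holdout) without shuffling.

    rows must be sorted oldest-first; the holdout is the most recent slice
    and should only be evaluated once, after the rules are fixed.
    """
    if train_frac <= 0 or test_frac <= 0 or train_frac + test_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a holdout")
    n = len(rows)
    i = round(n * train_frac)
    j = i + round(n * test_frac)
    return rows[:i], rows[i:j], rows[j:]


bars = list(range(10))  # stand-in for ten time-ordered examples
train, test, holdout = chronological_split(bars)
print(train, test, holdout)
```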

I'm just completing a PhD thesis in machine learning which started 
with applications in trading, but has drifted somewhat since then.
I'd be happy to answer questions on the topic.

Cheers

Michael Harries

p.s. 	Sorry about the rambling nature of this mail :)
p.p.s.	Note: there is a new technique for learning systems, called
	boosting, that trades away readability and some speed in return
	for improved accuracy (in some cases), but that's another story.

-- 
_____________________________________________________________________
Michael Harries                    | TradeStation Add-ins
Quality Trading Innovations        | DLLs: HashNums QlsNums Instance
http://www.ozemail.com.au/~qtrade  | Batch Execution: TSTrades
http://www.cse.unsw.edu.au/~mbh