CSC 8520, Spring 2013. Paula Matuszek
DecisionTreeFirstDraft
Paula Matuszek, Spring 2013
Decision Tree Induction
• Very common machine learning and data mining technique.
• Given:
  – Examples
  – Attributes
  – Goal (classification, typically)
• Pick an “important” attribute: one that divides the set cleanly.
• Recur on the subsets not yet classified.
A Training Set
Expressiveness
• Decision trees can express any function of the input attributes.
• E.g., for Boolean functions, each truth table row maps to a path to a leaf.
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won’t generalize to new examples.
• Prefer to find more compact decision trees.
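The truth-table-row-to-path idea can be made concrete with a small sketch (not from the slides; the dict-based tree representation and XOR example are illustrative assumptions):

```python
# A decision tree for XOR, with one leaf per truth-table row.
# Trees are nested dicts: {attribute: {value: subtree}}; leaves are labels.
xor_tree = {"A": {0: {"B": {0: 0, 1: 1}},
                  1: {"B": {0: 1, 1: 0}}}}

def classify(tree, example):
    """Walk the tree until a leaf (a bare class label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))           # attribute tested at this node
        tree = tree[attr][example[attr]]  # follow the branch for its value
    return tree

# Every row of the XOR truth table corresponds to one root-to-leaf path.
for a in (0, 1):
    for b in (0, 1):
        assert classify(xor_tree, {"A": a, "B": b}) == a ^ b
```

A tree built this way is consistent with the training set by construction, which is exactly why it tends not to generalize: it memorizes one path per example.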
ID3
• A greedy algorithm for decision tree construction, originally developed by Ross Quinlan (1986).
• Top-down construction of the decision tree by recursively selecting the “best attribute” to use at the current node in the tree.
  – Once an attribute is selected, generate child nodes, one for each possible value of the selected attribute.
  – Partition the examples using the possible values of the attribute, and assign the subsets of examples to the appropriate child node.
  – Repeat for each child node until all examples associated with a node are either all positive or all negative.
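The procedure above can be sketched in Python. This is a minimal illustration, not Quinlan's original code; the `(attribute-dict, label)` example format and the nested-dict tree representation are assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, attr):
    """Expected reduction in entropy from splitting examples on attr.
    examples is a list of (attribute-dict, class-label) pairs."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {ex[attr] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, attributes):
    """Grow a tree as nested dicts {attribute: {value: subtree}}."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:        # all positive or all negative: leaf
        return labels[0]
    if not attributes:               # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    return {best: {value: id3([(ex, l) for ex, l in examples
                               if ex[best] == value],
                              [a for a in attributes if a != best])
                   for value in {ex[best] for ex, _ in examples}}}
```

On a toy set where `hungry` perfectly predicts the label and `bar` is noise, `id3` picks `hungry` at the root, matching the greedy best-attribute rule.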
Best Attribute
• What’s the best attribute to choose? The one with the best information gain.
  – If we choose Bar, we have
    • no: 3-, 3+  yes: 3-, 3+
  – If we choose Hungry, we have
    • no: 4-, 1+  yes: 1-, 5+
  – Hungry has given us more information about the correct classification.
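The comparison can be made quantitative. Taking the counts on this slide as given, a short sketch (illustrative code, not from the slides) computes the gain of each split:

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a node holding pos positive, neg negative examples."""
    h, total = 0.0, pos + neg
    for c in (pos, neg):
        if c:
            h -= (c / total) * math.log2(c / total)
    return h

def gain(parent, children):
    """Information gain; parent and each child are (pos, neg) counts."""
    total = sum(p + n for p, n in children)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - remainder

# Counts from the slide, written as (positive, negative):
gain_bar = gain((6, 6), [(3, 3), (3, 3)])     # no: 3-,3+  yes: 3-,3+
gain_hungry = gain((6, 5), [(1, 4), (5, 1)])  # no: 4-,1+  yes: 1-,5+
```

Bar's split leaves each child as mixed as the parent, so its gain is exactly 0 bits; Hungry's gain is about 0.31 bits, which is why Hungry is the more informative attribute.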
Textbook restaurant domain
• Develop a decision tree to model the decision a patron makes when deciding whether or not to wait for a table at a restaurant.
• Two classes: wait, leave.
• Ten attributes: Alternative available? Bar in restaurant? Is it Friday? Are we hungry? How full is the restaurant? How expensive? Is it raining? Do we have a reservation? What type of restaurant is it? What’s the purported waiting time?
• Training set of 12 examples; ~7000 possible cases.
Thinking About It
• What might you expect a decision tree to have as the first question? The second?
A Decision Tree from Introspection
Choosing an attribute
• Idea: a good attribute splits the examples into subsets that are (ideally) “all positive” or “all negative”.
• Patrons? is a better choice.
Learned Tree
How well does it work?
• Many case studies have shown that decision trees are at least as accurate as human experts.
  – In a study on diagnosing breast cancer, humans correctly classified the examples 65% of the time; the decision tree classified 72% correctly.
  – British Petroleum designed a decision tree for gas-oil separation on offshore oil platforms that replaced an earlier rule-based expert system.
  – Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.
Pruning
• With enough levels of a decision tree we can always get the leaves to be 100% positive or negative.
• But if we are down to one or two cases in each leaf, we are probably overfitting.
• Useful to prune leaves; stop when
  – we reach a certain level
  – we reach a small enough leaf size
  – our information gain is increasing too slowly
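The three stopping conditions above can be sketched as a single pre-pruning test that a tree builder would call before recurring (the function name and threshold values are illustrative assumptions, not from the slides):

```python
def should_stop(depth, examples, best_gain,
                max_depth=5, min_leaf=2, min_gain=0.01):
    """Pre-pruning: stop growing the tree when any condition fires.
    Thresholds here are illustrative defaults, not values from the slides."""
    if depth >= max_depth:         # we reached a certain level
        return True
    if len(examples) <= min_leaf:  # we reached a small enough leaf
        return True
    if best_gain < min_gain:       # information gain is growing too slowly
        return True
    return False
```

When `should_stop` returns `True`, the builder would emit a majority-class leaf instead of splitting further, trading some training accuracy for better generalization.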
Strengths of Decision Trees
• Strengths include:
  – Fast to learn and to use
  – Simple to implement
  – Can look at the tree and see what is going on; relatively “white box”
  – Empirically validated in many commercial products
  – Handles noisy data (with pruning)
• C4.5 and C5.0 are extensions of ID3 that account for unavailable values, continuous attribute value ranges, pruning of decision trees, and rule derivation.
Weaknesses and Issues
• Weaknesses include:
  – Univariate splits/partitioning (one attribute at a time) limits the types of possible trees
  – Large decision trees may be hard to understand
  – Requires fixed-length feature vectors
  – Non-incremental (i.e., a batch method)
  – Prone to overfitting
Decision Tree Architecture
• Knowledge Base: the decision tree itself
• Performer: tree walker
• Critic: actual outcome in the training case
• Learner: ID3 or its variants
  – This is an example of a large class of learners that need all of the examples at once in order to learn: batch, not incremental.
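Two of these roles are simple enough to sketch directly, assuming the nested-dict tree representation used earlier in these notes (the function names are illustrative, not from the slides):

```python
def classify(tree, example):
    """Performer: walk the tree from the root to a leaf label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def accuracy(tree, labeled_examples):
    """Critic: compare the performer's output with the actual outcomes."""
    hits = sum(classify(tree, ex) == label for ex, label in labeled_examples)
    return hits / len(labeled_examples)

# The Learner (ID3) is batch: it needs all labeled examples at once, so
# updating the knowledge base means re-running it on the full training set.
```

A low accuracy from the critic is the signal to re-invoke the learner; because ID3 is non-incremental, there is no per-example update step.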
Summary: Decision tree learning
• One of the most widely used learning methods in practice.
• Can outperform human experts on many problems.