"Efficient Algorithms for Constructing Decision Trees with Constraints"
by Minos Garofalakis, Dongjoon Hyun, Rajeev Rastogi, and Kyuseok Shim.
Proceedings of ACM SIGKDD'2000,
Boston, Massachusetts, August 2000, pp. 335-339.
Abstract
Classification is an important problem in data mining.
A number of popular classifiers construct decision
trees to generate class models. Frequently, however, the constructed trees are
complex with hundreds of nodes and thus difficult to comprehend,
a fact that calls into question an often-cited benefit that decision
trees are easy to interpret.
In this paper, we address the problem of constructing "simple"
decision trees with few nodes that are easy for humans to interpret.
By permitting users to specify constraints on tree size or
accuracy, and then building
the "best" tree that satisfies the constraints, we ensure that the
final tree is both easy to understand and has good
accuracy. We develop novel branch-and-bound algorithms that push
the constraints into the building phase of classifiers and prune,
early on, tree nodes that cannot possibly satisfy the constraints.
Our experimental results with real-life and synthetic data sets
demonstrate that significant performance speedups and reductions in the number
of nodes expanded can be achieved as a result of incorporating knowledge
of the constraints into the building step as opposed to applying
the constraints after the entire tree is built.
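The idea of pushing a size constraint into the building phase can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes binary splits and uses only a simple depth-based bound (a node at depth d needs at least 2d + 1 nodes in any tree that contains it), whereas the paper's bounds are more elaborate. The names min_tree_size_containing and count_expansions are hypothetical, and the "data" is a complete binary tree of a given height rather than a real training set.

```python
from collections import deque


def min_tree_size_containing(depth):
    """Fewest nodes in a binary tree (each internal node has two children)
    that contains a node at the given depth: one internal ancestor per
    level, each contributing two children, plus the root."""
    return 2 * depth + 1


def count_expansions(max_depth, k=None):
    """Grow a complete binary tree breadth-first and count split nodes.

    max_depth: depth at which nodes become leaves (stand-in for pure nodes).
    k: optional limit on the total number of nodes in the final tree.
       When given, a node is not split if its children could never appear
       in any tree with at most k nodes.
    """
    expanded = 0
    queue = deque([0])  # each entry is the depth of a node awaiting expansion
    while queue:
        depth = queue.popleft()
        if depth >= max_depth:
            continue  # nothing left to split at this node
        if k is not None and min_tree_size_containing(depth + 1) > k:
            continue  # bound: children cannot satisfy the size constraint
        expanded += 1
        queue.extend([depth + 1, depth + 1])
    return expanded


if __name__ == "__main__":
    # Build everything first (constraint applied afterwards) vs. constraint pushed in.
    print(count_expansions(10))
    print(count_expansions(10, k=7))
```

Under these assumptions, the unconstrained build splits every internal node of the full tree, while the constrained build stops as soon as the bound shows a node's children cannot belong to any tree within the size limit.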
[ camera-ready paper (pdf) (ps.gz) | journal version (in Data Mining and Knowledge Discovery) ]
Copyright © 2000, Association for Computing Machinery, Inc. (ACM).
Permission to make digital/hard copy of all or part of this material without
fee is granted provided that copies are not made or distributed for profit or
commercial advantage, the ACM copyright/server notice, the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery, Inc. (ACM).
To copy otherwise, to republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.