"SPIRIT: Sequential Pattern Mining with Regular Expression Constraints"
by Minos N. Garofalakis, Rajeev Rastogi, and Kyuseok Shim.
Proceedings of VLDB'99, Edinburgh, Scotland, September 1999, pp. 223-234.
Abstract
Discovering sequential patterns is an important problem in data mining with a
host of application domains including medicine, telecommunications, and the
World Wide Web. Conventional mining systems provide users with only a very
restricted mechanism (based on minimum support) for specifying patterns of
interest.
In this paper, we propose the use of Regular Expressions (REs) as a flexible
constraint specification tool that enables user-controlled focus to be
incorporated into the pattern mining process. We develop a family of novel
algorithms (termed SPIRIT -- Sequential Pattern mIning with Regular expressIon
consTraints) for mining frequent sequential patterns that also satisfy
user-specified RE constraints. The main distinguishing factor among the
proposed schemes is the degree to which the RE constraints are enforced to
prune the search space of patterns during computation. Our solutions provide
valuable insights into the tradeoffs that arise when constraints that lack
convenient properties (such as anti-monotonicity) are integrated into the
mining process. A quantitative exploration of these tradeoffs is conducted
through an extensive experimental study on synthetic and real-life data sets.
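
To make the problem statement concrete, the following is a minimal Python sketch of the naive end of the spectrum the abstract describes: patterns are pruned by minimum support only, and the RE constraint is applied as a filter on the frequent patterns. It is not one of the SPIRIT algorithms from the paper. The names is_subsequence, support, and mine_constrained, the toy database, and the single-character item encoding are all assumptions made for illustration.

```python
import re


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of data sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def mine_constrained(database, constraint, min_support, max_len):
    """Level-wise search: prune by support only, then filter by the RE.

    Assumption: every item is a single character, so a pattern is a string
    and the RE can be matched directly against it.
    """
    alphabet = sorted({item for s in database for item in s})
    regex = re.compile(constraint)
    frequent = [""]  # frequent patterns of the previous length
    results = []
    for _ in range(max_len):
        next_level = []
        for prefix in frequent:
            for item in alphabet:
                candidate = prefix + item
                # Support is anti-monotone, so extending only frequent
                # prefixes cannot miss a frequent pattern.
                if support(candidate, database) >= min_support:
                    next_level.append(candidate)
                    # The RE is checked only on complete patterns here;
                    # it is not used to prune the search space.
                    if regex.fullmatch(candidate):
                        results.append(candidate)
        frequent = next_level
        if not frequent:
            break
    return results


# Hypothetical toy data: four sequences over the items a, b, c, d.
database = ["abcd", "acbd", "abd", "bcd"]
print(mine_constrained(database, "a(b|c)*d", min_support=0.5, max_len=4))
# prints ['ad', 'abd', 'acd']
```

In this sketch the RE does no pruning at all, so every frequent pattern is generated and counted even when it can never satisfy the constraint. The tradeoff studied in the paper concerns how much of the constraint to push into the candidate generation step instead.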
Copyright © 1999, VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided
that the copies are not made or distributed for direct commercial advantage, the
VLDB copyright notice and the title of the publication and its date appear, and
notice is given that copying is by permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires a fee and/or special
permission from the Endowment.