"SPIRIT: Sequential Pattern Mining with Regular Expression Constraints"
by Minos N. Garofalakis, Rajeev Rastogi, and
Kyuseok Shim.
Proceedings of VLDB'99,
Edinburgh, Scotland, September 1999, pp. 223-234.
Abstract
Discovering sequential patterns is an important problem in data mining with a
host of application domains including medicine, telecommunications, and the
World Wide Web. Conventional mining systems provide users with only a very
restricted mechanism (based on minimum support) for specifying patterns of
interest.
In this paper, we propose the use of Regular Expressions (REs) as a flexible
constraint specification tool that enables user-controlled focus to be
incorporated into the pattern mining process. We develop a family of novel
algorithms (termed SPIRIT -- Sequential Pattern mIning with Regular expressIon
consTraints) for mining frequent sequential patterns that also satisfy
user-specified RE constraints. The main distinguishing factor among the
proposed schemes is the degree to which the RE constraints are enforced to
prune the search space of patterns during computation. Our solutions provide
valuable insights into the tradeoffs that arise when constraints that do not
subscribe to nice properties (like anti-monotonicity) are integrated into the
mining process. A quantitative exploration of these tradeoffs is conducted
through an extensive experimental study on synthetic and real-life data sets.
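
To make the problem statement concrete, the following is a minimal sketch of
the task the abstract describes: finding patterns that are both frequent and
accepted by a user-supplied RE. It is not one of the SPIRIT algorithms. It is
a naive baseline that enumerates subsequences, counts support, and applies the
RE only as a final filter, with no constraint-based pruning. The function
names, the toy database, the single-character item encoding, and the max_len
cutoff are all assumptions made for illustration.

```python
import re
from itertools import combinations


def subsequences(seq, max_len):
    """Return the distinct (not necessarily contiguous) subsequences of seq,
    up to max_len items long. Hypothetical helper, exponential in general."""
    out = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            out.add("".join(seq[i] for i in idx))
    return out


def constrained_frequent_patterns(database, min_support, pattern_re, max_len=3):
    """Naive baseline: count support for every candidate subsequence, then
    keep those that meet min_support AND fully match the RE constraint."""
    counts = {}
    for seq in database:
        # Each data sequence contributes at most 1 to a pattern's support.
        for sub in subsequences(seq, max_len):
            counts[sub] = counts.get(sub, 0) + 1
    regex = re.compile(pattern_re)
    return {
        p: c
        for p, c in counts.items()
        if c >= min_support and regex.fullmatch(p)
    }


# Toy database: each string is one data sequence, each character one item.
database = ["abcd", "abd", "acd", "bcd"]
constraint = r"a(b|c)d"

result = constrained_frequent_patterns(database, min_support=2,
                                       pattern_re=constraint)
print(result)

# The RE constraint is not anti-monotone: "abd" satisfies it, but its
# subsequence "ab" does not, so "ab" cannot simply be discarded early
# without risking the loss of "abd".
print(re.fullmatch(constraint, "abd") is not None)
print(re.fullmatch(constraint, "ab") is not None)
```

In this toy run the patterns "abd" and "acd" each have support 2 and match the
constraint, while the last two lines show a satisfying pattern whose
subsequence violates the RE. That is the difficulty the abstract refers to
when it mentions constraints lacking anti-monotonicity.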
Copyright © 1999, VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided
that the copies are not made or distributed for direct commercial advantage, the
VLDB copyright notice and the title of the publication and its date appear, and
notice is given that copying is by permission of the Very Large Data Base
Endowment.  To copy otherwise, or to republish, requires a fee and/or special
permission from the Endowment.