"Approximate query processing using wavelets"
Abstract
Approximate query processing has emerged as a cost-effective approach for
dealing with the huge data volumes and stringent response-time requirements of today's
Decision Support Systems (DSS). Most work in this area, however, has so far been
limited in its query processing scope, typically focusing on specific forms of aggregate
queries. Furthermore, conventional approaches based on sampling or histograms appear to be
inherently limited when it comes to approximating the results of complex queries over
high-dimensional DSS data sets.
In this paper, we propose the use of multi-dimensional wavelets as an effective tool for
general-purpose approximate query processing in modern, high-dimensional applications.
Our approach is based on building wavelet-coefficient synopses of the data and
using these synopses to provide approximate answers to queries.
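As a minimal sketch of what a wavelet-coefficient synopsis is, the Python below makes three simplifying assumptions. It uses a one-dimensional array whose length is a power of two, where the paper works with multi-dimensional wavelets over relational tables. It uses the non-normalized Haar transform (pairwise averages and half-differences). It keeps the `budget` coefficients with the largest |c| * sqrt(support size), a standard thresholding rule that minimizes the L2 error of the reconstruction, and then expands the sparse synopsis back into approximate values. All function names are hypothetical, not the paper's interface.

```python
import math


def haar_decompose(data):
    """Non-normalized 1-D Haar transform (pairwise averages and half-differences).

    Returns [overall average, coarsest detail, ..., finest details];
    len(data) must be a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    values = [float(x) for x in data]
    details = []
    while len(values) > 1:
        pairs = range(0, len(values), 2)
        averages = [(values[i] + values[i + 1]) / 2 for i in pairs]
        diffs = [(values[i] - values[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        values = averages
    return values + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each average a with detail d yields (a + d, a - d)."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(values)]
        pos += len(values)
        values = [v for a, d in zip(values, diffs) for v in (a + d, a - d)]
    return values


def build_synopsis(data, budget):
    """Keep the `budget` coefficients with the largest |c| * sqrt(support size)."""
    coeffs = haar_decompose(data)
    n = len(coeffs)

    def weight(i):
        # Index 0 (overall average) spans all n values; index i >= 1 sits at
        # level l = floor(log2 i) and spans n / 2**l values.
        support = n if i == 0 else n >> (i.bit_length() - 1)
        return abs(coeffs[i]) * math.sqrt(support)

    kept = sorted(range(n), key=weight, reverse=True)[:budget]
    return {i: coeffs[i] for i in kept}


def approximate(synopsis, n):
    """Expand a sparse synopsis (dropped coefficients treated as 0) to n values."""
    return haar_reconstruct([synopsis.get(i, 0.0) for i in range(n)])


data = [2, 2, 0, 2, 3, 5, 4, 4]
synopsis = build_synopsis(data, budget=3)
print(synopsis)                          # {0: 2.75, 1: -1.25, 5: -1.0}
print(approximate(synopsis, len(data)))  # [1.5, 1.5, 0.5, 2.5, 4.0, 4.0, 4.0, 4.0]
```

With three of eight coefficients retained, the approximation smooths individual values but still preserves aggregates such as the total (22). Keeping every coefficient reproduces the input exactly.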
We develop novel query processing algorithms that operate directly on the wavelet-coefficient
synopses of relational tables, allowing us to process arbitrarily complex queries entirely
in the wavelet-coefficient domain.
This guarantees extremely fast response times, since our approximate query execution
engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially
postponing the expansion into relational tuples until the final query result is produced.
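The paper's algorithms support general relational operators over multi-dimensional coefficient synopses. The Python sketch below is a much simpler, one-dimensional illustration of the same principle: it answers a query from the coefficients and defers any expansion. Here a range-sum query is evaluated directly on a sparse Haar synopsis without reconstructing any data values. It assumes the non-normalized Haar layout, where index 0 is the overall average and index 2^l + k is the k-th detail coefficient at level l. `range_sum` is a hypothetical name, not the paper's API.

```python
def range_sum(synopsis, n, lo, hi):
    """Approximate sum of values lo..hi (inclusive) computed from the coefficients alone.

    synopsis maps coefficient index -> value in the non-normalized Haar layout;
    missing indices are treated as 0. Cost is O(len(synopsis)), independent of hi - lo.
    """
    def overlap(a, b):
        # Number of positions shared by [lo, hi] and [a, b].
        return max(0, min(hi, b) - max(lo, a) + 1)

    total = 0.0
    for i, c in synopsis.items():
        if i == 0:
            total += c * (hi - lo + 1)  # the average contributes to every position
            continue
        level = i.bit_length() - 1
        size = n >> level                  # support size of this coefficient
        start = (i - (1 << level)) * size  # first position in its support
        half = size // 2
        # A detail coefficient adds +c on the left half of its support, -c on the right.
        total += c * (overlap(start, start + half - 1)
                      - overlap(start + half, start + size - 1))
    return total


# Haar coefficients of [2, 2, 0, 2, 3, 5, 4, 4], zero coefficients omitted.
full = {0: 2.75, 1: -1.25, 2: 0.5, 5: -1.0, 6: -1.0}
print(range_sum(full, 8, 2, 5))                          # 10.0 (exact: 0 + 2 + 3 + 5)
print(range_sum({0: 2.75, 1: -1.25, 5: -1.0}, 8, 2, 5))  # 11.0 (3-coefficient synopsis)
```

Each retained coefficient contributes independently, so the query touches only the B kept coefficients rather than the n underlying values. On the three-coefficient synopsis it returns the same 11.0 one would get by first expanding that synopsis and then summing positions 2 to 5.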
We also propose a novel wavelet decomposition algorithm that can build these synopses in an
I/O-efficient manner.
Finally, we conduct an extensive experimental study with synthetic as well as real-life data
sets to determine the effectiveness of our wavelet-based approach compared to sampling and
histograms.
Our results demonstrate that our techniques
(1) provide approximate answers of better quality than either sampling or histograms,
(2) offer query execution-time speedups of more than two orders of magnitude, and
(3) guarantee extremely fast synopsis construction times that scale linearly with the
size of the data.
Copyright © 2001, Springer-Verlag.
The original publication is available on LINK at
http://link.springer.de.