Linear Sketches for Approximate Aggregate Range Queries

msra(2004)

Abstract
Answering aggregate queries approximately over multidimensional data is an important problem that arises naturally in many applications. One approach is to maintain a succinct (i.e., O(k)-space) representation ĥ, called a sketch, of the frequency distribution h of the data, and to use ĥ for answering queries. Common sketches are constructed via linear mappings of h onto a k-dimensional space, e.g., by mapping h to its top-k Fourier or wavelet coefficients. We call such sketches linear sketches, since ĥ = Ph for some sketching matrix P. Linear sketches have the benefit that they can be easily maintained incrementally over data streams. Sketches are typically optimized for approximating the data distribution, but not the answers to queries. In this paper, we are concerned with linear sketches that approximate well not only the data but also the answers to aggregate queries. The quality of an approximation is measured using the mean squared error (MSE) and the relative error (RLE). A query is represented by a column vector q such that its answer is qᵀh. A given set of queries can be represented by an appropriate query matrix Q. We show that the MSE for the queries is minimized when the sketching matrix used to construct a linear sketch of h has as columns the top-k eigenvectors of the query matrix Q. Further, if Q corresponds to all range queries of a given extent, then Q has a succinct representation and a universal set of eigenvectors. For the 1-dimensional case, these eigenvectors are precisely the vectors of the Discrete Fourier Transform, and hence they too have a succinct representation. Generalizations to higher dimensions are also given. This succinct representation is particularly advantageous for maintaining sketches over streaming data. Further, in many instances there may already be a (linear) sketch of a distribution, maintained over the data for various applications. We show how to extend that sketch so that the MSE for a given set of queries is minimized. This provides a novel method for constructing sketches that consider both the data and the queries. Using both synthetic and real data, we experimentally demonstrate that our approach delivers significantly smaller errors than various other standard approaches.
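The 1-dimensional case described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that range queries of extent w wrap around the domain, so that the query matrix Q is circulant; that the "top-k" eigenvectors are those with the largest eigenvalue magnitude; and that the eigenvectors are stored as rows of P so that ĥ = Ph has k entries. All function names and parameter values are hypothetical.

```python
import numpy as np

def range_query_matrix(n, w):
    """Row i sums the w consecutive cells starting at i, wrapping around (assumed)."""
    offsets = (np.arange(n)[:, None] + np.arange(w)[None, :]) % n
    Q = np.zeros((n, n))
    Q[np.arange(n)[:, None], offsets] = 1.0
    return Q

def dft_vectors(n):
    """Columns are the orthonormal DFT vectors v_m[j] = exp(2*pi*i*j*m/n) / sqrt(n)."""
    j = np.arange(n)
    return np.exp(2j * np.pi * np.outer(j, j) / n) / np.sqrt(n)

def sketching_matrix(Q, k):
    """k x n matrix P: conjugated DFT vectors with the largest |eigenvalue| of Q."""
    n = Q.shape[0]
    eigvals = np.fft.fft(Q[:, 0])  # eigenvalues of a circulant: DFT of its first column
    top = np.argsort(-np.abs(eigvals), kind="stable")[:k]
    return dft_vectors(n)[:, top].conj().T

def update(sketch, P, cell, count=1):
    """Fold one stream update (count items landing in `cell`) into the sketch."""
    return sketch + count * P[:, cell]

def answer_queries(Q, P, sketch):
    """Estimate Q h by mapping the sketch back through P's conjugate transpose."""
    return (Q @ (P.conj().T @ sketch)).real

# Hypothetical parameters: domain size, query extent, sketch size.
rng = np.random.default_rng(0)
n, w, k = 64, 8, 9
Q = range_query_matrix(n, w)
P = sketching_matrix(Q, k)

# Maintain the sketch incrementally, one stream item at a time.
stream = rng.integers(0, n, size=1000)
sketch = np.zeros(k, dtype=complex)
for cell in stream:
    sketch = update(sketch, P, cell)

# By linearity, the streamed sketch equals P applied to the full histogram h.
h = np.bincount(stream, minlength=n).astype(float)
exact = Q @ h
approx = answer_queries(Q, P, sketch)
print("sketch matches P @ h:", np.allclose(sketch, P @ h))
print("MSE over all range queries:", np.mean((exact - approx) ** 2))
```

Because the mapping is linear, each arriving item only adds one column of P to the sketch, so the full histogram h never has to be stored; it is built here only to check the result.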
Keywords
approximate query answering, Fourier vectors, linear sketching, circulants