Computations over data streams

Computations over data streams (2005)

Citations: 23 | Views: 11
Abstract
Several emerging applications demand computations over streaming data, where each datum can be seen at most once as it streams by and must be processed using small-footprint summaries with fast per-datum update times. We present novel summaries for estimating aggregate queries over streaming data, and adaptive computations for stream applications based on forgetting factors. Range-sum queries, when represented as vectors, can be computed as inner products with the frequency distribution vector, and the query vectors are usually correlated. We present summaries, which we call linear sketches, that are linear projections of the frequency distribution vector; they exploit the correlation among a given set of query vectors optimally (in the mean squared error sense) to estimate the queries effectively. Linear sketches for common sets of range-sum queries turn out to be closely related to classical linear transforms such as the Discrete Fourier Transform and the Discrete Sine Transform, and can be maintained efficiently over streaming data. Experimental results on both synthetic and real data demonstrate that our approach delivers significantly smaller errors than various other standard approaches. We provide extensions to multi-dimensional data streams and also show how any linear projection of the frequency distribution vector can be augmented by linear sketches to estimate answers to range-sum queries with smaller errors. Moreover, we consider F2 range queries, which compute the second frequency moment, or the size of the self-join, over a given range. We present and analyze, for the first time, stream summaries for estimating F2 range queries based on linear sketches. Stream computations are useful and important in sensor networks. We demonstrate how linear sketches can be used to estimate aggregate range queries in energy-constrained environments such as sensor networks.
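To make the idea of a linear projection of the frequency distribution vector concrete, the following is a minimal sketch, not the optimal (mean-squared-error) projection the abstract refers to. As an assumption for illustration only, it projects onto the first k rows of an orthonormal cosine basis; the class name `LinearProjectionSummary` and the helper `cosine_basis` are hypothetical. It shows two points from the abstract: the summary is updated in O(k) time per arriving item, and a range-sum query is estimated as an inner product between the projected query vector and the projected frequency vector.

```python
import math


def cosine_basis(k, n):
    """Return the first k rows of the orthonormal DCT-II basis over n points.

    Assumption for illustration: any set of orthonormal rows would do here.
    """
    rows = []
    for j in range(k):
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (i + 0.5) * j / n) for i in range(n)]
        )
    return rows


class LinearProjectionSummary:
    """Keeps k coefficients s = P f, where f is the frequency vector."""

    def __init__(self, k, n):
        self.basis = cosine_basis(k, n)  # P, a k-by-n matrix
        self.coeffs = [0.0] * k          # s = P f, initially f = 0

    def update(self, item, count=1):
        # Arrival of `item` adds `count` to f[item]; by linearity,
        # s changes by count * (column `item` of P).
        for j, row in enumerate(self.basis):
            self.coeffs[j] += count * row[item]

    def estimate_range_sum(self, lo, hi):
        # The query vector q is 1 on [lo, hi] and 0 elsewhere.
        # Estimate q . f by (P q) . (P f); exact when k == n.
        total = 0.0
        for j, row in enumerate(self.basis):
            projected_query = sum(row[lo:hi + 1])
            total += projected_query * self.coeffs[j]
        return total


# Usage: summarize a short stream over the domain {0, ..., 15}.
stream = [3, 3, 5, 7, 7, 7, 12, 0, 15, 5]
summary = LinearProjectionSummary(k=4, n=16)
for item in stream:
    summary.update(item)
print(summary.estimate_range_sum(3, 7))  # approximate count of items in [3, 7]
```

With k equal to the domain size the estimate is exact; smaller k trades accuracy for space, and how to choose the projection so that this trade is as favorable as possible for a given query set is the question the linear sketches address.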
Experimental results show that linear sketching significantly extends the lifetime of sensor networks at the cost of only a small loss in query accuracy. To make stream computations adaptive to non-stationarities, we use forgetting factors: each element in the data stream has a weight that determines its influence on the result and decays exponentially with time. We present a weighted k-means clustering algorithm using forgetting factors that maintains adaptive clusters over data streams. Further, we use the adaptive clusters as histogram buckets to answer weighted-count range queries. We also present a forgetting-factor-based Recursive Least Squares algorithm for adaptive incremental model estimation, used to detect outliers and change points over streams.
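The forgetting-factor idea can be illustrated with the textbook exponentially weighted Recursive Least Squares update. This is a minimal sketch under stated assumptions, not the algorithm from the work itself: the class name `ForgettingRLS`, the default forgetting factor `lam=0.98`, and the initialization constant `delta` are all hypothetical choices. Each past sample's weight shrinks by a factor `lam` per step, and the residual returned by `update` is the kind of signal that could be thresholded to flag outliers or change points.

```python
class ForgettingRLS:
    """Standard RLS with an exponential forgetting factor (pure Python)."""

    def __init__(self, dim, lam=0.98, delta=1000.0):
        self.lam = lam                   # forgetting factor, 0 < lam <= 1
        self.w = [0.0] * dim             # current model weights
        # P approximates the inverse of the weighted input correlation matrix.
        self.P = [[delta if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]

    def update(self, x, y):
        """Fold in one sample (x, y); return the residual before the update."""
        d = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(d))
        gain = [v / denom for v in Px]
        residual = y - sum(self.w[i] * x[i] for i in range(d))
        self.w = [self.w[i] + gain[i] * residual for i in range(d)]
        xP = [sum(x[i] * self.P[i][j] for i in range(d)) for j in range(d)]
        # Dividing by lam inflates P, which is what discounts old samples.
        self.P = [[(self.P[i][j] - gain[i] * xP[j]) / self.lam
                   for j in range(d)] for i in range(d)]
        return residual


# Usage: the stream follows y = 2x + 1, then switches to y = -x + 3.
model = ForgettingRLS(dim=2)
residuals = []
for t in range(500):
    x = (t % 10) / 10.0
    y = 2.0 * x + 1.0 if t < 200 else -x + 3.0
    residuals.append(model.update([x, 1.0], y))  # features: [x, bias]
print(model.w)         # weights after adapting to the second regime
print(residuals[200])  # residual at the change point
```

A smaller `lam` forgets faster and adapts more quickly, at the price of noisier estimates; the effective memory is roughly 1 / (1 - lam) samples.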
Keywords
smaller error, linear projection, linear sketch, adaptive cluster, frequency distribution vector, F2 range, stream computation, sensor network, experimental result, data stream