Robust Sketching and Aggregation of Distributed Data Streams

msra(2005)

Abstract
The data streaming model provides an attractive framework for one-pass summarization of massive data sets at a single observation point. However, in an environment where multiple data streams arrive at a set of distributed observation points, sketches must be computed remotely and then must be aggregated through a hierarchy before queries may be conducted. As a result, many sketch-based methods for the single stream case do not apply directly, as either the error introduced becomes large, or because the methods assume that the streams are non-overlapping. These limitations hinder the application of these techniques to practical problems in network traffic monitoring and aggregation in sensor networks. To address this, we develop a framework for evaluating and enabling robust computation of duplicate-sensitive aggregate functions (e.g., SUM and QUANTILE), over data produced by distributed sources. We instantiate our approach by augmenting the Count-Min and Quantile Digest sketches to apply in this distributed setting, and analyze their performance. We conclude with an experimental evaluation to validate our analysis.
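The overlap problem described in the abstract can be illustrated with a minimal sketch. The code below is a plain, textbook Count-Min sketch, not the augmented variant the paper develops; the class name `CountMinSketch`, the parameters `width` and `depth`, the SHA-256 based hashing, and the flow labels `f1` and `f2` are all assumptions chosen for illustration. It shows that when two observation points see the same items and their sketches are merged by cell-wise addition, the duplicated items are counted once per site.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min sketch (illustrative; not the paper's augmented version)."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash position per row by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # Count-Min never underestimates: take the minimum over all rows.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

    def merge(self, other):
        # Standard merge: cell-wise addition, valid only for disjoint streams.
        merged = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            merged.rows[r] = [a + b for a, b in zip(self.rows[r], other.rows[r])]
        return merged


# Two observation points that both observe the same three items of flow "f1".
site_a = CountMinSketch()
site_b = CountMinSketch()
for _ in range(3):
    site_a.add("f1")
    site_b.add("f1")
site_b.add("f2", 5)  # an item seen only at site B

merged = site_a.merge(site_b)
print("site A estimate for f1:", site_a.estimate("f1"))
print("merged estimate for f1:", merged.estimate("f1"))
```

Under these assumptions, the merged estimate for `f1` is at least twice the count seen at a single site, even though only three distinct items exist. This is the kind of error that duplicate-sensitive aggregates such as SUM suffer when overlapping streams are combined naively.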
Keywords
sensor network,technical report