Stream Aggregation Through Order Sampling.

CIKM(2017)

引用 11|浏览52
暂无评分
摘要
This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely keyed items, there is no current algorithm that realizes the benefits of order sampling in the context of stream aggregation over non-unique keys. A naive approach to order sample regardless of key then aggregate the results is hopelessly inefficient. In distinction, our proposed algorithm uses a single persistent random variable across the lifetime of each key in the cache, and maintains unbiased estimates of the key aggregates that can be queried at any point in the stream. The basic approach can be supplemented with a Sample and Hold pre-sampling stage with a sampling rate adaptation controlled by PBA. This approach represents a considerable reduction in computational complexity compared with the state of the art in adapting Sample and Hold to operate with a fixed cache size. Concerning statistical properties, we prove that PBA provides unbiased estimates of the true aggregates. We analyze the computational complexity of PBA and its variants, and provide a detailed evaluation of its accuracy on synthetic and trace data. Weighted relative error is reduced by 40% to 65% at sampling rates of 5% to 17%, relative to Adaptive Sample and Hold; there is also substantial improvement for rank queries.
更多
查看译文
关键词
Priority Sampling, Aggregation, Subset Sums, Heavy Hitters
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要