ABC: A practicable sketch framework for non-uniform multisets

2017 IEEE International Conference on Big Data (Big Data), 2017

Abstract
A sketch is a data structure that records the frequencies of items in a multiset. Sketches are widely used in data stream processing, graph data processing, distributed dataset processing, and similar settings. A sketch uses little memory and runs at high speed, at the cost of slight inaccuracy. In practice, item frequencies in many datasets are non-uniformly distributed, and existing sketches rarely work well on such non-uniform datasets. To address this issue, we propose a new sketch framework, the ABC framework, which can be applied to most existing sketches and significantly improves their accuracy on non-uniform datasets. The key idea behind the framework is that when a counter overflows, it uses space from its adjacent counters through bits-borrowing and combination operations. Extensive experimental results show that the ABC framework improves accuracy by 4.10 times and 4.49 times on average, respectively. A demo and all related source code are available on our homepage [1].
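To make the key idea concrete, here is a minimal Python sketch built on assumptions of our own: a Count-Min sketch with narrow 4-bit counters in which adjacent slots form fixed pairs, and an overflowing counter is merged with its neighbour into one wider counter shared by both slots. The class name `CombiningCountMin`, the fixed pairing rule, the sum-on-merge policy, and saturation at the wide limit are all hypothetical. The abstract does not specify the paper's actual bits-borrowing and combination rules, and they likely differ.

```python
import hashlib


class CombiningCountMin:
    """Count-Min sketch whose narrow counters merge with a neighbour on overflow.

    Hypothetical simplification for illustration only: slots (2k, 2k+1) form a
    pair; when either slot overflows its `bits`-bit range, the pair is combined
    into one counter of 2*bits bits that both slots then share.
    """

    def __init__(self, depth: int = 3, width: int = 64, bits: int = 4) -> None:
        if width % 2:
            raise ValueError("width must be even so every slot has a neighbour")
        self.depth, self.width = depth, width
        self.small_max = (1 << bits) - 1        # e.g. 15 for 4-bit counters
        self.big_max = (1 << (2 * bits)) - 1    # e.g. 255 once two slots combine
        self.counts = [[0] * width for _ in range(depth)]
        # combined[r][k] is True once pair k of row r has been merged.
        self.combined = [[False] * (width // 2) for _ in range(depth)]

    def _slot(self, row: int, item: str) -> int:
        # One hash per row, derived by salting the item with the row index.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def _read(self, row: int, slot: int) -> int:
        pair = slot // 2
        if self.combined[row][pair]:
            return self.counts[row][2 * pair]   # merged value lives in the even slot
        return self.counts[row][slot]

    def insert(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            slot = self._slot(row, item)
            pair = slot // 2
            base = 2 * pair
            if self.combined[row][pair]:
                # Saturate the wide counter instead of letting it wrap around.
                self.counts[row][base] = min(self.counts[row][base] + count, self.big_max)
            elif self.counts[row][slot] + count <= self.small_max:
                self.counts[row][slot] += count
            else:
                # Overflow: merge with the neighbour. Summing both old values keeps
                # every estimate an over-estimate, since each old value already was one.
                merged = self.counts[row][base] + self.counts[row][base + 1] + count
                self.counts[row][base] = min(merged, self.big_max)
                self.counts[row][base + 1] = 0
                self.combined[row][pair] = True

    def query(self, item: str) -> int:
        # Count-Min estimate: the minimum over rows of the (possibly merged) counter.
        return min(self._read(row, self._slot(row, item)) for row in range(self.depth))
```

For example, inserting one item 100 times into `CombiningCountMin(bits=4)` yields an estimate of at least 100, whereas a plain 4-bit counter would stop at 15. Rarely seen items keep their own narrow counters, and only the slots that actually overflow pay for extra width. This is the trade-off that makes such a scheme attractive for skewed, non-uniform frequency distributions.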
Keywords
Data Structure, Sketch, Data Streams, Non-uniform Datasets