Half-Xor: A Fully-Dynamic Sketch for Estimating the Number of Distinct Values in Big Tables

Pinghui Wang, Dongdong Xie,Junzhou Zhao, Jinsong Li, Zhicheng Li,Rundong Li, Yang Ren, Jia Di

IEEE Transactions on Knowledge and Data Engineering(2024)

引用 0|浏览2
暂无评分
摘要
Calculating the number of distinct values (i.e., NDV) in a column of a big table is costly yet fundamental to a variety of database applications such as data compression and profiling. To reduce the high time and space cost, a number of sketch methods (e.g., HyperLogLog) have been proposed, which estimate the NDV from a constructed compact data summary of distinct values. However, these methods fail or are costly to manage fully-dynamic scenarios where data is often inserted into and deleted from the table. To solve this issue, we propose a novel sketch method, Half-Xor . Our Half-Xor sketch consists of a compact bit matrix and a small counter array, and it needs to set a few bits and update a counter when handling a data insertion/deletion. Compared with the state-of-the-art mergeable method, our experimental results demonstrate that our method Half-Xor is up to 6.6 times more accurate under the same memory usage and reduces the memory usage by up to 16 times to achieve the same estimation accuracy.
更多
查看译文
关键词
sketch,data mining,dynamic cardinality estimation,big table,database
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要