SpaceSaving±: An Optimal Algorithm for Frequency Estimation and Frequent Items in the Bounded-Deletion Model

Proceedings of the VLDB Endowment (2021)

In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent items problems in the bounded-deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose the Lazy SpaceSaving± and SpaceSaving± algorithms, which achieve the optimal space bound. We develop an efficient implementation of the SpaceSaving± algorithm that minimizes the latency of update operations using novel data structures. Experimental evaluations show that SpaceSaving± produces accurate frequency estimates and achieves very high recall and precision across different data distributions while using minimal space. Our experiments demonstrate that, given the same space, SpaceSaving± is more accurate than the state-of-the-art protocols with up to $(\log U - 1)/\log U$ of the items deleted, where $U$ is the size of the input universe. Moreover, motivated by prior work, we propose Dyadic SpaceSaving±, the first deterministic quantile approximation sketch in the bounded-deletion model.
Keywords: frequency estimation, frequent items, deletion
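
To make the idea of a counter-based summary that tolerates deletions concrete, here is a minimal illustrative sketch. It is not the authors' implementation. It assumes that the lazy variant named in the abstract handles an insertion like classic SpaceSaving (evict the minimum counter when the summary is full) and handles a deletion by decrementing the counter only when the item is currently monitored, ignoring it otherwise. The class name `LazySpaceSavingSketch`, the parameter `k`, and the linear scan for the minimum are all hypothetical simplifications; the efficient data structures mentioned in the abstract are not reproduced here.

```python
class LazySpaceSavingSketch:
    """Illustrative counter-based summary with k counters (assumed design)."""

    def __init__(self, k):
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k          # maximum number of monitored items (hypothetical parameter)
        self.counts = {}    # monitored item -> estimated count

    def insert(self, item):
        if item in self.counts:
            # Monitored item: just increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            # Free counter available: start monitoring the item.
            self.counts[item] = 1
        else:
            # Summary full: evict the item with the minimum count and let the
            # new item inherit that count plus one (classic SpaceSaving step).
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def delete(self, item):
        # Assumed lazy policy: decrement only if the item is monitored;
        # deletions of unmonitored items are ignored.
        if self.counts.get(item, 0) > 0:
            self.counts[item] -= 1

    def estimate(self, item):
        # Unmonitored items are reported with an estimate of zero.
        return self.counts.get(item, 0)
```

The linear-time `min` keeps the sketch short; a practical implementation would maintain the minimum incrementally so that updates stay fast.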