Exploiting Duality In Summarization With Deterministic Guarantees

Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)

Abstract
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state of the art, our histogram construction algorithm reduces time complexity by (at least) a factor of B log^2 n/log ε*, and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of log^2 B/log ε* + log n in time and B(1 - log B/log n) in space, where ε* is the optimal error. These complexity advantages offer both a space efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
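The duality idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes integer-valued data, a maximum-absolute-error metric, and buckets represented by the midpoint of their value range. The function names `min_buckets` and `min_error` are hypothetical. The dual problem (fewest buckets for a given error bound) is solved by a greedy left-to-right scan, and the primal problem (smallest error for a budget B) is solved by binary search over candidate errors, so no table of space allocations is needed.

```python
def min_buckets(data, eps):
    """Dual problem: fewest contiguous buckets such that every value lies
    within eps of its bucket's representative, the midpoint (min + max) / 2."""
    buckets = 0
    lo = hi = None
    for x in data:
        if lo is None:
            # Open the first bucket.
            lo = hi = x
            buckets = 1
            continue
        new_lo, new_hi = min(lo, x), max(hi, x)
        if (new_hi - new_lo) / 2 <= eps:
            # x fits in the current bucket without exceeding eps.
            lo, hi = new_lo, new_hi
        else:
            # Close the current bucket and start a new one at x.
            buckets += 1
            lo = hi = x
    return buckets


def min_error(data, budget):
    """Primal problem: smallest maximum error achievable with at most
    `budget` buckets. Assumes integer data, so every candidate error is
    k / 2 for an integer k, and binary search over k is exact."""
    if not data:
        return 0.0
    lo_k, hi_k = 0, max(data) - min(data)
    while lo_k < hi_k:
        mid = (lo_k + hi_k) // 2
        if min_buckets(data, mid / 2) <= budget:
            hi_k = mid       # error mid / 2 is feasible, try smaller
        else:
            lo_k = mid + 1   # infeasible, a larger error is needed
    return lo_k / 2
```

Each call to `min_buckets` is one linear pass over the data, and the search makes a logarithmic number of calls in the value range, so the running time of this sketch does not depend on the budget B.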
Keywords
efficiency, histograms, synopses, wavelets