Compact Summaries Over Large Datasets
MOD(2015)
Abstract
A fundamental challenge in processing the massive quantities of information generated by modern applications lies in extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is the design and analysis of compact summaries: data structures that capture key features of the data and that can be created effectively over distributed data sets. Popular summary structures include count-distinct algorithms, which compactly approximate the cardinality of a set of items, and sketches, which allow vector norms and inner products to be estimated. These structures are attractive because they can be computed in parallel over separate portions of the data and then combined into a single compact summary of the whole. This tutorial introduces the concepts behind compact summaries and presents examples of them.
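The following Python sketch illustrates the two summary families and shows how partial summaries combine. It implements a K-Minimum-Values (KMV) count-distinct summary and an AMS-style sketch of random ±1 signs for estimating inner products and squared norms. This is not code from the tutorial. The class and helper names (KMVSketch, AMSSketch, _hash01) are hypothetical, and BLAKE2b from hashlib stands in for an ideal random hash. The AMS estimate is also a plain average over rows, rather than the median-of-means combination usually used for high-probability guarantees. Both merge operations assume the summaries share their size parameter and hash seed.

```python
import hashlib
import heapq


def _hash01(item: str, seed: int = 0) -> float:
    """Hypothetical helper: hash an item to a pseudo-random float in [0, 1)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class KMVSketch:
    """Count-distinct summary: keep the k smallest distinct hash values seen."""

    def __init__(self, k: int = 256, seed: int = 0):
        self.k, self.seed = k, seed
        self.values = set()

    def add(self, item: str) -> None:
        self.values.add(_hash01(item, self.seed))  # duplicates hash identically
        if len(self.values) > self.k:
            # O(k) eviction keeps the code short; a heap would make it O(log k).
            self.values.remove(max(self.values))

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        # Summaries built on different machines combine into a summary of the
        # union, provided they share k and the hash seed.
        assert (self.k, self.seed) == (other.k, other.seed)
        out = KMVSketch(self.k, self.seed)
        out.values = set(heapq.nsmallest(self.k, self.values | other.values))
        return out

    def estimate(self) -> float:
        if len(self.values) < self.k:
            return float(len(self.values))  # exact, barring 64-bit hash collisions
        return (self.k - 1) / max(self.values)  # standard KMV estimator


class AMSSketch:
    """Linear sketch of a frequency vector: counter j sums sign_j(item) * count."""

    def __init__(self, rows: int = 256, seed: int = 0):
        assert 1 <= rows <= 512  # one 64-byte digest supplies all sign bits
        self.rows, self.seed = rows, seed
        self.counters = [0] * rows

    def _signs(self, item: str) -> list:
        # Treat each bit of the digest as an independent random sign (assumption).
        digest = hashlib.blake2b(f"{self.seed}:{item}".encode(), digest_size=64).digest()
        bits = int.from_bytes(digest, "big")
        return [1 if (bits >> j) & 1 else -1 for j in range(self.rows)]

    def update(self, item: str, count: int = 1) -> None:
        for j, sign in enumerate(self._signs(item)):
            self.counters[j] += sign * count

    def merge(self, other: "AMSSketch") -> "AMSSketch":
        # Linearity: the sketch of a sum of vectors is the sum of their sketches.
        assert (self.rows, self.seed) == (other.rows, other.seed)
        out = AMSSketch(self.rows, self.seed)
        out.counters = [a + b for a, b in zip(self.counters, other.counters)]
        return out

    def inner_product(self, other: "AMSSketch") -> float:
        # Each row's product is an unbiased estimate of <a, b>; average the rows.
        assert (self.rows, self.seed) == (other.rows, other.seed)
        return sum(a * b for a, b in zip(self.counters, other.counters)) / self.rows

    def norm2_squared(self) -> float:
        return self.inner_product(self)  # estimates the squared Euclidean norm


if __name__ == "__main__":
    # Two "machines" summarize their own shards; only the sketches are combined.
    left, right = KMVSketch(), KMVSketch()
    for i in range(20_000):
        left.add(f"user{i}")
    for i in range(10_000, 30_000):
        right.add(f"user{i}")
    print("distinct users ~", round(left.merge(right).estimate()))  # true count: 30000
```

The KMV merge keeps the k smallest hashes of the union. Merging per-shard summaries therefore yields exactly the summary that a single pass over the combined data would build. The AMS merge is exact for a related reason: the sketch is a linear function of the frequency vector, so adding counters is equivalent to sketching the summed vector.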
Keywords
summaries, sketches, approximate counting