Memory-constrained aggregate computation over data streams

Data Engineering (2011)

Abstract
In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. To share computation, prior proposals have suggested instantiating certain intermediate aggregates, which are then used to generate the final answers for the input queries. In this work, we make a number of contributions aimed at improving the generation and execution of query plans containing intermediate aggregates. These include: (1) a different hashing model, which has low eviction rates and also allows us to accurately estimate the number of evictions; (2) a comprehensive query execution cost model based on these estimates; (3) an efficient greedy heuristic for constructing low-cost query plans; (4) provably near-optimal and optimal algorithms for allocating the available memory to aggregates in the query plan when the input data distribution is Zipf-like and uniform, respectively; and (5) a detailed performance study with real-life IP flow data sets, which shows that our multiple-aggregate computation techniques consistently outperform the best-known approach.
Keywords
good low-cost query plan, comprehensive query execution cost, certain intermediate aggregate, real-life IP flow data, multiple aggregation query, input query, computation technique, query plan, memory-constrained aggregate computation, data stream, input data distribution, data model, resource management, data streams, computer model, data handling, data models, greedy heuristic, computational modeling, memory management, resource manager