COLA: A cloud-based system for online aggregation

ICDE '13: Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013), 2013

Abstract
Online aggregation is a promising approach to delivering fast early responses for interactive ad-hoc queries that compute aggregates over massive data. MapReduce has become a popular paradigm for processing large datasets on large-scale computing clusters and has been adopted by many data analysis applications. However, typical MapReduce implementations are geared towards batch processing and are therefore not well suited to interactive analytic tasks. As ad-hoc analytic queries over enormous datasets grow more common, processing aggregate queries with MapReduce in an online fashion is becoming an important application need. We present COLA, a MapReduce-based online aggregation system that provides progressive approximate aggregate answers for queries over both a single table and multiple joined tables. COLA provides an online aggregation execution engine with novel sampling techniques that support incremental, continuous computation of aggregates and minimize the waiting time before an acceptably precise estimate is available. In addition, COLA supports user-friendly SQL queries. Furthermore, COLA can implicitly convert non-OLA jobs into online versions, so users do not have to write any special-purpose code to obtain estimates.
Keywords
MapReduce-based online aggregation system, online aggregation, online aggregation execution engine, online fashion, online version, ad-hoc analytic query processing, batch processing, typical MapReduce implementation, aggregate query, analytic task, cloud-based system
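
The idea of progressive approximate answers described in the abstract can be illustrated with a minimal single-machine sketch. This is not COLA's engine or its sampling technique, which the abstract does not detail. It assumes a plain uniform random scan, a running AVG computed with Welford's update, and a normal-approximation confidence interval. The function name `online_avg` and the parameters `batch_size`, `z`, and `rel_error` are hypothetical and chosen only for illustration.

```python
import math
import random


def online_avg(values, batch_size=1000, z=1.96, rel_error=0.01, seed=0):
    """Yield (rows_seen, estimate, half_width) after each batch.

    Illustrative only: a uniform random scan with a CLT-based interval,
    not the sampling scheme used by COLA.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # random order, so every prefix is a uniform sample

    n, mean, m2 = 0, 0.0, 0.0  # Welford running count, mean, sum of squares
    for start in range(0, len(order), batch_size):
        for i in order[start:start + batch_size]:
            n += 1
            delta = values[i] - mean
            mean += delta / n
            m2 += delta * (values[i] - mean)

        variance = m2 / (n - 1) if n > 1 else 0.0
        half_width = z * math.sqrt(variance / n)  # no finite-population correction
        yield n, mean, half_width

        # Stop early once the interval is acceptably tight.
        if n > 1 and half_width <= rel_error * abs(mean):
            return


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]
    for rows, estimate, hw in online_avg(data):
        print(f"{rows:>7} rows: AVG ~ {estimate:.3f} +/- {hw:.3f}")
```

Each yielded tuple is one refinement of the answer, so a caller can display the estimate as it tightens and stop as soon as the precision is acceptable, rather than waiting for a full batch job to finish.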