Algebricks: A Data Model-Agnostic Compiler Backend For Big Data Languages
SoCC '15: ACM Symposium on Cloud Computing, Kohala Coast, Hawaii, August 2015
Abstract
A number of high-level query languages, such as Hive, Pig, Flume, and Jaql, have been developed in recent years to increase analyst productivity when processing and analyzing very large datasets. The implementation of each of these languages includes a complete, data model-dependent query compiler, yet each involves a number of similar optimizations. In this work, we describe a new query compiler architecture that separates language-specific and data model-dependent aspects from a more general query compiler backend that can generate executable data-parallel programs for shared-nothing clusters and can be used to develop multiple languages with different data models. We have built such a data model-agnostic query compiler substrate, called Algebricks, and have used it to implement three different query languages - HiveQL, AQL, and XQuery - to validate the efficacy of this approach. Experiments show that all three query languages benefit from the parallelization and optimization that Algebricks provides and thus have good parallel speedup and scaleup characteristics for large datasets.
Keywords
Big Data, Query Languages, Parallel Query Processing