SQL-SA for big data discovery: polymorphic and parallelizable SQL user-defined scalar and aggregate infrastructure in Teradata Aster 6.20

Xin Tang, Robert Wehrmeister, James Shau, Abhirup Chakraborty, Daley Alex, Awny Al Omari, Feven Atnafu, Jeff Davis, Litao Deng, Deepak Jaiswal, Chittaranjan Keswani, Yafeng Lu, Chao Ren, Tom Reyes, Kashif Siddiqui, David Simmen, Devendra Vidhani, Ling Wang, Shuai Yang, Daniel Yu

2016 IEEE 32nd International Conference on Data Engineering (ICDE), 2016

Abstract
There is increasing demand to integrate big data analytic systems using SQL. Given the vast ecosystem of SQL applications, enabling SQL capabilities allows big data platforms to expose their analytic potential to a wide variety of end users, accelerating discovery processes and providing significant business value. Most existing big data frameworks are based on one particular programming model, such as MapReduce or Graph. Consequently, data scientists are often forced to manually create ad hoc data pipelines that connect various big data tools and platforms to serve their analytic needs. When the analytic tasks change, these data pipelines can be costly to modify and maintain. In this paper we present SQL-SA, a polymorphic and parallelizable SQL scalar and aggregate infrastructure in Aster 6.20. This infrastructure extends Aster 6's MapReduce and Graph capabilities to support polymorphic user-defined scalar and aggregate functions using flexible SQL syntax. The implementation extensively enhances the main Aster components, including query syntax, API, planning, and execution. By integrating these new user-defined scalar and aggregate functions with Aster MapReduce and Graph functions, Aster 6.20 enables data scientists to combine diverse programming models in a single SQL statement. The statement is automatically converted to an optimal data pipeline and executed in parallel. On a real-world business problem and data set, Aster 6.20 demonstrates a significant performance advantage (25%+) over Hadoop Pig and Hive.
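
To make the distinction between scalar and parallelizable aggregate functions concrete, the following is a minimal Python sketch. It is not the Aster API: the names `scalar_double`, `MeanAggregate`, and `run_aggregate` are hypothetical, and the partial/merge/finalize decomposition is a common general technique for parallel aggregation that is assumed here for illustration rather than taken from the paper.

```python
from functools import reduce


def scalar_double(value):
    """Scalar function: one output per input row.

    Polymorphic in the loose sense that it accepts any type
    supporting `* 2` (int, float, str, ...).
    """
    return value * 2


class MeanAggregate:
    """Hypothetical aggregate written in partial/merge/finalize form,
    so each data partition can be processed independently."""

    identity = (0, 0)  # (running sum, row count) for an empty partition

    @staticmethod
    def partial(rows):
        # Runs once per partition; could execute on separate workers.
        rows = list(rows)
        return (sum(rows), len(rows))

    @staticmethod
    def merge(left, right):
        # Combines two partial states; associative, so order is irrelevant.
        return (left[0] + right[0], left[1] + right[1])

    @staticmethod
    def finalize(state):
        total, count = state
        return total / count if count else None


def run_aggregate(agg, partitions):
    """Apply `agg` across partitions (sequentially here, for simplicity)."""
    partials = [agg.partial(p) for p in partitions]
    return agg.finalize(reduce(agg.merge, partials, agg.identity))
```

For example, `run_aggregate(MeanAggregate, [[1, 2], [3], [4, 5, 6]])` computes the mean over all six values regardless of how the rows are partitioned, which is the property that lets such an aggregate run in parallel.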
Keywords
SQL-SA, Big Data discovery, Teradata Aster 6.20, Big Data analytic systems, MapReduce, graph capabilities, aggregate functions, scalar functions, query syntax, API