A Fast, Scalable, Universal Approach For Distributed Data Aggregations

2020 IEEE International Conference on Big Data (Big Data), 2020

Abstract
In the current era of Big Data, data engineering has become an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are integral to these applications. They have traditionally been used to derive meaningful information from large data sets, and today they are also used to engineer more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables and arrays, and are combined with other operations such as grouping of values. There are frameworks that excel in each of these domains individually. However, we believe there is an essential need for a data analytics tool that can integrate universally with existing frameworks and thereby increase the productivity and efficiency of the entire data analytics pipeline. Cylon endeavors to fill this void. In this paper, we present Cylon's fast and scalable aggregation operations, implemented on top of a distributed in-memory table structure that universally integrates with existing frameworks.
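To make the idea of a grouped aggregation over partitioned data concrete, the following is a minimal sketch in plain Python. It is not Cylon's API or implementation: the function names (`local_groupby_sum`, `merge_partials`, `distributed_groupby_sum`) and the sample data are hypothetical, and the "partitions" are simply in-process lists standing in for table fragments held by different workers. Each partition is first reduced locally, and the partial results are then merged, which works for sums because addition is associative and commutative.

```python
from collections import defaultdict
from functools import reduce


def local_groupby_sum(rows):
    """Aggregate one partition: sum the values for each key (hypothetical helper)."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def merge_partials(left, right):
    """Combine two partial results into a new dict without mutating the inputs."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def distributed_groupby_sum(partitions):
    """Reduce each partition locally, then fold the partial results together."""
    partials = [local_groupby_sum(p) for p in partitions]
    return reduce(merge_partials, partials, {})


# Illustrative data only: two partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]
result = distributed_groupby_sum(partitions)
```

In a real distributed setting the merge step would involve communication between workers; here it is a local fold, chosen only to keep the sketch self-contained.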
Keywords
HPC, Data Engineering, Aggregations, Relational Algebra, Big Data, Reductions