Algorithmic Aspects of Parallel Query Processing.

SIGMOD/PODS '18: International Conference on Management of Data Houston TX USA June, 2018(2018)

引用 3|浏览29
暂无评分
摘要
In the last decade we have witnessed a growing interest in process- ing large data sets on large-scale distributed clusters. A big part of the complex data analysis pipelines performed by these systems consists of a sequence of relatively simple query operations, such as joining two or more tables, or sorting. This tutorial discusses several recent algorithmic developments for data processing in such large distributed clusters. It uses as a model of computation the Massively Parallel Computation (MPC) model, a simplification of the BSP model, where the only cost is given by the amount of communication and the number of communication rounds. Based on the MPC model, we study and analyze several algorithms for three core data processing tasks: multiway join queries, sorting and matrix multiplication. We discuss the common algorithmic techniques across all tasks, relate the algorithms to what is used in practical systems, and finally present open problems for future research.
更多
查看译文
关键词
Distributed Query Evaluation,Bulk Synchronous Parallel Model
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要