How Good is Query Optimizer in Spark?
CollaborateCom(2018)
摘要
In the big data community, Spark plays an important role and is used to process interactive queries. Spark employs a query optimizer, called Catalyst, to interpret SQL queries to optimized query execution plans. Catalyst contains a number of optimization rules and supports cost-based optimization. Although query optimization techniques have been well studied in the field of relational database systems, the effectiveness of Catalyst in Spark is still unclear. In this paper, we investigated the effectiveness of rule-based and cost-based optimization in Catalyst, meanwhile, we obtained a set of comparative experiments by varying the data volume and the number of nodes. It is found that even when applied query optimizations, the execution time of most TPC-H queries were slightly reduced. Some interesting observations were made on Catalyst, which can enable the community to have a better understanding and improvement of the query optimizer in Spark.
更多查看译文
关键词
Spark SQL, Catalyst, Query optimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络