ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics

ICPP (2021)

Abstract
Spark is popular for its ability to enable high-performance data analytics applications on diverse systems. Its great versatility is achieved through numerous user- and system-level options, resulting in an exponentially large configuration space that, ironically, hinders the optimal performance of data analytics. This complexity stems from two main issues: the high dimensionality of the configuration space and the expensive, black-box relationship between configuration and performance. In this paper, we design and develop a robust tuning framework called ROBOTune that tackles both issues and tunes Spark applications quickly for efficient data analytics. Specifically, it performs parameter selection through a Random Forests-based model to reduce the dimensionality of the analytics configuration space. In addition, ROBOTune employs Bayesian Optimization to handle the complex configuration-performance relationship, balancing exploration and exploitation to efficiently locate a globally optimal or near-optimal configuration. Furthermore, ROBOTune strengthens Latin Hypercube Sampling with caching and memoization to improve the coverage and effectiveness of sample-configuration generation. Our evaluation results demonstrate that ROBOTune finds similar or better-performing configurations than contemporary tuning tools such as BestConfig and Gunther, while improving search cost by 1.59x and 1.53x on average, and by up to 2.27x and 1.71x, respectively.
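To make the sampling step concrete, here is a minimal sketch of Latin Hypercube Sampling (LHS) over a few integer knobs, paired with a memoizing evaluator. It assumes one plausible reading of "caching and memoization": measured runtimes are cached by configuration so a configuration sampled twice is only executed once. The knob names (`executor_cores`, `executor_memory_gb`, `shuffle_partitions`) are hypothetical simplifications loosely modeled on the Spark properties `spark.executor.cores`, `spark.executor.memory`, and `spark.sql.shuffle.partitions`. Their ranges and the `fake_spark_runtime` cost function are invented stand-ins for actually running a Spark job. This is not ROBOTune's implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified knob space: name -> inclusive integer range (illustrative only).
SPACE: Dict[str, Tuple[int, int]] = {
    "executor_cores": (1, 8),
    "executor_memory_gb": (1, 16),
    "shuffle_partitions": (50, 400),
}


def latin_hypercube(space: Dict[str, Tuple[int, int]], n: int,
                    rng: random.Random) -> List[Dict[str, int]]:
    """Draw n configurations: each knob's range is split into n equal strata,
    and each stratum is used exactly once per knob."""
    columns: Dict[str, List[int]] = {}
    for name, (lo, hi) in space.items():
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across knobs
        values = []
        for s in strata:
            u = (s + rng.random()) / n  # uniform point inside stratum s of [0, 1)
            # Map to an integer in [lo, hi]; min() guards against float rounding to 1.0.
            values.append(min(hi, lo + int(u * (hi - lo + 1))))
        columns[name] = values
    return [{name: columns[name][i] for name in space} for i in range(n)]


class MemoizedEvaluator:
    """Caches measured costs so each distinct configuration is executed once."""

    def __init__(self, run: Callable[[Dict[str, int]], float]):
        self.run = run
        self.cache: Dict[Tuple[Tuple[str, int], ...], float] = {}
        self.executions = 0  # number of (expensive) real runs performed

    def __call__(self, config: Dict[str, int]) -> float:
        key = tuple(sorted(config.items()))  # hashable, order-independent key
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = self.run(config)
        return self.cache[key]


def fake_spark_runtime(cfg: Dict[str, int]) -> float:
    # Synthetic stand-in for launching a Spark job and timing it (hypothetical).
    return (abs(cfg["executor_cores"] - 4) * 10
            + abs(cfg["executor_memory_gb"] - 8) * 5
            + abs(cfg["shuffle_partitions"] - 200) * 0.1)


# Usage: sample 10 configurations, re-submit a few, and keep the cheapest one.
rng = random.Random(0)
evaluate = MemoizedEvaluator(fake_spark_runtime)
samples = latin_hypercube(SPACE, 10, rng)
samples += samples[:3]  # repeated configurations are served from the cache
best = min(samples, key=evaluate)
print(best, evaluate(best), evaluate.executions)
```

LHS is used here instead of plain random sampling because it guarantees that every stratum of every knob's range is covered, which is the coverage property the abstract refers to. In a full tuner, these samples would seed a surrogate model (for example, the Bayesian Optimization step) rather than being the final search.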
Keywords
Performance Tuning, Bayesian Optimization, Spark Configurations