SMBSP: A Self-Tuning Approach using Machine Learning to Improve Performance of Spark in Big Data Processing

2018 7th International Conference on Computer and Communication Engineering (ICCCE) (2018)

Abstract
Apache Spark, widely used for big data processing, is a distributed open-source platform that relies on distributed memory to process large datasets efficiently. Obtaining the best performance from Spark remains a challenge, because its many configuration parameters affect performance to a large extent. Spark has over 180 parameters that control system performance. Each parameter has a default value and a range of permitted values, and users can manually select a suitable value for each one. An improper choice of parameter values leads to poor performance. Manual tuning of the parameters in a Hadoop-Spark system requires the user to have in-depth knowledge of the system. Because the parameter space is large, manual tuning is very time-consuming and inefficient, and the parameters may need to be retuned for each different application. This paper proposes and develops an effective self-tuning approach, named SMBSP, based on an Artificial Neural Network (ANN), to avoid the drawbacks of manual parameter tuning. The approach was implemented on a Dell PowerEdge R720 server using datasets of five different sizes. It is found to speed up the Spark system by 35% on average compared with the default parameter configuration.
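To make the tuning problem in the abstract concrete, the following is a minimal sketch of model-guided parameter selection. It is not the SMBSP implementation, which the abstract does not describe in detail. The search space, the helper names (`candidate_configs`, `pick_best`, `spark_submit_command`) and the `toy_model` function are all assumptions made for illustration; in an ANN-based approach, a trained network would presumably take the place of `toy_model`. The three property names and the `--conf key=value` option of `spark-submit` are real Spark features; the values are illustrative only.

```python
import itertools
import shlex

# Hypothetical search space over a few real Spark properties.
# The candidate values are illustrative and are not taken from the paper.
SEARCH_SPACE = {
    "spark.executor.memory": ["2g", "4g", "8g"],
    "spark.executor.cores": [2, 4],
    "spark.sql.shuffle.partitions": [100, 200, 400],
}


def candidate_configs(space):
    """Yield every combination of parameter values as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def pick_best(space, predict_runtime):
    """Return the candidate config with the lowest predicted runtime."""
    return min(candidate_configs(space), key=predict_runtime)


def spark_submit_command(config, app):
    """Render a config dict as a spark-submit command line."""
    parts = ["spark-submit"]
    for key, value in sorted(config.items()):
        parts += ["--conf", f"{key}={value}"]
    parts.append(app)
    return " ".join(shlex.quote(p) for p in parts)


def toy_model(config):
    """Stand-in for a trained predictor; NOT a real performance model."""
    memory_gb = int(config["spark.executor.memory"].rstrip("g"))
    cores = config["spark.executor.cores"]
    partitions = config["spark.sql.shuffle.partitions"]
    # Arbitrary formula: more memory and cores help, and 200 partitions
    # is treated as the sweet spot. Chosen only so the search has a minimum.
    return 100.0 / (memory_gb * cores) + abs(partitions - 200) / 100.0


if __name__ == "__main__":
    best = pick_best(SEARCH_SPACE, toy_model)
    print(spark_submit_command(best, "app.py"))
```

Even this three-parameter example yields 18 candidate configurations; with over 180 parameters, exhaustive enumeration is infeasible, which is why a learned model that predicts performance without running every configuration is attractive.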
Keywords
Apache Spark,Artificial Intelligence,big data,configuration parameters,self-tuning