Improving MapReduce Speculative Executions with Global Snapshots

Social Science Research Network (2023)

Abstract
Hadoop's MapReduce implementation has been employed for distributed storage and computation. Although it is efficient at parallelizing large-scale data processing, the challenge of handling poorly performing jobs persists. Hadoop does not fix straggler tasks; instead, it launches equivalent tasks (called backup tasks). This process is called Speculative Execution in Hadoop. Current speculative execution approaches face challenges such as incorrect estimation of task run times, high consumption of system resources, and inappropriate selection of backup tasks. In this paper, we propose a new speculative execution approach that determines task run times using consistent global snapshots and K-Means clustering. Task run times are captured during data processing, and K-Means clustering detects two categories of tasks (i.e., fast tasks and stragglers). A silhouette score is applied as a decision tool to determine when to launch backup tasks and to prevent extra iterations of K-Means, which helps reduce the overhead of applying our approach. We evaluated our approach on different data centre configurations with two objectives: i) measuring the overheads caused by implementing our approach and ii) measuring job performance improvements. Our results showed that i) the overheads of applying our approach become more negligible as data centre sizes increase: the overheads were reduced by 1.9%, 1.5% and 1.3% (comparatively) as the data centre size and the task run times increased; and ii) longer mapper tasks have better chances of improvement, regardless of the number of straggler tasks. The graphs for the longer mappers stayed below 10% relative to the disruptions introduced. This showed that the effects of the disruptions were reduced and became more negligible, while job performance improved further.
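The detection step described in the abstract (cluster observed task run times into fast tasks and stragglers with K-Means, then use a silhouette score to decide whether to launch backup tasks) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes run times (in seconds) have already been collected into a plain list, for example from a consistent global snapshot, and it uses plain Lloyd's K-Means with k=2 on one-dimensional data. The function names and the 0.7 silhouette threshold are invented for illustration.

```python
from statistics import mean


def kmeans_1d_two_clusters(run_times, max_iter=100):
    """Lloyd's K-Means with k=2 on one-dimensional task run times.

    Returns (labels, centroids); label 1 is the slower cluster because the
    centroids start at the fastest and slowest observed run times.
    """
    centroids = [min(run_times), max(run_times)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each task to its nearest centroid (ties go to the fast cluster).
        new_labels = [0 if abs(t - centroids[0]) <= abs(t - centroids[1]) else 1
                      for t in run_times]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            mean([t for t, lab in zip(run_times, labels) if lab == k]) if k in labels else centroids[k]
            for k in (0, 1)
        ]
    return labels, centroids


def silhouette_score_1d(values, labels):
    """Mean silhouette coefficient for a two-cluster labelling of 1-D values."""
    if len(set(labels)) < 2:
        return 0.0  # undefined for a single cluster; treated here as "no structure"
    scores = []
    for i, (x, lab) in enumerate(zip(values, labels)):
        same = [abs(x - y) for j, (y, other) in enumerate(zip(values, labels))
                if other == lab and j != i]
        rest = [abs(x - y) for y, other in zip(values, labels) if other != lab]
        if not same:
            scores.append(0.0)  # common convention: a point in a singleton cluster scores 0
            continue
        # Cohesion a and separation b; with k=2 the nearest other cluster is the only other one.
        a, b = mean(same), mean(rest)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def select_backup_candidates(run_times, threshold=0.7):
    """Return (indices of straggler tasks, silhouette score).

    Backup tasks are proposed only when the fast/straggler split is well
    separated; the 0.7 threshold is hypothetical, not taken from the paper.
    """
    if len(run_times) < 2:
        return [], 0.0
    labels, _ = kmeans_1d_two_clusters(run_times)
    score = silhouette_score_1d(run_times, labels)
    if score < threshold:
        return [], score  # no clear straggler group: skip speculative launches
    return [i for i, lab in enumerate(labels) if lab == 1], score


if __name__ == "__main__":
    times = [10.0, 11.0, 9.5, 10.5, 30.0, 32.0]  # hypothetical run times in seconds
    # Flags tasks 4 and 5 as stragglers; the silhouette score is about 0.94.
    print(select_backup_candidates(times))
```

Gating on the silhouette score means that a job whose tasks all run at similar speeds (for example `[10.0, 10.2, 10.1, 9.9]`, which scores below 0.7 here) triggers no backup tasks, rather than always sacrificing the slower half of a forced two-way split.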
Keywords
MapReduce, Hadoop, speculative executions, stragglers, consistent global snapshots, K-means algorithm