Straggler Exploitation in Distributed Computing Systems with Task Grouping

2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2023

Abstract
We consider the problem of stragglers in distributed computing systems. Stragglers are compute nodes that unpredictably become slow, which often increases task completion times. One popular method for mitigating stragglers is to replicate work: of all the replicas of a task, only the first to complete is accepted, and the rest are discarded. Discarding work that some workers have already completed wastes resources. In this paper, we propose a method for exploiting the work completed by stragglers rather than discarding it. We increase the granularity of the assigned work and the frequency of worker updates. Experiments on a simulated cluster demonstrate that the proposed method reduces task completion time.
Keywords
distributed systems, stragglers, MapReduce
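
The contrast drawn in the abstract, between discarding a straggler's replica and using its partial work, can be illustrated with a toy calculation. The sketch below is not the paper's algorithm. It assumes a task split into equal work units, exactly two workers with fixed speeds, a progress report after every unit, and the two replicas traversing the units in opposite orders. The function names `replication_time` and `exploit_time` and all parameters are hypothetical.

```python
from fractions import Fraction


def replication_time(n_units, speeds):
    """Baseline: every worker runs the whole task and only the first
    full copy to finish is accepted. Speeds are in units per time step."""
    return min(Fraction(n_units) / s for s in speeds)


def exploit_time(n_units, speeds):
    """Toy variant (an assumption, not the paper's method): two workers
    process the same units in opposite orders and report after each unit.
    The task is done once every unit has been finished by some worker."""
    if len(speeds) != 2:
        raise ValueError("this sketch handles exactly two workers")
    forward = list(range(n_units))
    orders = [forward, forward[::-1]]

    # Earliest time at which each unit has been finished by either worker.
    finish = [None] * n_units
    for order, s in zip(orders, speeds):
        for pos, unit in enumerate(order):
            t = Fraction(pos + 1) / s  # worker finishes its (pos+1)-th unit
            if finish[unit] is None or t < finish[unit]:
                finish[unit] = t

    # The task completes when its last outstanding unit completes.
    return max(finish)


if __name__ == "__main__":
    units, speeds = 12, [3, 1]  # one fast worker, one straggler
    print("replication:", replication_time(units, speeds))
    print("exploiting partial work:", exploit_time(units, speeds))
```

Under these assumptions, with 12 units and speeds of 3 and 1 units per time step, the baseline finishes at time 4 while the variant that counts the straggler's units finishes at time 3. Exact fractions are used so that the comparison is free of floating-point rounding.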