OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning

Yunqi Gao, Zechao Zhang,Bing Hu,A-Long Jin,Chunming Wu

PARALLEL COMPUTING(2023)

引用 0|浏览3
暂无评分
摘要
The communication bottleneck has severely restricted the scalability of distributed deep learning. Tensor fusion improves the scalability of data parallelism by overlapping computation and communication tasks. However, existing tensor fusion schemes only result in suboptimal training performance. In this paper, we propose an efficient communication mechanism (OF-WFBP) to find the optimal tensor fusion scheme for synchronous data parallelism. We present the mathematical model of OF-WFBP and prove it is an NP-hard problem. We mathematically solve the mathematical model of OF-WFBP in two cases. We propose an improved sparrow search algorithm (GradSSA) to find the near-optimal tensor fusion scheme efficiently in other cases. Experimental results on two different GPU clusters show that OF-WFBP achieves up to 1.43x speedup compared to the state-of-the-art tensor fusion mechanisms.
更多
查看译文
关键词
Distributed deep learning,Data parallelism,Tensor fusion
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要