OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning
Parallel Computing (2023)
Abstract
The communication bottleneck severely restricts the scalability of distributed deep learning. Tensor fusion improves the scalability of data parallelism by overlapping computation and communication tasks. However, existing tensor fusion schemes yield only suboptimal training performance. In this paper, we propose an efficient communication mechanism (OF-WFBP) to find the optimal tensor fusion scheme for synchronous data parallelism. We present the mathematical model of OF-WFBP and prove that the underlying problem is NP-hard. We solve the model analytically in two cases. For the remaining cases, we propose an improved sparrow search algorithm (GradSSA) that finds a near-optimal tensor fusion scheme efficiently. Experimental results on two different GPU clusters show that OF-WFBP achieves up to 1.43x speedup compared to state-of-the-art tensor fusion mechanisms.
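To make the trade-off behind tensor fusion concrete, the following is a minimal sketch and not the OF-WFBP model or GradSSA from the paper. It assumes a simple latency-bandwidth cost for each fused message (`alpha + beta * size`), a single communication channel, and fusion of consecutive gradients only; the names `iteration_time`, `all_fusion_schemes`, and `best_scheme` are hypothetical. Exhaustive search stands in for the paper's search algorithm and is only feasible for a handful of tensors.

```python
from itertools import combinations


def iteration_time(backward_times, sizes, groups, alpha, beta):
    """Finish time of backward pass plus communication for one fusion scheme.

    backward_times[i]: compute time of gradient i, in backpropagation order.
    sizes[i]: size of gradient i.
    groups: consecutive index groups; each group is sent as one fused message.
    alpha, beta: assumed per-message latency and per-unit transfer time.
    """
    # Time at which each gradient becomes available.
    ready, t = [], 0.0
    for bt in backward_times:
        t += bt
        ready.append(t)

    comm_free = 0.0  # time at which the communication channel is next idle
    for group in groups:
        # A fused message can start only when its last gradient is ready
        # and the previous message has finished.
        start = max(ready[group[-1]], comm_free)
        comm_free = start + alpha + beta * sum(sizes[i] for i in group)
    return comm_free


def all_fusion_schemes(n):
    """Yield every split of range(n) into consecutive groups (2**(n-1) total)."""
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            yield [list(range(a, b)) for a, b in zip(bounds, bounds[1:])]


def best_scheme(backward_times, sizes, alpha, beta):
    """Brute-force the scheme with the smallest finish time (small n only)."""
    return min(
        all_fusion_schemes(len(sizes)),
        key=lambda g: iteration_time(backward_times, sizes, g, alpha, beta),
    )


if __name__ == "__main__":
    bt, sz = [1, 1, 1, 1], [1, 1, 1, 1]
    scheme = best_scheme(bt, sz, alpha=2, beta=0.5)
    print(scheme, iteration_time(bt, sz, scheme, alpha=2, beta=0.5))
```

With these toy numbers, sending every gradient separately pays the startup latency four times, while fusing everything delays communication until the backward pass ends; under the stated assumptions the best scheme lies in between. The number of candidate schemes doubles with each additional tensor, which is why exhaustive search does not scale to real models.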
Keywords
Distributed deep learning, Data parallelism, Tensor fusion