Taming unbalanced training workloads in deep learning with partial collective operations

PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, February 2020, pp. 45-61.


Abstract:

Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at...
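
The abstract describes the baseline that partial collectives relax: traditional synchronous SGD accumulates gradients with a globally synchronizing collective at every step, so the slowest worker gates all the others. Below is a minimal sketch of that synchronous step, assuming an MPI-style allreduce via mpi4py and NumPy gradient buffers; the function name, shapes, and learning rate are illustrative and not taken from the paper.

```python
# Hedged sketch of the fully synchronous baseline (not the paper's method).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def synchronous_sgd_step(params, local_grad, lr=0.01):
    """One synchronous SGD step: every worker blocks until all workers
    have contributed their gradient (global synchronization)."""
    global_grad = np.empty_like(local_grad)
    # Allreduce sums gradients across all workers; a single straggler
    # delays every other worker -- the load-imbalance problem.
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
    global_grad /= comm.Get_size()   # average the summed gradients
    return params - lr * global_grad

# Toy usage: each rank applies its locally computed gradient.
params = np.zeros(4)
local_grad = np.random.rand(4)
params = synchronous_sgd_step(params, local_grad)
```

A partial collective operation, as named in the title, would instead let faster ranks complete the reduction without waiting for every contribution; the sketch above shows only the fully synchronous accumulation it relaxes.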
