Taming unbalanced training workloads in deep learning with partial collective operations
PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, February 2020, pp. 45-61.
Load imbalance is pervasive in distributed deep learning training systems, caused either by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at...