Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging

Abstract:

Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations ...
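
The abstract contrasts globally synchronized averaging, which imposes a barrier across all workers, with decentralized schemes that exchange information more locally. The sketch below is a minimal, hypothetical NumPy simulation of the general group-averaging idea: workers average parameters only within small, rotating groups rather than across all workers at once, so information still spreads globally over several rounds without a global barrier. All names and parameters (num_workers, group_size, group_average) are illustrative assumptions; this is not the paper's WAGMA-SGD implementation.

# Toy simulation of group averaging (illustrative only, assumed setup):
# each worker keeps its own model copy and averages it within a small,
# rotating group each step instead of a global all-reduce over all workers.
import numpy as np

num_workers = 8      # assumed number of parallel workers
group_size = 2       # assumed group size per averaging round
dim = 4              # toy model dimension
rng = np.random.default_rng(0)

# Each worker starts with its own (diverged) copy of the parameters.
models = rng.normal(size=(num_workers, dim))

def group_average(models, step):
    """Average models within groups; membership rotates each step so repeated
    rounds mix information across all workers without a global barrier."""
    order = np.roll(np.arange(num_workers), step)
    averaged = models.copy()
    for start in range(0, num_workers, group_size):
        members = order[start:start + group_size]
        averaged[members] = models[members].mean(axis=0)
    return averaged

for step in range(6):
    # (Local SGD updates on each worker's mini-batch would happen here.)
    models = group_average(models, step)

# The spread across workers shrinks toward zero, showing how partial (group)
# averaging can stand in for a full global all-reduce over many rounds.
print(np.std(models, axis=0))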
