DSAG: A Mixed Synchronous-Asynchronous Iterative Method for Straggler-Resilient Learning

Albin Severinson, Eirik Rosnes, Salim El Rouayheb, Alexandre Graell i Amat

IEEE Transactions on Communications (2023)

Abstract

We consider straggler-resilient learning. Many previous works, e.g., in the coded computing literature, model straggling as random delays that are independent and identically distributed across workers. However, in many practical scenarios, a given worker may straggle over an extended period of time. We propose a latency model that captures this behavior and is substantiated by traces collected on Microsoft Azure, Amazon Web Services (AWS), and a small local cluster. Building on this model, we propose DSAG, a mixed synchronous-asynchronous iterative optimization method based on the stochastic average gradient (SAG) method that combines timely and stale results. We also propose a dynamic load-balancing strategy to further reduce the impact of straggling workers. We evaluate DSAG for principal component analysis (PCA) of a large genomics dataset, cast as a finite-sum optimization problem, and for logistic regression on a cluster composed of 100 workers on AWS. For the particular scenario that we consider, DSAG is up to about 50% faster than SAG and more than twice as fast as coded computing methods.
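The abstract describes DSAG only at a high level. As a hedged illustration of the SAG idea it builds on (not the authors' DSAG algorithm, latency model, or load-balancing strategy), the toy sketch below keeps one cached gradient per worker. At each iteration, it overwrites only the entries of workers that respond in time and reuses the stale cached entries of stragglers. The one-dimensional quadratic loss, the names `partition_gradient` and `sag_with_stragglers`, the step size, and the straggler pattern are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence


def partition_gradient(x: float, points: Sequence[float]) -> float:
    """Gradient of sum_i 0.5 * (x - a_i)^2 over one worker's data partition."""
    return sum(x - a for a in points)


def sag_with_stragglers(
    partitions: List[List[float]],
    responders: Callable[[int], List[int]],
    lr: float = 0.5,
    iterations: int = 200,
) -> float:
    """Hypothetical SAG-style loop mixing fresh and stale partition gradients.

    At iteration t, only workers in responders(t) return a gradient at the
    current iterate; every other worker contributes the cached (stale)
    gradient from its most recent response.
    """
    n = sum(len(p) for p in partitions)  # total number of samples
    x = 0.0
    # Assume every worker responds once up front, so the cache starts full.
    cache = [partition_gradient(x, p) for p in partitions]
    for t in range(iterations):
        for w in responders(t):
            cache[w] = partition_gradient(x, partitions[w])  # timely result
        # Average over all n samples, combining fresh and stale entries.
        x -= lr * sum(cache) / n
    return x


# Toy usage: worker 0 straggles for the first 50 iterations, then recovers.
parts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def resp(t: int) -> List[int]:
    return [1, 2] if t < 50 else [0, 1, 2]


x = sag_with_stragglers(parts, resp)
print(round(x, 4))  # 3.5
```

While worker 0 straggles, its stale entry biases the iterate away from the minimizer. Once it responds again, its cached gradient is refreshed and `x` converges to the mean of the data, 3.5. A worker that never responds again would leave the iterate stuck at a biased point. This toy model does not address that issue; in the abstract, the load-balancing strategy targets the impact of straggling workers.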

Keywords

Coded computing, iterative optimization, load-balancing, principal component analysis (PCA), stochastic average gradient (SAG), straggler mitigation, variance reduction