SaPus: Self-Adaptive Parameter Update Strategy for DNN Training on Multi-GPU Clusters

IEEE Transactions on Parallel and Distributed Systems (2022)

Cited by 3 | Views 3
Abstract
The parameter server architecture has been identified as an efficient framework for scaling DNN training on clusters. In large-scale deployments, communication becomes the bottleneck, and the parameter update strategy strongly impacts both training performance and accuracy. Recent state-of-the-art solutions have adopted the local SGD approach, which enables workers to update their local version of the model...
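The local SGD approach mentioned above can be illustrated with a toy sketch: each worker runs several gradient steps on its own data shard without communicating, then all workers average their parameters in one synchronization round. The problem (1-D least squares), hyperparameter names (`H`, `rounds`, `lr`), and data setup below are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy local SGD: each worker takes H local gradient steps on its own
# data shard, then all workers average parameters (one communication round).
# Model: 1-D least squares, w ≈ argmin_w mean((x*w - y)^2); true_w is the
# target the workers should recover. All names here are illustrative.

rng = np.random.default_rng(0)
true_w = 3.0
n_workers, H, rounds, lr = 4, 8, 20, 0.05

# Each worker holds its own shard of the data.
shards = []
for _ in range(n_workers):
    x = rng.normal(size=64)
    y = true_w * x + 0.1 * rng.normal(size=64)
    shards.append((x, y))

w = 0.0  # shared initial parameter
for _ in range(rounds):
    local_ws = []
    for x, y in shards:
        w_local = w
        for _ in range(H):  # H local SGD steps, no communication
            grad = 2 * np.mean((x * w_local - y) * x)
            w_local -= lr * grad
        local_ws.append(w_local)
    w = float(np.mean(local_ws))  # synchronize: average local models

print(w)  # converges near true_w = 3.0
```

Communicating only every `H` steps trades a small amount of staleness for an `H`-fold reduction in synchronization traffic, which is the bandwidth/accuracy trade-off the abstract refers to.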
Keywords
Training, Servers, Delays, Graphics processing units, Computer architecture, Bandwidth, System performance