Optimizing Parameter Mixing Under Constrained Communications in Parallel Federated Learning

IEEE/ACM Transactions on Networking (2023)

Abstract
In vanilla Federated Learning (FL) systems, a centralized parameter server (PS) is responsible for collecting, aggregating, and distributing model parameters among decentralized clients. However, the communication link of a single PS can easily be overloaded by concurrent communications with a massive number of clients. To overcome this drawback, multiple PSes can be deployed to form a parallel FL (PFL) system, in which each PS communicates only with a subset of clients and with its neighbor PSes. On one hand, each PS conducts training iterations with the clients in its subset. On the other hand, PSes communicate with each other periodically to mix their parameters so that they eventually reach a consensus. In this paper, we propose a novel parallel federated learning algorithm called Fed-PMA, which optimizes such parallel FL under constrained communications by conducting parallel parameter mixing and averaging with theoretical guarantees. We formally analyze the convergence rate of Fed-PMA with convex loss and further derive the optimal number of times each PS should mix with its neighbor PSes so as to maximize the final model accuracy within a fixed span of training time. Our theoretical analysis shows that PSes should mix their parameters more frequently if the connection between PSes is sparse or the time cost of mixing is low. Inspired by this analysis, we propose the Fed-APMA algorithm, which adaptively determines a near-optimal number of mixing times with non-convex loss under dynamic communication conditions. Extensive experiments on realistic datasets demonstrate that both Fed-PMA and its adaptive version Fed-APMA significantly outperform state-of-the-art baselines.
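
The periodic parameter mixing described in the abstract can be read as a gossip-style averaging step: each PS combines its model with those of its neighbor PSes through a doubly stochastic mixing matrix, and more mixing rounds push the PSes closer to consensus. The sketch below is a minimal toy illustration of one such training round; the local-update rule, the 4-PS ring topology, the mixing matrix W, and the choice of 2 mixings per round are illustrative assumptions, not the exact Fed-PMA procedure from the paper.

```python
import numpy as np

# Toy sketch of one round of parallel FL (PFL): each parameter server (PS)
# first averages its own clients' updates, then mixes parameters with its
# neighbor PSes a chosen number of times. Illustrative only, not Fed-PMA.

def local_round(ps_model, client_data, lr=0.1):
    """Stand-in for local client training: each client pulls the PS model
    toward its own data vector, and the PS averages the client models."""
    client_models = [ps_model + lr * (x - ps_model) for x in client_data]
    return np.mean(client_models, axis=0)

def mix(ps_models, W, mixing_times):
    """Gossip-style mixing: stack PS models and repeatedly apply the
    doubly stochastic mixing matrix W (one application per mixing time)."""
    models = np.stack(ps_models)            # shape: (num_PSes, dim)
    for _ in range(mixing_times):
        models = W @ models                 # each PS averages with neighbors
    return list(models)

# Assumed setup: 4 PSes on a ring, 3 synthetic clients per PS.
rng = np.random.default_rng(0)
dim = 5
ps_models = [rng.normal(size=dim) for _ in range(4)]
clients = [[rng.normal(size=dim) for _ in range(3)] for _ in range(4)]

# Metropolis-style mixing matrix for a 4-node ring (sparse PS connectivity).
W = np.array([[1/3, 1/3, 0.0, 1/3],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [1/3, 0.0, 1/3, 1/3]])

for _ in range(10):
    ps_models = [local_round(m, c) for m, c in zip(ps_models, clients)]
    ps_models = mix(ps_models, W, mixing_times=2)   # 2 mixings per round (assumed)

# The disagreement between PSes shrinks as mixing_times grows (consensus).
mean_model = np.mean(ps_models, axis=0)
gap = max(np.linalg.norm(m - mean_model) for m in ps_models)
print(f"max PS deviation from consensus: {gap:.4f}")
```

Increasing mixing_times in this toy drives the PS models toward a common consensus at the cost of extra inter-PS communication, which is exactly the trade-off the paper optimizes when the mixing budget is constrained.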
Keywords
Parallel federated learning, optimal mixing times, convergence rate