Optimizing Parameter Mixing Under Constrained Communications in Parallel Federated Learning


In vanilla Federated Learning (FL) systems, a centralized parameter server (PS) is responsible for collecting, aggregating, and distributing model parameters to and from decentralized clients. However, the communication link of a single PS can easily be overloaded by concurrent communications with a massive number of clients. To overcome this drawback, multiple PSes can be deployed to form a parallel FL (PFL) system, in which each PS communicates only with a subset of clients and its neighboring PSes. On one hand, each PS conducts training iterations with the clients in its subset. On the other hand, PSes communicate with each other periodically to mix their parameters so that they can ultimately reach a consensus. In this paper, we propose a novel parallel federated learning algorithm called Fed-PMA, which optimizes parallel FL under constrained communications by conducting parallel parameter mixing and averaging with theoretical guarantees. We formally analyze the convergence rate of Fed-PMA with convex loss, and further derive the optimal number of times each PS should mix with its neighboring PSes so as to maximize the final model accuracy within a fixed span of training time. Our theoretical analysis shows that PSes should mix their parameters more frequently if the connections between PSes are sparse or the time cost of mixing is low. Inspired by this analysis, we propose the Fed-APMA algorithm, which adaptively determines the near-optimal number of mixing times with non-convex loss under dynamic communication conditions. Extensive experiments on realistic datasets demonstrate that both Fed-PMA and its adaptive version Fed-APMA significantly outperform state-of-the-art baselines.
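The periodic parameter mixing among neighboring PSes described above can be sketched as gossip averaging over a mixing matrix. The sketch below is a minimal illustration under assumed simplifications (scalar parameters, a fixed doubly stochastic matrix `W` on a fully connected 3-PS topology); the function name `mix_step` and all values are hypothetical and not taken from the paper:

```python
import numpy as np

def mix_step(ps_params, W):
    # One mixing round: each PS replaces its parameters with a
    # weighted average of its own and its neighbors' parameters,
    # as prescribed by the doubly stochastic mixing matrix W.
    return W @ ps_params

# Toy setup: 3 PSes, one scalar model parameter each.
# W is doubly stochastic, so the global average is preserved.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
ps_params = np.array([1.0, 3.0, 5.0])

for _ in range(10):  # repeated mixing rounds between local training
    ps_params = mix_step(ps_params, W)

# Parameters contract toward consensus at the global mean (3.0 here);
# a sparser W (fewer neighbors) would slow this contraction, which is
# why the paper's analysis calls for more frequent mixing in that case.
print(ps_params)
```

Because `W` is doubly stochastic, each round preserves the average while shrinking disagreement between PSes; the contraction speed depends on the connectivity of the PS graph, which is the quantity the paper's mixing-frequency trade-off is built around.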