Understanding the Training Dynamics in Federated Deep Learning via Aggregation Weight Optimization

ICLR 2023

Abstract
From the server's perspective, federated learning (FL) learns a global model by iteratively sampling a cohort of clients and updating the global model with the summed local gradients of the cohort. We find that this process is analogous to mini-batch SGD in centralized training, where a model is learned by iteratively sampling a batch of data and updating the model with the summed gradients of the batch. In this paper, we delve into the training dynamics of FL by drawing on the optimization and generalization experience of mini-batch SGD. Specifically, we focus on two aspects: \emph{client coherence} (the counterpart of sample coherence in mini-batch SGD) and \emph{global weight shrinking regularization} (the counterpart of weight decay in mini-batch SGD). We find that the roles of both aspects are determined by the aggregation weights assigned to each client during global model updating. We therefore use aggregation weight optimization on the server as a tool to study how client heterogeneity and the number of local epochs affect the global training dynamics in FL. Furthermore, we propose an effective method for \textbf{Fed}erated \textbf{A}ggregation \textbf{W}eight \textbf{O}ptimization, named \textsc{\textbf{FedAWO}}. Extensive experiments verify that our method improves the generalization of the global model by a large margin across different datasets and models.
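To make the idea of server-side aggregation weight optimization concrete, the following is a minimal sketch, not the authors' FedAWO implementation. It assumes the server holds a small proxy/validation loss and optimizes per-client aggregation weights by finite-difference descent; the names `weighted_aggregate` and `optimize_weights` are illustrative only.

```python
# Minimal sketch of server-side weighted aggregation in FL with learnable
# per-client aggregation weights (illustrative; not the paper's method).
import numpy as np

def weighted_aggregate(client_updates, weights):
    """Combine client parameter vectors with per-client aggregation weights.
    Classic FedAvg fixes the weights by local dataset size; here they are
    treated as free variables to be optimized."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()              # keep weights on the simplex
    return np.tensordot(weights, np.stack(client_updates), axes=1)

def optimize_weights(client_updates, val_loss_fn, steps=50, lr=0.1):
    """Toy aggregation-weight optimization: adjust the weights by
    finite-difference descent on a server-held validation loss."""
    k = len(client_updates)
    w = np.full(k, 1.0 / k)
    for _ in range(steps):
        base = val_loss_fn(weighted_aggregate(client_updates, w))
        grad = np.zeros(k)
        for i in range(k):                         # finite-difference gradient
            w_pert = w.copy()
            w_pert[i] += 1e-3
            grad[i] = (val_loss_fn(weighted_aggregate(client_updates, w_pert)) - base) / 1e-3
        w = np.clip(w - lr * grad, 1e-6, None)     # stay positive, then renormalize
        w = w / w.sum()
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=10)                   # stand-in for a "good" global model
    updates = [target + rng.normal(scale=s, size=10) for s in (0.1, 1.0, 3.0)]
    loss = lambda model: float(np.mean((model - target) ** 2))
    print("learned aggregation weights:", np.round(optimize_weights(updates, loss), 3))
```

In this toy setup the weights shift toward the lower-noise clients, which mirrors the abstract's point that the aggregation weights govern how client heterogeneity enters the global update.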
Key words
Federated learning, deep learning, weighted aggregation, training dynamics, optimization, neural networks.