Federated Optimization of Smooth Loss Functions

IEEE TRANSACTIONS ON INFORMATION THEORY (2023)

Abstract
In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server seeks to minimize an ERM objective function using n samples of training data that is stored across m clients and the server. The recent flurry of research in this area has identified the Federated Averaging (FedAve) algorithm as the staple for determining ε-approximate solutions to the ERM problem. Similar to standard optimization algorithms, e.g., stochastic gradient descent, the convergence analysis of FedAve and its variants only relies on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data too. To exploit this additional smoothness in data in a federated learning context, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the gradient of the loss function, our algorithm first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients using its own gradients. Then, our algorithm solves the ERM problem at the server using an inexact gradient descent method. To theoretically demonstrate that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity in the optimization literature. Under some assumptions on the loss function, e.g., strong convexity and smoothness in the parameter, η-Hölder class smoothness in the data, etc., we prove that the federated oracle complexity of FedLRGD scales like φm(p/ε)^Θ(d/η) and that of FedAve scales like φm(p/ε)^(3/4) (neglecting typically sub-dominant factors), where φ ≫ 1 is the ratio of client-to-server communication time to gradient computation time, p is the parameter dimension, and d is the data dimension. Then, we show that when d is small compared to n and the loss function is sufficiently smooth in the data, i.e., η = Θ(d), FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a general result on low rank approximation of smooth latent variable models.
Keywords
Federated learning, empirical risk minimization, gradient descent, Hölder class, low rank approximation
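
To make the two-phase structure described in the abstract concrete, the following is a minimal numerical sketch of the underlying idea, not the paper's actual pseudocode: when the per-sample gradient is smooth in the data, a client's average gradient can be approximated by a fixed linear combination of gradients that the server computes on its own samples, after which the server runs inexact gradient descent with no further communication. The least-squares loss, the tanh feature map, the single client, and all variable and function names below are illustrative assumptions, not choices taken from the paper.

```python
# Conceptual sketch of the idea behind FedLRGD, not the paper's pseudocode.
# Assumptions: one client, a least-squares loss 0.5*(phi(x)@theta - y)^2 with a
# smooth feature map phi and smooth labels y(x), so the per-sample gradient is
# smooth in the data and the gradient "matrix" over (parameters, samples) is
# approximately low rank.
import numpy as np

rng = np.random.default_rng(0)
p, d = 5, 3                       # parameter dimension p, data dimension d
n_server, n_client = 40, 400      # samples held by the server / by the client

A = rng.normal(size=(d, p))                  # fixed smooth feature map phi(x) = tanh(x @ A)
phi = lambda X: np.tanh(X @ A)
label = lambda X: np.sin(X.sum(axis=1))      # labels are a smooth function of the data

X_s, X_c = rng.normal(size=(n_server, d)), rng.normal(size=(n_client, d))
y_s, y_c = label(X_s), label(X_c)

def per_sample_grads(theta, X, y):
    """Per-sample gradients of 0.5*(phi(x)@theta - y)^2 w.r.t. theta, one row each."""
    F = phi(X)                               # (n, p)
    return F * (F @ theta - y)[:, None]      # (n, p)

def avg_grad(theta, X, y):
    return per_sample_grads(theta, X, y).mean(axis=0)

# Phase 1: a few communication rounds.  The server queries the client's average
# gradient at r "probe" parameters and fits weights w so that a fixed linear
# combination of its own per-sample gradients reproduces those client gradients.
r = 15
probes = rng.normal(size=(r, p))
S = np.stack([per_sample_grads(t, X_s, y_s).T for t in probes])   # (r, p, n_server)
c = np.stack([avg_grad(t, X_c, y_c) for t in probes])             # (r, p)
w, *_ = np.linalg.lstsq(S.reshape(r * p, n_server), c.reshape(r * p), rcond=None)

# Phase 2: inexact gradient descent at the server, with no further communication.
# The client's gradient at each iterate is replaced by the learned combination of
# server gradients; the two machines are weighted equally for simplicity.
theta = np.zeros(p)
for _ in range(300):
    G_s = per_sample_grads(theta, X_s, y_s)      # (n_server, p)
    g = 0.5 * (G_s.mean(axis=0) + G_s.T @ w)     # exact server part + approximate client part
    theta -= 0.2 * g                             # conservative step; bounded tanh features keep smoothness small

true_grad = 0.5 * (avg_grad(theta, X_s, y_s) + avg_grad(theta, X_c, y_c))
print("client-gradient approximation error:",
      np.linalg.norm(avg_grad(theta, X_c, y_c) - per_sample_grads(theta, X_s, y_s).T @ w))
print("norm of the true combined gradient at the final iterate:", np.linalg.norm(true_grad))
```

In this toy setting the per-sample gradient is affine in the parameter, so a handful of probe parameters suffices to pin down the combination weights; the paper's analysis instead exploits Hölder smoothness in the data to bound the rank of the gradient and the resulting approximation error.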