Over-parameterized Model Optimization with Polyak-Łojasiewicz Condition.

Yixuan Chen, Yubin Shi, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Ning Gu, Li Shang

ICLR 2023

Abstract

This work studies how to optimize over-parameterized deep models for better training efficiency and test performance. We first show theoretically that two properties of over-parameterized models matter: the convergence gap and the generalization gap. Further analysis shows that both gaps can be upper-bounded by the ratio of the Lipschitz constant to the Polyak-Łojasiewicz (PL) constant, a term referred to as the condition number. These findings lead to a structured pruning method with a novel pruning criterion: we devise a gating network that dynamically detects and masks out poorly behaved nodes of a deep model during training. The gating network is learned by minimizing the condition number of the target model, which can be implemented as an extra regularization loss term. Experiments show that the proposed method outperforms the baselines in both training efficiency and test performance, and suggest that it can generalize to a variety of deep network architectures and tasks.
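To make the role of the condition number concrete, the following is a minimal textbook sketch, not the paper's method. It assumes that the "Lipschitz constant" means the Lipschitz constant L of the gradient and uses an over-parameterized least-squares problem (more unknowns than equations), where the PL constant mu is the smallest eigenvalue of A Aᵀ and the optimal loss is zero. Under those assumptions, gradient descent with step size 1/L satisfies f(x_k) ≤ (1 − mu/L)^k f(x_0), so a smaller ratio L/mu means faster convergence. The matrix `A`, the vector `b`, and all function names are hypothetical and chosen only for illustration.

```python
import math

def matvec(A, x):
    """Compute A @ x for a list-of-rows matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T @ r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def loss(A, b, x):
    """Least-squares loss f(x) = 0.5 * ||A x - b||^2."""
    return 0.5 * sum((p - q) ** 2 for p, q in zip(matvec(A, x), b))

def gram_eigs_2x2(A):
    """Eigenvalues (min, max) of the 2x2 matrix A A^T; A must have two rows."""
    g11 = sum(v * v for v in A[0])
    g22 = sum(v * v for v in A[1])
    g12 = sum(u * v for u, v in zip(A[0], A[1]))
    mean = (g11 + g22) / 2
    rad = math.sqrt(((g11 - g22) / 2) ** 2 + g12 ** 2)
    return mean - rad, mean + rad

def gradient_descent(A, b, steps):
    """Run gradient descent with step 1/L and record the loss at every step."""
    mu, L = gram_eigs_2x2(A)  # PL constant and gradient Lipschitz constant
    x = [0.0] * len(A[0])
    losses = [loss(A, b, x)]
    for _ in range(steps):
        residual = [p - q for p, q in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)
        x = [xi - gi / L for xi, gi in zip(x, grad)]
        losses.append(loss(A, b, x))
    return mu, L, losses

# Hypothetical over-parameterized problem: 2 equations, 3 unknowns.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
b = [1.0, -2.0]
mu, L, losses = gradient_descent(A, b, steps=50)
kappa = L / mu  # the condition number in the sense assumed above
```

Comparing `losses[k]` with `(1 - mu / L) ** k * losses[0]` shows that the recorded losses stay below the linear-rate bound, which is the intuition behind treating L/mu as a quantity worth minimizing.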

Keywords

Over-parameterized Model, Model Optimization, Polyak-Łojasiewicz Condition
