Decoupled Weight Decay for Any p Norm
CoRR (2024)
Abstract
With the success of deep neural networks (NNs) in a variety of domains, the
computational and storage requirements for training and deploying large NNs
have become a bottleneck for further improvements. Sparsification has
consequently emerged as a leading approach to tackle these issues. In this
work, we consider a simple yet effective approach to sparsification, based on
the Bridge, or L_p regularization during training. We introduce a novel
weight decay scheme, which generalizes the standard L_2 weight decay to any
p norm. We show that this scheme is compatible with adaptive optimizers, and
avoids the gradient divergence associated with 0 < p < 1 norms.
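The core idea in the abstract, a decoupled weight decay generalized from the standard p = 2 case to an arbitrary p norm, can be sketched as follows. This is an illustrative baseline, not the paper's actual scheme: the function name and hyperparameters are assumptions, and the naive subgradient used here still has the |w|^(p-1) divergence near zero for p < 1 that the paper's method is designed to avoid.

```python
import math

def decoupled_lp_decay_step(w, grad, lr=1e-3, lam=1e-2, p=2.0):
    """One SGD step with decoupled L_p weight decay (illustrative sketch).

    'Decoupled' means the decay acts on the weights directly, separately
    from the loss gradient (as AdamW does for p = 2). The decay term is
    the subgradient of lam * sum_i |w_i|^p; its factor |w_i|^(p-1) blows
    up near zero when p < 1 -- this naive version does not fix that.
    """
    out = []
    for wi, gi in zip(w, grad):
        wi = wi - lr * gi  # ordinary loss-gradient step
        # subgradient of lam * |wi|^p, taken as 0 at wi = 0
        decay = lam * p * math.copysign(abs(wi) ** (p - 1), wi) if wi != 0 else 0.0
        out.append(wi - lr * decay)  # decoupled decay step
    return out

# With p = 2 this reduces to plain decoupled decay:
# each weight is scaled by (1 - 2 * lr * lam).
w_new = decoupled_lp_decay_step([0.5, -1.0, 2.0], [0.0, 0.0, 0.0],
                                lr=0.1, lam=0.1, p=2.0)
```

For p = 2 the decay term is 2 * lam * w, recovering the familiar multiplicative shrinkage of AdamW-style weight decay; smaller p pushes small weights toward exact zero, which is what makes the L_p (Bridge) penalty attractive for sparsification.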