Decoupled Weight Decay for Any p Norm
CoRR (2024)
Abstract
With the success of deep neural networks (NNs) in a variety of domains, the
computational and storage requirements for training and deploying large NNs
have become a bottleneck for further improvements. Sparsification has
consequently emerged as a leading approach to tackle these issues. In this
work, we consider a simple yet effective approach to sparsification, based on
the Bridge, or L_p regularization during training. We introduce a novel
weight decay scheme, which generalizes the standard L_2 weight decay to any
p norm. We show that this scheme is compatible with adaptive optimizers, and
avoids the gradient divergence associated with 0 < p < 1 norms.
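The core idea in the abstract, a decoupled weight decay generalized from the standard p = 2 case to an arbitrary p norm, can be sketched as follows. This is an illustrative baseline, not the paper's actual scheme: the function name and hyperparameters are assumptions, and the naive subgradient used here still has the |w|^(p-1) divergence near zero for p < 1 that the paper's method is designed to avoid.

```python
import math

def decoupled_lp_decay_step(w, grad, lr=1e-3, lam=1e-2, p=2.0):
    """One SGD step with decoupled L_p weight decay (illustrative sketch).

    'Decoupled' means the decay acts on the weights directly, separately
    from the loss gradient (as AdamW does for p = 2). The decay term is
    the subgradient of lam * sum_i |w_i|^p; its factor |w_i|^(p-1) blows
    up near zero when p < 1 -- this naive version does not fix that.
    """
    out = []
    for wi, gi in zip(w, grad):
        wi = wi - lr * gi  # ordinary loss-gradient step
        # subgradient of lam * |wi|^p, taken as 0 at wi = 0
        decay = lam * p * math.copysign(abs(wi) ** (p - 1), wi) if wi != 0 else 0.0
        out.append(wi - lr * decay)  # decoupled decay step
    return out

# With p = 2 this reduces to plain decoupled decay:
# each weight is scaled by (1 - 2 * lr * lam).
w_new = decoupled_lp_decay_step([0.5, -1.0, 2.0], [0.0, 0.0, 0.0],
                                lr=0.1, lam=0.1, p=2.0)
```

For p = 2 the decay term is 2 * lam * w, recovering the familiar multiplicative shrinkage of AdamW-style weight decay; smaller p pushes small weights toward exact zero, which is what makes the L_p (Bridge) penalty attractive for sparsification.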