Decoupled Weight Decay for Any p Norm

CoRR (2024)

Abstract
With the success of deep neural networks (NNs) in a variety of domains, the computational and storage requirements for training and deploying large NNs have become a bottleneck for further improvements. Sparsification has consequently emerged as a leading approach to tackle these issues. In this work, we consider a simple yet effective approach to sparsification, based on the Bridge, or L_p regularization during training. We introduce a novel weight decay scheme, which generalizes the standard L_2 weight decay to any p norm. We show that this scheme is compatible with adaptive optimizers, and avoids the gradient divergence associated with 0 …
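The abstract describes decoupling an L_p weight-decay term from the loss gradient, in the spirit of AdamW's decoupled L_2 decay. A minimal sketch of one such decay step is below; the function name, the `eps` guard, and the exact update form are illustrative assumptions, not the paper's method. The decay applies the gradient of (λ/p)·||w||_p^p directly to the weights, separately from the optimizer's gradient step; for p < 1 the raw term |w|^(p-1) diverges as w → 0, which is the issue the paper's scheme is said to avoid (here crudely handled with `eps`).

```python
import numpy as np

def decoupled_lp_decay(w, lr, lam, p, eps=1e-8):
    """One decoupled L_p weight-decay step (hypothetical sketch).

    Subtracts lr * lam * sign(w) * |w|^(p-1), i.e. a step along the
    gradient of (lam/p) * ||w||_p^p, applied separately from the loss
    gradient (AdamW-style decoupling). `eps` guards |w|^(p-1) near
    zero when p < 1; the paper's actual scheme avoids this divergence
    differently.
    """
    decay = lam * np.sign(w) * (np.abs(w) + eps) ** (p - 1.0)
    return w - lr * decay
```

With p = 2 and eps = 0 this reduces to the familiar decoupled update w ← w − lr·λ·w; with p = 1 it becomes a fixed-size shrinkage toward zero, which is what drives sparsity.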