Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness-aware Minimization
arXiv (2024)
Abstract
Sharpness-aware Minimization (SAM) was recently proposed to improve model
generalization. However, SAM computes the gradient twice in each optimization
step, doubling the computation cost of stochastic gradient descent (SGD). In
this paper, we propose a simple yet efficient sampling method that
significantly accelerates SAM. Concretely, we show that the SAM gradient
decomposes into the SGD gradient plus the Projection of the Second-order
gradient matrix onto the First-order gradient (PSF). The PSF changes with
gradually increasing frequency over the course of training. Leveraging this
observation, we propose an adaptive sampling method based on the variation of
the PSF, reusing the most recently sampled PSF in the non-sampling iterations.
Extensive empirical results show that the proposed method achieves accuracies
comparable to SAM across diverse network architectures.
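The decomposition stated in the abstract follows from a first-order Taylor expansion of the SAM gradient. A sketch of that standard derivation is below; the notation (loss L, weights w, perturbation radius rho) is the usual SAM convention and is assumed here rather than quoted from the paper:

```latex
% SAM perturbs the weights along the normalized gradient direction:
%   \hat{\epsilon} = \rho \, \frac{\nabla L(w)}{\lVert \nabla L(w) \rVert}
% Expanding the perturbed gradient to first order in \rho:
\nabla L(w + \hat{\epsilon})
  \approx \nabla L(w) + \nabla^2 L(w)\,\hat{\epsilon}
  = \underbrace{\nabla L(w)}_{\text{SGD gradient}}
    + \underbrace{\rho\,\nabla^2 L(w)\,
      \frac{\nabla L(w)}{\lVert \nabla L(w) \rVert}}_{\text{PSF}}
```

The second term is the PSF: the second-order gradient matrix (Hessian) applied to the normalized first-order gradient, scaled by rho.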
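To illustrate the sampling-and-reuse idea, here is a minimal PyTorch-style sketch. The function name `sam_with_psf_reuse`, the relative-variation measure, and the interval-adaptation rule (`var_threshold`, `init_interval`, halving or incrementing the interval) are illustrative assumptions; the abstract does not specify the paper's exact schedule:

```python
import torch

def sam_with_psf_reuse(model, loss_fn, loader, opt,
                       rho=0.05, var_threshold=0.1, init_interval=10):
    """Hypothetical sketch: cache the PSF term and reuse it between
    samplings, resampling more often when the PSF varies quickly."""
    cached_psf = None
    interval = init_interval
    for step, (x, y) in enumerate(loader):
        opt.zero_grad()
        loss_fn(model(x), y).backward()                    # g_SGD
        if cached_psf is None or step % interval == 0:
            # Full SAM step: perturb along the normalized gradient,
            # recompute the gradient, and cache PSF = g_SAM - g_SGD.
            grads = [p.grad.clone() for p in model.parameters()]
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item() + 1e-12
            with torch.no_grad():
                for p, g in zip(model.parameters(), grads):
                    p.add_(g, alpha=rho / norm)
            opt.zero_grad()
            loss_fn(model(x), y).backward()                # g_SAM
            with torch.no_grad():
                for p, g in zip(model.parameters(), grads):
                    p.sub_(g, alpha=rho / norm)            # undo perturbation
            new_psf = [p.grad.detach() - g
                       for p, g in zip(model.parameters(), grads)]
            if cached_psf is not None:
                # Relative change of PSF between consecutive samples:
                # shrink the interval when PSF changes fast, grow it when stable.
                num = sum((a - b).pow(2).sum()
                          for a, b in zip(new_psf, cached_psf))
                den = sum(b.pow(2).sum() for b in cached_psf) + 1e-12
                if torch.sqrt(num / den).item() > var_threshold:
                    interval = max(1, interval // 2)
                else:
                    interval += 1
            cached_psf = new_psf
            # p.grad already equals g_SGD + PSF = g_SAM here.
        else:
            with torch.no_grad():
                for p, psf in zip(model.parameters(), cached_psf):
                    p.grad.add_(psf)                       # reuse cached PSF
        opt.step()
```

On non-sampling iterations only one forward/backward pass runs, which is where the claimed speedup over vanilla SAM would come from.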