Sign Based Derivative Filtering For Stochastic Gradient Descent

Artificial Neural Networks and Machine Learning - ICANN 2019: Deep Learning, Part II (2019)

Abstract
We study the performance of stochastic gradient descent (SGD) in deep neural network (DNN) models. We show that during a single training epoch the signs of the partial derivatives of the loss with respect to a single parameter are distributed almost uniformly over the minibatches. We propose an optimization routine in which we maintain a moving-average history of the sign of each derivative. This history is used to classify new derivatives as "exploratory" if they disagree with the sign of the history, and as "exploiting" if they agree with it. Each derivative is weighted according to this classification, providing control over exploration and exploitation. The proposed approach leads to training a model with higher accuracy, as we demonstrate through a series of experiments.
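A minimal sketch of the sign-history filtering idea described above is given below. The decay rate `beta`, the exploitation/exploration weights `w_exploit` and `w_explore`, and the class name are illustrative assumptions; the paper's exact update rule and hyperparameters may differ.

```python
import numpy as np

class SignFilteredSGD:
    """Sketch of SGD with sign-based derivative filtering (assumed form)."""

    def __init__(self, lr=0.01, beta=0.9, w_exploit=1.0, w_explore=0.5):
        self.lr = lr                # learning rate
        self.beta = beta            # decay for the moving-average sign history (assumed)
        self.w_exploit = w_exploit  # weight for derivatives agreeing with the history (assumed)
        self.w_explore = w_explore  # weight for derivatives disagreeing with the history (assumed)
        self.history = None         # moving average of past derivative signs

    def step(self, params, grads):
        if self.history is None:
            self.history = np.zeros_like(grads)
        # Update the moving-average history with the sign of the new derivatives.
        self.history = self.beta * self.history + (1.0 - self.beta) * np.sign(grads)
        # "Exploiting" derivatives agree in sign with the history; "exploratory" ones disagree.
        agree = np.sign(grads) == np.sign(self.history)
        weights = np.where(agree, self.w_exploit, self.w_explore)
        # Weighted SGD update.
        return params - self.lr * weights * grads
```

In a training loop, `step` would be called once per minibatch with the current parameters and their gradients, so the sign history accumulates agreement information across the minibatches of an epoch.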
Keywords
Optimization, Gradients, Deep learning, Neural networks