Marthe: Scheduling the Learning Rate Via Online Hypergradients

IJCAI 2020

Cited by 17 | Views 41,762
Abstract
We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule — the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO [Franceschi et al., 2017] and HD [Baydin et al., 2018], and is able to produce learning rate schedules that are more stable, leading to models that generalize better.
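To make the hypergradient idea concrete, the following is a minimal sketch of hypergradient descent (HD), one of the two techniques the abstract says MARTHE interpolates between: the learning rate is nudged online by the dot product of consecutive gradients, a cheap approximation of the validation-error gradient w.r.t. the learning rate. The toy loss, step counts, and coefficients below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def hd_sgd(grad_fn, theta0, lr0=0.05, beta=1e-4, steps=100):
    """SGD whose learning rate is adapted online, HD-style:
    lr grows when consecutive gradients agree (their dot product is
    positive) and shrinks when they oppose, signalling oscillation."""
    theta = np.asarray(theta0, dtype=float)
    lr = lr0
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # Hypergradient step on the learning rate itself.
        lr += beta * float(g @ prev_grad)
        theta -= lr * g            # ordinary SGD step with the adapted lr
        prev_grad = g
    return theta, lr

# Toy quadratic loss f(x) = ||x||^2 / 2, whose gradient is x itself
# (a hypothetical example, chosen only so the trajectory is easy to follow).
theta, lr = hd_sgd(lambda x: x, theta0=[3.0, -2.0])
print(theta, lr)
```

On this convex toy problem the iterates contract toward the origin while the learning rate drifts upward, since consecutive gradients always point the same way; MARTHE generalizes this single-step hypergradient by folding in longer-horizon information from the optimization trajectory, as RTHO does.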