Marthe: Scheduling the Learning Rate Via Online Hypergradients

IJCAI 2020

Cited by 17 | Views 41,762
Abstract
We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule — the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO [Franceschi et al., 2017] and HD [Baydin et al., 2018], and is able to produce learning rate schedules that are more stable, leading to models that generalize better.
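To make the hypergradient idea concrete, the following is a minimal sketch of hypergradient descent (HD), one of the two techniques the abstract says MARTHE interpolates between: the learning rate is nudged online by the dot product of consecutive gradients, a cheap approximation of the validation-error gradient w.r.t. the learning rate. The toy loss, step counts, and coefficients below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def hd_sgd(grad_fn, theta0, lr0=0.05, beta=1e-4, steps=100):
    """SGD whose learning rate is adapted online, HD-style:
    lr grows when consecutive gradients agree (their dot product is
    positive) and shrinks when they oppose, signalling oscillation."""
    theta = np.asarray(theta0, dtype=float)
    lr = lr0
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # Hypergradient step on the learning rate itself.
        lr += beta * float(g @ prev_grad)
        theta -= lr * g            # ordinary SGD step with the adapted lr
        prev_grad = g
    return theta, lr

# Toy quadratic loss f(x) = ||x||^2 / 2, whose gradient is x itself
# (a hypothetical example, chosen only so the trajectory is easy to follow).
theta, lr = hd_sgd(lambda x: x, theta0=[3.0, -2.0])
print(theta, lr)
```

On this convex toy problem the iterates contract toward the origin while the learning rate drifts upward, since consecutive gradients always point the same way; MARTHE generalizes this single-step hypergradient by folding in longer-horizon information from the optimization trajectory, as RTHO does.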