The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure
arXiv: Learning, 2019.
There is a stark disparity between the step size schedules used in practical large-scale machine learning and those that are considered optimal by the theory of stochastic approximation. In theory, most results utilize polynomially decaying learning rate schedules, while, in practice, the Step Decay schedule is among the most popular sche...
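To make the two families of schedules named in the abstract concrete, here is a minimal Python sketch. The function names, the base rate, the decay factor, the interval, and the polynomial exponent are all illustrative assumptions; the excerpt does not state the parameters the paper analyzes.

```python
def polynomial_decay(step, base_lr=0.1, power=1.0):
    """Polynomially decaying rate: base_lr / (1 + step) ** power.

    base_lr and power are illustrative values, not taken from the paper.
    """
    return base_lr / (1 + step) ** power


def step_decay(step, base_lr=0.1, factor=0.5, interval=100):
    """Piecewise-constant rate, multiplied by `factor` every `interval` steps.

    The rate decays geometrically in the number of completed intervals.
    factor and interval are illustrative values, not taken from the paper.
    """
    return base_lr * factor ** (step // interval)


if __name__ == "__main__":
    # Print both schedules at a few steps to compare their shapes.
    for t in (0, 50, 100, 200, 400):
        print(t, polynomial_decay(t), step_decay(t))
```

The polynomial schedule shrinks a little at every step, whereas the step decay schedule holds the rate fixed within each interval and cuts it by a constant factor at the interval boundary.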