Improving Line Search Methods for Large Scale Neural Network Training
2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA) (2024)
Abstract
In recent studies, line search methods have shown significant improvements in
the performance of traditional stochastic gradient descent techniques,
eliminating the need for a specific learning rate schedule. In this paper, we
identify existing issues in state-of-the-art line search methods, propose
enhancements, and rigorously evaluate their effectiveness. We test these
methods on larger datasets and more complex data domains than before.
Specifically, we improve the Armijo line search by integrating the momentum
term from Adam into its search direction, enabling efficient large-scale
training, a task that was previously prone to failure using Armijo line search
methods. Our optimization approach outperforms both the previous Armijo
implementation and tuned learning rate schedules for Adam. Our evaluation
focuses on Transformers and CNNs in the domains of NLP and image data. Our work
is publicly available as a Python package, which provides a hyperparameter-free
PyTorch optimizer.
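
The idea of running an Armijo backtracking line search along an Adam-style direction can be illustrated with a minimal sketch. It is not the paper's implementation or the API of the published package: the toy quadratic loss, the function names (`loss`, `grad`, `armijo_adam_step`, `run_toy_example`), the constants `eta0`, `c`, and `shrink`, and the fallback to the plain gradient when the momentum direction is not a descent direction are all assumptions made for illustration.

```python
import math


def loss(w):
    # Hypothetical toy quadratic standing in for a mini-batch loss.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [w[0], 10.0 * w[1]]


def armijo_adam_step(w, m, v, t, eta0=1.0, c=0.1, shrink=0.5,
                     beta1=0.9, beta2=0.999, eps=1e-8, max_backtracks=30):
    """One update: Adam-style direction, step size chosen by Armijo backtracking."""
    g = grad(w)
    # Standard Adam moment estimates with bias correction (t starts at 1).
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    d = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]

    # Directional derivative along d; the update is w - eta * d.
    slope = sum(gi * di for gi, di in zip(g, d))
    if slope <= 0.0:
        # Assumption: if momentum does not point downhill, use the raw gradient.
        d = g
        slope = sum(gi * gi for gi in g)

    f0 = loss(w)
    eta = eta0
    for _ in range(max_backtracks):
        w_new = [wi - eta * di for wi, di in zip(w, d)]
        # Armijo sufficient-decrease condition.
        if loss(w_new) <= f0 - c * eta * slope:
            break
        eta *= shrink
    else:
        # No acceptable step found: keep the current point.
        w_new, eta = list(w), 0.0
    return w_new, m, v, eta


def run_toy_example(steps=20):
    # Returns the loss before the first step and after each step.
    w, m, v = [3.0, -2.0], [0.0, 0.0], [0.0, 0.0]
    losses = [loss(w)]
    for t in range(1, steps + 1):
        w, m, v, _ = armijo_adam_step(w, m, v, t)
        losses.append(loss(w))
    return losses


if __name__ == "__main__":
    print(run_toy_example(20))
```

In this sketch no learning rate schedule is supplied: each step starts from `eta0` and shrinks the step size until the sufficient-decrease condition holds, so the loss on the toy problem never increases from one step to the next.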