Leader Population Learning Rate Schedule

Jia Wei, Xingjun Zhang, Zhimin Zhuo, Zeyu Ji, Zheng Wei, Jingbo Li, Qianyang Li

Information Sciences (2022)

Abstract
The successful application of modern deep neural network models depends heavily on the choice of hyperparameters. As one of the most important hyperparameters, the Learning Rate (LR) needs to be carefully tuned. To train better-performing models, quickly finding a suitable LR schedule has become a pressing problem. Current practice often relies on manually driven parallel search methods such as grid search, or on serial methods such as Bayesian optimisation. However, the variety and number of models and datasets produce a very large LR search space, and within a given time budget these methods consume significant resources yet find only sub-optimal static LR schedules. In the now widely used Distributed Data Parallel (DDP) setting, the participating nodes themselves form a population, so population-based algorithms can be used to optimise the searched LR schedule dynamically throughout training with little additional training time. To exploit both the searched LR schedule and the population formed by the participating nodes, this paper proposes a Leader Population Learning Rate Schedule (LPLRS) for the DDP deep learning environment, which continuously explores for better learning rates in the neighbourhood of the searched LR schedule and uses them to guide the subsequent training process. On the CIFAR-10 and CIFAR-100 test sets, LPLRS achieves higher classification accuracy with the state-of-the-art Wide Residual Network with Sharpness-Aware Minimization (WRN(SAM)) model than the recent SGDR, CLR, and StepLR learning rate schedules.
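The abstract describes each DDP node exploring learning rates near a base schedule, with the best-performing node acting as a leader that guides the next round. The following is a minimal sketch of that idea, assuming a PyTorch DistributedDataParallel setup; the perturbation scheme, the function name leader_population_lr_step, and the parameter spread are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a leader-population LR exploration step under DDP (assumptions noted above).
import torch
import torch.distributed as dist


def leader_population_lr_step(base_lr, optimizer, local_metric, spread=0.2):
    """Each rank trains with an LR perturbed around base_lr; after the caller has
    measured local_metric (e.g. validation accuracy) for the interval, the LR of the
    best-performing rank (the 'leader') is returned to seed the next exploration round.
    The caller runs one training interval between setting the LR and invoking this step."""
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Hypothetical perturbation: evenly spaced multiplicative factors around 1.0,
    # so the population covers the neighbourhood of the base schedule.
    factor = 1.0 + spread * (2.0 * rank / max(world_size - 1, 1) - 1.0)
    local_lr = base_lr * factor
    for group in optimizer.param_groups:
        group["lr"] = local_lr

    # Gather (metric, lr) pairs from all ranks and pick the leader.
    # Tensors should live on the device matching the backend (CPU for gloo, GPU for nccl).
    stats = torch.tensor([float(local_metric), local_lr])
    gathered = [torch.zeros_like(stats) for _ in range(world_size)]
    dist.all_gather(gathered, stats)
    leader = max(gathered, key=lambda t: t[0].item())
    return leader[1].item()  # the leader's LR becomes the new base for exploration
```

In this sketch the selection is purely greedy; the actual LPLRS schedule may combine the leader's LR with the originally searched schedule differently, which the abstract does not specify.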
Keywords
Deep learning, Distributed training, Hyperparameter search, Data parallel, Population algorithm, Heuristic algorithms