Regularized Evolution for Image Classifier Architecture Search
AAAI Conference on Artificial Intelligence, 2019.
Abstract:
The effort devoted to hand-crafting image classifiers has motivated the use of architecture search to discover them automatically. Reinforcement learning and evolution have both shown promise for this purpose. This study introduces a regularized version of a popular asynchronous evolutionary algorithm and rigorously compares it to the non-regularized form.
Introduction
- Most state-of-the-art image classifier architectures have been manually designed by human experts (Krizhevsky, Sutskever, and Hinton 2012; Szegedy et al. 2015; He et al. 2016; Huang et al. 2017; Hu, Shen, and Sun 2018).
- To speed up the process, researchers have looked into automated methods (Baker et al. 2017a; Zoph and Le 2016; Miikkulainen et al. 2017; Real et al. 2017; Xie and Yuille 2017; Suganuma, Shirakawa, and Nagao 2017; Liu et al. 2018a; Pham et al. 2018).
- These methods are collectively known as architecture-search algorithms.
Highlights
- Until recently, most state-of-the-art image classifier architectures have been manually designed by human experts (Krizhevsky, Sutskever, and Hinton 2012; Szegedy et al. 2015; He et al. 2016; Huang et al. 2017; Hu, Shen, and Sun 2018).
- Supplement A presents preliminary results, performing evolutionary and reinforcement learning searches over three search spaces (SP-I: same as in the Results section; SP-II: like SP-I but with more possible ops; SP-III: like SP-II but with more pairwise combinations) and three datasets, at a small-compute scale.
- We proposed aging evolution, a variant of tournament selection in which genotypes die according to their age, favoring the young (see the sketch after this list).
- We implemented simple mutations that permit the application of evolution to the popular NASNet search space.
- We presented the first controlled comparison of algorithms for image classifier architecture search in a case study of evolution, reinforcement learning, and random search.
- We showed that evolution had a somewhat faster search speed and stood out in the regime of scarcer resources / early stopping.
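The aging-evolution loop highlighted above is simple enough to sketch in a few lines of Python. The block below is a minimal, self-contained illustration rather than the authors' implementation: the architecture encoding and the helpers random_architecture, mutate, and train_and_eval are hypothetical toy stand-ins (the real search mutates NASNet cells and trains each child network), but the selection and removal logic (sample S models, mutate the best of the sample, discard the oldest member) is the mechanism the highlight describes.

```python
import collections
import random

Model = collections.namedtuple("Model", ["arch", "accuracy"])

# Toy stand-ins (hypothetical): the real search space is NASNet cells and the
# fitness is validation accuracy of the child network after actual training.
def random_architecture():
    return tuple(random.randint(0, 7) for _ in range(10))

def mutate(parent_arch):
    child = list(parent_arch)
    i = random.randrange(len(child))
    child[i] = random.randint(0, 7)      # one random op/connection change
    return tuple(child)

def train_and_eval(arch):
    return random.random()               # placeholder for trained accuracy

def aging_evolution(cycles, population_size, sample_size):
    """Tournament selection in which the oldest model, not the worst, is removed."""
    population = collections.deque()     # ordered oldest -> youngest
    history = []

    # Seed the population with random architectures.
    while len(population) < population_size:
        arch = random_architecture()
        model = Model(arch=arch, accuracy=train_and_eval(arch))
        population.append(model)
        history.append(model)

    # Each cycle: sample, pick the fittest of the sample as parent, add its
    # mutated child, then remove the oldest member regardless of its accuracy.
    while len(history) < cycles:
        sample = [random.choice(population) for _ in range(sample_size)]
        parent = max(sample, key=lambda m: m.accuracy)
        child_arch = mutate(parent.arch)
        child = Model(arch=child_arch, accuracy=train_and_eval(child_arch))
        population.append(child)
        history.append(child)
        population.popleft()             # aging: the oldest genotype dies

    return max(history, key=lambda m: m.accuracy)

if __name__ == "__main__":
    best = aging_evolution(cycles=500, population_size=50, sample_size=10)
    print("best toy accuracy:", round(best.accuracy, 3))
```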
Methods
- This section contains a readable description of the methods. The Methods Details section gives additional information.
- Search Space: All experiments use the NASNet search space (Zoph et al. 2018).
- For the ImageNet table, N/F were 6/190 and 6/448, with standard training methods (Szegedy et al. 2017): distributed synchronous SGD with 100 P100 GPUs; RMSProp optimizer with 0.9 decay and ε = 0.1; 4 × 10⁻⁵ weight decay; 0.1 label smoothing; auxiliary softmax weighted by 0.4; dropout probability 0.5; ScheduledDropPath to 0.7 probability; initial learning rate 0.001, decayed by 0.97 every 2 epochs (a sketch of this schedule follows this list).
- The authors used the same conditions as the baseline study.
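The learning-rate schedule listed above is a staircase exponential decay. Below is a minimal plain-Python sketch, not the authors' training code: the config dict merely restates the hyperparameters from the bullet for reference, and only the learning-rate function carries executable logic.

```python
# ImageNet training hyperparameters as quoted above (Table 2 models).
IMAGENET_TRAIN = {
    "optimizer": "RMSProp (0.9 decay, eps=0.1)",
    "weight_decay": 4e-5,
    "label_smoothing": 0.1,
    "aux_softmax_weight": 0.4,
    "dropout": 0.5,
    "scheduled_drop_path": 0.7,
    "init_lr": 1e-3,
    "lr_decay_rate": 0.97,
    "lr_decay_every_epochs": 2,
}

def learning_rate(epoch, cfg=IMAGENET_TRAIN):
    """Staircase decay: start at 0.001, multiply by 0.97 every 2 epochs."""
    return cfg["init_lr"] * cfg["lr_decay_rate"] ** (epoch // cfg["lr_decay_every_epochs"])

# learning_rate(0)  -> 0.001
# learning_rate(10) -> 0.001 * 0.97**5 ≈ 8.6e-4
```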
Results
- Comparison With RL and RS Baselines
- Currently, reinforcement learning (RL) is the predominant method for architecture search.
- When re-trained on ImageNet, AmoebaNet-A performs comparably to the baseline for the same number of parameters (Table 2, model with F=190).
- The authors focused on AmoebaNet-A exclusively and enlarged it, setting a new ImageNet state of the art of 83.9% top-1 / 96.6% top-5 accuracy with 469M parameters (Table 2, model with F=448).
- Such high parameter counts may be beneficial in training other models too, but the authors have not yet managed to do this.
Conclusion
- This section suggests directions for future work, which the authors motivate by speculating about the evolutionary process and by summarizing additional minor results.
- The details of these minor results have been relegated to the supplements, as they are not necessary to understand or reproduce the main results above.
- A natural direction for future work is to extend the controlled comparison to more search spaces, datasets, and tasks (to verify generality), or to more algorithms.
- Evolution matched RL in final model quality while employing a simpler method.
Tables
- Table 1: CIFAR-10 test-set results for AmoebaNet-A, compared to the top model reported in the baseline study.
- Table 2: ImageNet classification results for AmoebaNet-A, compared to hand-designed models (top rows) and other automated methods.
Related work
- Review papers provide informative surveys of earlier (Yao 1999; Floreano, Durr, and Mattiussi 2008) and more recent (Elsken, Metzen, and Hutter 2018) literature on image classifier architecture search, including successful RL studies (Zoph and Le 2016; Baker et al. 2017a; Zoph et al. 2018; Liu et al. 2018a; Zhong, Yan, and Liu 2018; Cai et al. 2018) and evolutionary studies like those mentioned in the Introduction. Other methods have also been applied: cascade-correlation (Fahlman and Lebiere 1990), boosting (Cortes et al. 2017), hill-climbing (Elsken, Metzen, and Hutter 2017), MCTS (Negrinho and Gordon 2017), SMBO (Mendoza et al. 2016; Liu et al. 2018a), random search (Bergstra and Bengio 2012), and grid search (Zagoruyko and Komodakis 2016). Some methods even forwent the idea of independent architectures (Saxena and Verbeek 2016). There is much architecture-search work beyond image classification too, but that is outside our scope.
Even though some methods stand out due to their efficiency (Suganuma, Shirakawa, and Nagao 2017; Pham et al. 2018), many approaches use large amounts of resources. Several recent papers reduced the compute cost through progressive-complexity search stages (Liu et al. 2018a), hypernets (Brock et al. 2018), accuracy prediction (Baker et al. 2017b; Klein et al. 2017; Domhan, Springenberg, and Hutter 2017), warm-starting and ensembling (Feurer et al. 2015), parallelization, reward shaping and early stopping (Zhong, Yan, and Liu 2018), or Net2Net transformations (Cai et al. 2018). Most of these methods could in principle be applied to evolution too, but this is beyond the scope of this paper.
References
- Angeline, P. J.; Saunders, G. M.; and Pollack, J. B. 1994. An evolutionary algorithm that constructs recurrent neural networks. IEEE transactions on Neural Networks.
- Baker, B.; Gupta, O.; Naik, N.; and Raskar, R. 2017a. Designing neural network architectures using reinforcement learning. In ICLR.
- Baker, B.; Gupta, O.; Raskar, R.; and Naik, N. 2017b. Accelerating neural architecture search using performance prediction. ICLR Workshop.
- Bergstra, J., and Bengio, Y. 2012. Random search for hyperparameter optimization. JMLR.
- Brock, A.; Lim, T.; Ritchie, J. M.; and Weston, N. 2018. Smash: one-shot model architecture search through hypernetworks. In ICLR.
- Cai, H.; Chen, T.; Zhang, W.; Yu, Y.; and Wang, J. 2018. Efficient architecture search by network transformation. In AAAI.
- Chen, Y.; Li, J.; Xiao, H.; Jin, X.; Yan, S.; and Feng, J. 2017. Dual path networks. In NIPS.
- Ciregan, D.; Meier, U.; and Schmidhuber, J. 2012. Multicolumn deep neural networks for image classification. In CVPR.
- Coleman, C.; Kang, D.; Narayanan, D.; Nardi, L.; Zhao, T.; Zhang, J.; Bailis, P.; Olukotun, K.; Re, C.; and Zaharia, M. 2018. Analysis of dawnbench, a time-to-accuracy machine learning performance benchmark. arXiv preprint arXiv:1806.01427.
- Cortes, C.; Gonzalvo, X.; Kuznetsov, V.; Mohri, M.; and Yang, S. 2017. Adanet: Adaptive structural learning of artificial neural networks. In ICML.
- Cubuk, E. D.; Zoph, B.; Mane, D.; Vasudevan, V.; and Le, Q. V. 2018. Autoaugment: Learning augmentation policies from data. arXiv.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and FeiFei, L. 2009. Imagenet: A large-scale hierarchical image database. In CVPR.
- Domhan, T.; Springenberg, J. T.; and Hutter, F. 2017. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In IJCAI.
- Elsken, T.; Metzen, J.-H.; and Hutter, F. 2017. Simple and efficient architecture search for convolutional neural networks. ICLR Workshop.
- Elsken, T.; Metzen, J. H.; and Hutter, F. 2018. Neural architecture search: A survey. arXiv.
- Fahlman, S. E., and Lebiere, C. 1990. The cascade-correlation learning architecture. In NIPS.
- Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; and Hutter, F. 2015. Efficient and robust automated machine learning. In NIPS.
- Floreano, D.; Durr, P.; and Mattiussi, C. 2008. Neuroevolution: from architectures to learning. Evolutionary Intelligence.
- Goldberg, D. E., and Deb, K. 1991. A comparative analysis of selection schemes used in genetic algorithms. FOGA.
- He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
- Henderson, P.; Islam, R.; Bachman, P.; Pineau, J.; Precup, D.; and Meger, D. 2018. Deep reinforcement learning that matters. AAAI.
- Hornby, G. S. 2006. Alps: the age-layered population structure for reducing the problem of premature convergence. In GECCO.
- Hu, J.; Shen, L.; and Sun, G. 2018. Squeeze-and-excitation networks. CVPR.
- Huang, G.; Liu, Z.; Weinberger, K. Q.; and van der Maaten, L. 2017. Densely connected convolutional networks. In CVPR.
- Klein, A.; Falkner, S.; Springenberg, J. T.; and Hutter, F. 2017. Learning curve prediction with bayesian neural networks. ICLR.
- Krizhevsky, A., and Hinton, G. 2009. Learning multiple layers of features from tiny images. Master’s thesis, Dept. of Computer Science, U. of Toronto.
- Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In NIPS.
- Liu, C.; Zoph, B.; Shlens, J.; Hua, W.; Li, L.-J.; Fei-Fei, L.; Yuille, A.; Huang, J.; and Murphy, K. 2018a. Progressive neural architecture search. ECCV.
- Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; and Kavukcuoglu, K. 2018b. Hierarchical representations for efficient architecture search. In ICLR.
- Mendoza, H.; Klein, A.; Feurer, M.; Springenberg, J. T.; and Hutter, F. 2016. Towards automatically-tuned neural networks. In Workshop on Automatic Machine Learning.
- Miikkulainen, R.; Liang, J.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Navruzyan, A.; Duffy, N.; and Hodjat, B. 2017. Evolving deep neural networks. arXiv.
- Miller, G. F.; Todd, P. M.; and Hegde, S. U. 1989. Designing neural networks using genetic algorithms. In ICGA.
- Negrinho, R., and Gordon, G. 2017. Deeparchitect: Automatically designing and training deep architectures. arXiv.
- Pham, H.; Guan, M. Y.; Zoph, B.; Le, Q. V.; and Dean, J. 2018. Faster discovery of neural architectures by searching for paths in a large model. ICLR Workshop.
- Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y. L.; Le, Q.; and Kurakin, A. 2017. Large-scale evolution of image classifiers. In ICML.
- Salimans, T.; Ho, J.; Chen, X.; and Sutskever, I. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv.
- Saxena, S., and Verbeek, J. 2016. Convolutional neural fabrics. In NIPS.
- Simmons, J. P.; Nelson, L. D.; and Simonsohn, U. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR.
- Stanley, K. O., and Miikkulainen, R. 2002. Evolving neural networks through augmenting topologies. Evol. Comput.
- Stanley, K. O.; Bryant, B. D.; and Miikkulainen, R. 2005. Real-time neuroevolution in the nero video game. TEVC.
- Suganuma, M.; Shirakawa, S.; and Nagao, T. 2017. A genetic programming approach to designing convolutional neural network architectures. In GECCO.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; and Alemi, A. A. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI.
- Wan, L.; Zeiler, M.; Zhang, S.; Le Cun, Y.; and Fergus, R. 2013. Regularization of neural networks using dropconnect. In ICML.
- Xie, L., and Yuille, A. 2017. Genetic CNN. In ICCV.
- Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; and He, K. 2017. Aggregated residual transformations for deep neural networks. In CVPR.
- Yao, X. 1999. Evolving artificial neural networks. Proceedings of the IEEE.
- Zagoruyko, S., and Komodakis, N. 2016. Wide residual networks. In BMVC.
- Zhang, X.; Li, Z.; Loy, C. C.; and Lin, D. 2017. Polynet: A pursuit of structural diversity in very deep networks. In CVPR.
- Zhong, Z.; Yan, J.; and Liu, C.-L. 2018. Practical network blocks design with q-learning. In AAAI.
- Zoph, B., and Le, Q. V. 2016. Neural architecture search with reinforcement learning. In ICLR.
- Zoph, B.; Vasudevan, V.; Shlens, J.; and Le, Q. V. 2018. Learning transferable architectures for scalable image recognition. In CVPR.