Regularized Evolution for Image Classifier Architecture Search

    Esteban Real
    Alok Aggarwal

    AAAI Conference on Artificial Intelligence, 2019.

    Keywords: aging evolution, meta learning, search space, random search, evolution
    In brief: A variant of tournament selection by which genotypes die according to their age, favoring the young.

    Abstract:

    The effort devoted to hand-crafting image classifiers has motivated the use of architecture search to discover them automatically. Reinforcement learning and evolution have both shown promise for this purpose. This study introduces a regularized version of a popular asynchronous evolutionary algorithm. We rigorously compare it to the non-regularized form and to reinforcement learning and random search baselines.

    Introduction
    Highlights
    • Until recently, most state-of-the-art image classifier architectures have been manually designed by human experts (Krizhevsky, Sutskever, and Hinton 2012; Szegedy et al. 2015; He et al. 2016; Huang et al. 2017; Hu, Shen, and Sun 2018)
    • Supplement A presents preliminary results, performing evolutionary and reinforcement learning searches over three search spaces (SP-I: same as in the Results section; SP-II: like SP-I but with more possible ops; SP-III: like SP-II but with more pairwise combinations) and three datasets, at a small-compute scale
    • We proposed aging evolution, a variant of tournament selection by which genotypes die according to their age, favoring the young (a minimal sketch of the loop follows this list)
    • We implemented simple mutations that permit the application of evolution to the popular NASNet search space
    • We presented the first controlled comparison of algorithms for image classifier architecture search in a case study of evolution, reinforcement learning and random search
    • We showed that evolution had somewhat faster search speed and stood out in the regime of scarcer resources / early stopping
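    The aging-evolution loop and the mutation step mentioned in the highlights above fit in a few lines of code. The Python sketch below is illustrative only: it assumes the caller supplies random_arch, mutate, and train_and_eval (standing in for NASNet-space architecture sampling, the hidden-state/op mutations, and child training), and the toy usage at the bottom is a made-up bit-string problem, not the paper's setup.

      import random
      from collections import deque

      def aging_evolution(random_arch, mutate, train_and_eval,
                          population_size=100, sample_size=25, cycles=20000):
          """Aging (regularized) evolution: tournament selection in which the
          oldest genotype, not the worst, is removed at every cycle."""
          population = deque()  # ordered by age; leftmost element is the oldest
          history = []          # every model ever evaluated

          # Initialize the population with random architectures.
          while len(population) < population_size:
              arch = random_arch()
              model = {"arch": arch, "accuracy": train_and_eval(arch)}
              population.append(model)
              history.append(model)

          # Each cycle: sample S candidates, mutate the fittest, kill the oldest.
          while len(history) < cycles:
              sample = [random.choice(population) for _ in range(sample_size)]
              parent = max(sample, key=lambda m: m["accuracy"])
              child_arch = mutate(parent["arch"])
              child = {"arch": child_arch, "accuracy": train_and_eval(child_arch)}
              population.append(child)   # the child is the youngest (rightmost)
              history.append(child)
              population.popleft()       # aging: the oldest dies regardless of fitness

          return max(history, key=lambda m: m["accuracy"])


      def flip_one_bit(arch):
          """Toy stand-in for a mutation (e.g., a hidden-state or op mutation)."""
          arch = list(arch)
          i = random.randrange(len(arch))
          arch[i] ^= 1
          return arch


      if __name__ == "__main__":
          # Toy run on a 10-bit "architecture" whose "accuracy" is its bit sum.
          best = aging_evolution(
              random_arch=lambda: [random.randint(0, 1) for _ in range(10)],
              mutate=flip_one_bit,
              train_and_eval=sum,
              population_size=20, sample_size=5, cycles=200)
          print(best)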
    Methods
    • This section contains a readable description of the methods. The Methods Details section gives additional information.

      Search Space: All experiments use the NASNet search space (Zoph et al. 2018).
    • For the ImageNet table, N/F were 6/190 and 6/448, with standard training methods (Szegedy et al. 2017): distributed sync SGD with 100 P100 GPUs; RMSProp optimizer with 0.9 decay and ε=0.1; 4 × 10−5 weight decay; 0.1 label smoothing; auxiliary softmax weighted by 0.4; dropout probability 0.5; ScheduledDropPath to 0.7 probability; 0.001 initial learning rate, decaying by 0.97 every 2 epochs (see the configuration sketch after this list).
    • The authors used the same conditions as the baseline study
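    As a concrete reading of the ImageNet training settings above, the snippet below is a hedged TensorFlow/Keras sketch covering only the optimizer, the learning-rate schedule, and the label-smoothed loss; it is not the authors' code. The distributed sync SGD setup, auxiliary softmax head, weight decay, dropout, and ScheduledDropPath are noted only in comments, and steps_per_epoch is a placeholder that depends on the global batch size.

      import tensorflow as tf

      steps_per_epoch = 1000  # placeholder; depends on global batch size and dataset size

      # Initial learning rate 0.001, decayed by 0.97 every 2 epochs.
      lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
          initial_learning_rate=0.001,
          decay_steps=2 * steps_per_epoch,
          decay_rate=0.97,
          staircase=True)

      # RMSProp with 0.9 decay (rho) and epsilon = 0.1.
      optimizer = tf.keras.optimizers.RMSprop(
          learning_rate=lr_schedule, rho=0.9, epsilon=0.1)

      # 0.1 label smoothing on the main softmax. The auxiliary softmax (weighted
      # by 0.4), the 4e-5 weight decay, dropout of 0.5, and ScheduledDropPath to
      # 0.7 would be wired into the model and total loss, and are not shown here.
      loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)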
    Results
    • Comparison With RL and RS Baselines

      Currently, reinforcement learning (RL) is the predominant method for architecture search.
    • When re-trained on ImageNet, AmoebaNet-A performs comparably to the baseline for the same number of parameters (Table 2, model with F=190).
    • The authors focused on AmoebaNet-A exclusively and enlarged it, setting a new state of the art on ImageNet: 83.9% top-1 / 96.6% top-5 accuracy with 469M parameters (Table 2, model with F=448).
    • Such high parameter counts may be beneficial in training other models too, but the authors have not yet managed to do so
    Conclusion
    • This section suggests directions for future work, which the authors motivate by speculating about the evolutionary process and by summarizing additional minor results.
    • The details of these minor results have been relegated to the supplements, as they are not necessary to understand or reproduce the main results above.
    • A natural direction for future work is to extend the controlled comparison to more search spaces, datasets, and tasks, to verify generality, or to more algorithms.
    • Evolution matched RL in final model quality while employing a simpler method
    Summary
    • Introduction:

      Most state-of-the-art image classifier architectures have been manually designed by human experts (Krizhevsky, Sutskever, and Hinton 2012; Szegedy et al. 2015; He et al. 2016; Huang et al. 2017; Hu, Shen, and Sun 2018).
    • To speed up the process, researchers have looked into automated methods (Baker et al. 2017a; Zoph and Le 2016; Miikkulainen et al. 2017; Real et al. 2017; Xie and Yuille 2017; Suganuma, Shirakawa, and Nagao 2017; Liu et al. 2018a; Pham et al. 2018)
    • These methods are collectively known as architecture-search algorithms.
    Tables
    • Table 1: CIFAR-10 testing set results for AmoebaNet-A, compared to the top model reported in the baseline study
    • Table2: ImageNet classification results for AmoebaNet-A compared to hand-designs (top rows) and other automated methods
    References
    • Angeline, P. J.; Saunders, G. M.; and Pollack, J. B. 1994. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks.
    • Baker, B.; Gupta, O.; Naik, N.; and Raskar, R. 2017a. Designing neural network architectures using reinforcement learning. In ICLR.
    • Baker, B.; Gupta, O.; Raskar, R.; and Naik, N. 2017b. Accelerating neural architecture search using performance prediction. ICLR Workshop.
    • Bergstra, J., and Bengio, Y. 2012. Random search for hyperparameter optimization. JMLR.
    • Brock, A.; Lim, T.; Ritchie, J. M.; and Weston, N. 2018. SMASH: One-shot model architecture search through hypernetworks. In ICLR.
    • Cai, H.; Chen, T.; Zhang, W.; Yu, Y.; and Wang, J. 2018. Efficient architecture search by network transformation. In AAAI.
    • Chen, Y.; Li, J.; Xiao, H.; Jin, X.; Yan, S.; and Feng, J. 2017. Dual path networks. In NIPS.
    • Ciregan, D.; Meier, U.; and Schmidhuber, J. 2012. Multi-column deep neural networks for image classification. In CVPR.
    • Coleman, C.; Kang, D.; Narayanan, D.; Nardi, L.; Zhao, T.; Zhang, J.; Bailis, P.; Olukotun, K.; Re, C.; and Zaharia, M. 2018. Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark. arXiv preprint arXiv:1806.01427.
    • Cortes, C.; Gonzalvo, X.; Kuznetsov, V.; Mohri, M.; and Yang, S. 2017. AdaNet: Adaptive structural learning of artificial neural networks. In ICML.
    • Cubuk, E. D.; Zoph, B.; Mane, D.; Vasudevan, V.; and Le, Q. V. 2018. AutoAugment: Learning augmentation policies from data. arXiv.
    • Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In CVPR.
    • Domhan, T.; Springenberg, J. T.; and Hutter, F. 2017. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In IJCAI.
    • Elsken, T.; Metzen, J.-H.; and Hutter, F. 2017. Simple and efficient architecture search for convolutional neural networks. ICLR Workshop.
    • Elsken, T.; Metzen, J. H.; and Hutter, F. 2018. Neural architecture search: A survey. arXiv.
    • Fahlman, S. E., and Lebiere, C. 1990. The cascade-correlation learning architecture. In NIPS.
    • Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; and Hutter, F. 2015. Efficient and robust automated machine learning. In NIPS.
    • Floreano, D.; Durr, P.; and Mattiussi, C. 2008. Neuroevolution: from architectures to learning. Evolutionary Intelligence.
    • Goldberg, D. E., and Deb, K. 1991. A comparative analysis of selection schemes used in genetic algorithms. FOGA.
    • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
    • Henderson, P.; Islam, R.; Bachman, P.; Pineau, J.; Precup, D.; and Meger, D. 2018. Deep reinforcement learning that matters. AAAI.
    • Hornby, G. S. 2006. ALPS: the age-layered population structure for reducing the problem of premature convergence. In GECCO.
    • Hu, J.; Shen, L.; and Sun, G. 2018. Squeeze-and-excitation networks. CVPR.
    • Huang, G.; Liu, Z.; Weinberger, K. Q.; and van der Maaten, L. 2017. Densely connected convolutional networks. In CVPR.
    • Klein, A.; Falkner, S.; Springenberg, J. T.; and Hutter, F. 2017. Learning curve prediction with Bayesian neural networks. ICLR.
    • Krizhevsky, A., and Hinton, G. 2009. Learning multiple layers of features from tiny images. Master's thesis, Dept. of Computer Science, U. of Toronto.
    • Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS.
    • Liu, C.; Zoph, B.; Shlens, J.; Hua, W.; Li, L.-J.; Fei-Fei, L.; Yuille, A.; Huang, J.; and Murphy, K. 2018a. Progressive neural architecture search. ECCV.
    • Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; and Kavukcuoglu, K. 2018b. Hierarchical representations for efficient architecture search. In ICLR.
    • Mendoza, H.; Klein, A.; Feurer, M.; Springenberg, J. T.; and Hutter, F. 2016. Towards automatically-tuned neural networks. In Workshop on Automatic Machine Learning.
    • Miikkulainen, R.; Liang, J.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Navruzyan, A.; Duffy, N.; and Hodjat, B. 2017. Evolving deep neural networks. arXiv.
    • Miller, G. F.; Todd, P. M.; and Hegde, S. U. 1989. Designing neural networks using genetic algorithms. In ICGA.
    • Negrinho, R., and Gordon, G. 2017. DeepArchitect: Automatically designing and training deep architectures. arXiv.
    • Pham, H.; Guan, M. Y.; Zoph, B.; Le, Q. V.; and Dean, J. 2018. Faster discovery of neural architectures by searching for paths in a large model. ICLR Workshop.
    • Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y. L.; Le, Q.; and Kurakin, A. 2017. Large-scale evolution of image classifiers. In ICML.
    • Salimans, T.; Ho, J.; Chen, X.; and Sutskever, I. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv.
    • Saxena, S., and Verbeek, J. 2016. Convolutional neural fabrics. In NIPS.
    • Simmons, J. P.; Nelson, L. D.; and Simonsohn, U. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science.
    • Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR.
    • Stanley, K. O., and Miikkulainen, R. 2002. Evolving neural networks through augmenting topologies. Evol. Comput.
    • Stanley, K. O.; Bryant, B. D.; and Miikkulainen, R. 2005. Real-time neuroevolution in the NERO video game. TEVC.
    • Suganuma, M.; Shirakawa, S.; and Nagao, T. 2017. A genetic programming approach to designing convolutional neural network architectures. In GECCO.
    • Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR.
    • Szegedy, C.; Ioffe, S.; Vanhoucke, V.; and Alemi, A. A. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI.
    • Wan, L.; Zeiler, M.; Zhang, S.; Le Cun, Y.; and Fergus, R. 2013. Regularization of neural networks using DropConnect. In ICML.
    • Xie, L., and Yuille, A. 2017. Genetic CNN. In ICCV.
    • Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; and He, K. 2017. Aggregated residual transformations for deep neural networks. In CVPR.
    • Yao, X. 1999. Evolving artificial neural networks. Proceedings of the IEEE.
    • Zagoruyko, S., and Komodakis, N. 2016. Wide residual networks. In BMVC.
    • Zhang, X.; Li, Z.; Loy, C. C.; and Lin, D. 2017. PolyNet: A pursuit of structural diversity in very deep networks. In CVPR.
    • Zhong, Z.; Yan, J.; and Liu, C.-L. 2018. Practical network blocks design with Q-learning. In AAAI.
    • Zoph, B., and Le, Q. V. 2016. Neural architecture search with reinforcement learning. In ICLR.
    • Zoph, B.; Vasudevan, V.; Shlens, J.; and Le, Q. V. 2018. Learning transferable architectures for scalable image recognition. In CVPR.