Large-Scale Evolution of Image Classifiers

    Esteban Real
    Sherry Moore
    Andrew Selle
    Yutaka Leon Suematsu

    ICML, 2017.

    Cited by: 454
    Keywords:
    human participation, neuro-evolution, simple evolutionary, evolutionary algorithm, floating-point operations

    Abstract:

    Neural networks have proven effective at solving difficult problems, but designing their architectures can be challenging, even for image classification problems alone. Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. Despite significant computational requirements, we show that it is possible to evolve such competitive models today, given enough computational power.

    Introduction
    Highlights
    • Neural networks can successfully perform difficult tasks where large amounts of training data are available (He et al, 2015; Weyand et al, 2016; Silver et al, 2016; Wu et al, 2016)
    • Even within the specific problem of image classification, the state of the art was attained through many years of focused investigation by hundreds of researchers (Krizhevsky et al (2012); Simonyan & Zisserman (2014); Szegedy et al (2015); He et al (2016); Huang et al (2016a), among many others)
    • The deep learning community generally perceives evolutionary algorithms to be incapable of matching the accuracies of hand-designed models (Verbancsics & Harguess, 2013; Baker et al, 2016; Zoph & Le, 2016)
    • Can a simple one-shot evolutionary process start from trivial initial conditions and yield fully trained models that rival hand-designed architectures?
    • In this paper we have shown that (i) neuro-evolution is capable of constructing large, accurate networks for two challenging and popular image classification benchmarks; (ii) neuro-evolution can do this starting from trivial initial conditions while searching a very large space; (iii) the process, once started, needs no experimenter participation; and (iv) the process yields fully trained models
    • In contrast to reinforcement learning, evolution provides a natural framework for weight inheritance: mutations can be constructed to guarantee a large degree of similarity between the parent model and its mutated child
    Methods
    • The better of a randomly sampled pair of models is selected to be a parent, that is, to undergo reproduction.
    • By reproduction, the authors mean that the worker creates a copy of the parent and modifies this copy by applying a mutation.
    • After the worker creates the child, it trains this child, evaluates it on the validation set, and puts it back into the population (a minimal sketch of one worker iteration follows this list).
    • Code and more detail about these methods can be found in Supplementary Section S1.
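    To make the loop in the bullets above concrete, the Python sketch below shows one worker iteration, assuming a simple pairwise tournament in which the worse of the sampled pair is discarded. The Individual class and the mutate and train_and_eval helpers are hypothetical placeholders for illustration only, not the authors' actual (TensorFlow-based) implementation referenced in Supplementary Section S1.

        import copy
        import random
        from dataclasses import dataclass, field

        @dataclass(eq=False)
        class Individual:
            """One member of the population: an architecture encoding plus its fitness."""
            dna: dict = field(default_factory=dict)  # hypothetical architecture encoding
            fitness: float = 0.0                     # validation accuracy

        def mutate(ind):
            """Hypothetical mutation; the real method mutates architectures and
            hyperparameters, here we only perturb a dummy field to stay runnable."""
            ind.dna["learning_rate"] = ind.dna.get("learning_rate", 0.1) * random.choice([0.5, 2.0])

        def train_and_eval(ind):
            """Hypothetical stand-in for training the child and measuring validation accuracy."""
            return random.random()

        def worker_step(population):
            """One worker iteration of the evolutionary loop summarized above."""
            # Sample a pair and keep the fitter one as the parent (pairwise tournament);
            # the worse individual of the pair is discarded.
            a, b = random.sample(population, 2)
            parent, loser = (a, b) if a.fitness >= b.fitness else (b, a)
            population.remove(loser)

            # Reproduction: copy the parent and apply a single mutation to the copy.
            child = copy.deepcopy(parent)
            mutate(child)

            # Train the child, evaluate it on the validation set, and return it
            # to the population.
            child.fitness = train_and_eval(child)
            population.append(child)

        # Usage: start from a trivial population and run many worker steps
        # (the paper runs many such workers in parallel).
        population = [Individual() for _ in range(10)]
        for _ in range(100):
            worker_step(population)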
    Results
    • The authors used the algorithm in Section 3 to perform several experiments.
    • Each experiment evolves a population in a few days, typified by the example in Figure 1.
    • To get a sense of the variability in outcomes, the authors repeated the experiment 5 times.
    • Not all experiments reach the same accuracy, but they get close (μ = 94.1%, σ = 0.4).
    • Fine differences between experiment outcomes can be somewhat distinguished by validation accuracy.
    • Figure 2 shows the progress of the experiments in detail
    Conclusion
    • In this paper the authors have shown that (i) neuro-evolution is capable of constructing large, accurate networks for two challenging and popular image classification benchmarks; (ii) neuro-evolution can do this starting from trivial initial conditions while searching a very large space; (iii) the process, once started, needs no experimenter participation; and (iv) the process yields fully trained models.
    • In contrast to reinforcement learning, evolution provides a natural framework for weight inheritance: mutations can be constructed to guarantee a large degree of similarity between the parent model and its mutated child (a sketch of such a weight-inheriting mutation follows this list).
    • Evolution would become an appealing approach to neurodiscovery for reasons beyond the scope of this paper.
    • It “hits the ground running”, improving on arbitrary initial models as soon as the experiment begins.
    • It may be possible to combine neuro-evolution with other automatic architecture discovery methods
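    The weight-inheritance point above can be illustrated with a small sketch: the child copies all of the parent's layers together with their trained weights, and only the newly inserted layer starts untrained, so parent and child remain largely similar. The insert_layer_mutation function and the dictionary-based DNA below are hypothetical toy constructs, not the authors' actual mutation code.

        import copy
        import random

        def insert_layer_mutation(parent_dna):
            """Toy 'insert layer' mutation that preserves weight inheritance.

            Every layer copied from the parent keeps its trained weights; only
            the newly inserted layer starts without weights, so the child stays
            close to the parent and training can continue from there.
            """
            child_dna = copy.deepcopy(parent_dna)              # inherit trained weights
            position = random.randrange(len(child_dna["layers"]) + 1)
            fresh_layer = {"type": "conv", "weights": None}    # new, untrained layer
            child_dna["layers"].insert(position, fresh_layer)
            return child_dna

        # Usage with a toy parent DNA: only one layer of the child lacks trained weights.
        parent = {"layers": [{"type": "conv", "weights": [0.1, 0.2]},
                             {"type": "conv", "weights": [0.3, 0.4]}]}
        child = insert_layer_mutation(parent)
        untrained = sum(1 for layer in child["layers"] if layer["weights"] is None)
        print(len(child["layers"]), "layers in child;", untrained, "untrained")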
    Summary
    • Introduction:

      Neural networks can successfully perform difficult tasks where large amounts of training data are available (He et al, 2015; Weyand et al, 2016; Silver et al, 2016; Wu et al, 2016).
    • Discovering neural network architectures remains a laborious task.
    • It is not surprising that in recent years, techniques to automatically discover these architectures have been gaining popularity (Bergstra & Bengio, 2012; Snoek et al, 2012; Han et al, 2015; Baker et al, 2016; Zoph & Le, 2016).
    • The authors show that it is possible to evolve such competitive models today, given enough computational power
    Tables
    • Table 1: Comparison with single-model hand-designed architectures. The “C10+” and “C100+” columns indicate the test accuracy on the data-augmented CIFAR-10 and CIFAR-100 datasets, respectively. The “Reachable?” column denotes whether the given hand-designed model lies within our search space. An entry of “–” indicates that no value was reported. The † indicates a result reported by a source other than the original authors.
    • Table 2: Comparison with automatically discovered architectures. The “C10+” and “C100+” columns contain the test accuracy on the data-augmented CIFAR-10 and CIFAR-100 datasets, respectively. An entry of “–” indicates that the information was not reported or is not known to us. For Zoph & Le (2016), we quote the result with the most similar search space to ours, as well as their best result. Please refer to Table 1 for hand-designed results, including the state of the art. “Discrete params.” means that the parameters can be picked from a handful of values only (e.g. strides ∈ {1, 2, 4})
    Related work
    • Neuro-evolution dates back many years (Miller et al, 1989), originally being used only to evolve the weights of a fixed architecture. Stanley & Miikkulainen (2002) showed that it was advantageous to simultaneously evolve the architecture using the NEAT algorithm. NEAT has three kinds of mutations: (i) modify a weight, (ii) add a connection between existing nodes, or (iii) insert a node while splitting an existing connection. It also has a mechanism for recombining two models into one and a strategy to promote diversity known as fitness sharing (Goldberg et al, 1987).
    • Evolutionary algorithms represent the models using an encoding that is convenient for their purpose, analogous to nature’s DNA. NEAT uses a direct encoding: every node and every connection is stored in the DNA. The alternative paradigm, indirect encoding, has been the subject of much neuro-evolution research (Gruau, 1993; Stanley et al, 2009; Pugh & Stanley, 2013; Kim & Rigazio, 2015; Fernando et al, 2016). For example, the CPPN (Stanley, 2007; Stanley et al, 2009) allows for the evolution of repeating features at different scales. Also, Kim & Rigazio (2015) use an indirect encoding to improve the convolution filters in an initially highly-optimized fixed architecture.
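    As a concrete illustration of the direct encoding and NEAT's three mutation kinds described above, here is a toy Python sketch. The Genome and Connection classes and the three mutation functions are hypothetical simplifications, not actual NEAT code, and the sketch omits NEAT's recombination and fitness-sharing mechanisms entirely.

        import random
        from dataclasses import dataclass, field

        @dataclass
        class Connection:
            src: int
            dst: int
            weight: float
            enabled: bool = True

        @dataclass
        class Genome:
            """Direct encoding: every node and every connection is stored in the DNA."""
            nodes: list = field(default_factory=list)        # node ids
            connections: list = field(default_factory=list)  # Connection objects

        def mutate_weight(genome):
            """(i) Modify the weight of an existing connection."""
            random.choice(genome.connections).weight += random.gauss(0.0, 0.1)

        def add_connection(genome):
            """(ii) Add a connection between two existing nodes."""
            src, dst = random.sample(genome.nodes, 2)
            genome.connections.append(Connection(src, dst, random.gauss(0.0, 1.0)))

        def insert_node(genome):
            """(iii) Insert a node while splitting an existing connection."""
            old = random.choice([c for c in genome.connections if c.enabled])
            old.enabled = False                              # the split connection is disabled
            new_node = max(genome.nodes) + 1
            genome.nodes.append(new_node)
            genome.connections.append(Connection(old.src, new_node, 1.0))
            genome.connections.append(Connection(new_node, old.dst, old.weight))

        # Usage: a tiny two-node genome mutated once by each operator.
        g = Genome(nodes=[0, 1], connections=[Connection(0, 1, 0.5)])
        for mutation in (mutate_weight, add_connection, insert_node):
            mutation(g)
        print(len(g.nodes), "nodes,", len(g.connections), "connections")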
    Funding
    • Shows that it is possible to evolve such competitive models today, given enough computational power
    • Explores large model-architecture search spaces starting with basic initial conditions to avoid priming the system with information about known good strategies for the specific dataset at hand
    • Reports additional results using recombination, but for the most part, used mutation only
    References
    • Abadi, Martın, Agarwal, Ashish, Barham, Paul, Brevdo, Eugene, Chen, Zhifeng, Citro, Craig, Corrado, Greg S, Davis, Andy, Dean, Jeffrey, Devin, Matthieu, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
    • Baker, Bowen, Gupta, Otkrist, Naik, Nikhil, and Raskar, Ramesh. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016.
    • Bergstra, James and Bengio, Yoshua. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
    • Breuel, Thomas and Shafait, Faisal. Automlp: Simple, effective, fully automated learning rate and size adjustment. In The Learning Workshop. Utah, 2010.
    • Fernando, Chrisantha, Banarse, Dylan, Reynolds, Malcolm, Besse, Frederic, Pfau, David, Jaderberg, Max, Lanctot, Marc, and Wierstra, Daan. Convolution by evolution: Differentiable pattern producing networks. In Proceedings of the 2016 on Genetic and Evolutionary Computation Conference, pp. 109–116. ACM, 2016.
    • Goldberg, David E and Deb, Kalyanmoy. A comparative analysis of selection schemes used in genetic algorithms. Foundations of genetic algorithms, 1:69–93, 1991.
    • Goldberg, David E, Richardson, Jon, et al. Genetic algorithms with sharing for multimodal function optimization. In Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, pp. 41–49. Hillsdale, NJ: Lawrence Erlbaum, 1987.
    • Goodfellow, Ian J, Warde-Farley, David, Mirza, Mehdi, Courville, Aaron C, and Bengio, Yoshua. Maxout networks. International Conference on Machine Learning, 28:1319–1327, 2013.
    • Gruau, Frederic. Genetic synthesis of modular neural networks. In Proceedings of the 5th International Conference on Genetic Algorithms, pp. 318–325. Morgan Kaufmann Publishers Inc., 1993.
    • Han, Song, Pool, Jeff, Tran, John, and Dally, William. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pp. 1135–1143, 2015.
    • He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034, 2015.
    • He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
    • Huang, Gao, Liu, Zhuang, Weinberger, Kilian Q, and van der Maaten, Laurens. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016a.
    • Huang, Gao, Sun, Yu, Liu, Zhuang, Sedra, Daniel, and Weinberger, Kilian Q. Deep networks with stochastic depth. In European Conference on Computer Vision, pp. 646–661. Springer, 2016b.
    • Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
    • Kim, Minyoung and Rigazio, Luca. Deep clustered convolutional kernels. arXiv preprint arXiv:1503.01824, 2015.
    • Krizhevsky, Alex and Hinton, Geoffrey. Learning multiple layers of features from tiny images. 2009.
    • Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
    • LeCun, Yann, Cortes, Corinna, and Burges, Christopher JC. The mnist database of handwritten digits, 1998.
    • Lee, Chen-Yu, Xie, Saining, Gallagher, Patrick W, Zhang, Zhengyou, and Tu, Zhuowen. Deeply-supervised nets. In AISTATS, volume 2, pp. 5, 2015.
    • Lin, Min, Chen, Qiang, and Yan, Shuicheng. Network in network. arXiv preprint arXiv:1312.4400, 2013.
    • Miller, Geoffrey F, Todd, Peter M, and Hegde, Shailesh U. Designing neural networks using genetic algorithms. In Proceedings of the third international conference on Genetic algorithms, pp. 379–384. Morgan Kaufmann Publishers Inc., 1989.
    • Morse, Gregory and Stanley, Kenneth O. Simple evolutionary optimization can rival stochastic gradient descent in neural networks. In Proceedings of the 2016 on Genetic and Evolutionary Computation Conference, pp. 477–484. ACM, 2016.
    • Pugh, Justin K and Stanley, Kenneth O. Evolving multimodal controllers with hyperneat. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, pp. 735–742. ACM, 2013.
    • Rumelhart, David E, Hinton, Geoffrey E, and Williams, Ronald J. Learning representations by back-propagating errors. Cognitive Modeling, 5(3):1, 1988.
    • Saxena, Shreyas and Verbeek, Jakob. Convolutional neural fabrics. In Advances In Neural Information Processing Systems, pp. 4053–4061, 2016.
    • Silver, David, Huang, Aja, Maddison, Chris J, Guez, Arthur, Sifre, Laurent, Van Den Driessche, George, Schrittwieser, Julian, Antonoglou, Ioannis, Panneershelvam, Veda, Lanctot, Marc, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
    • Simmons, Joseph P, Nelson, Leif D, and Simonsohn, Uri. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11):1359– 1366, 2011.
    • Simonyan, Karen and Zisserman, Andrew. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
    • Snoek, Jasper, Larochelle, Hugo, and Adams, Ryan P. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959, 2012.
    • Springenberg, Jost Tobias, Dosovitskiy, Alexey, Brox, Thomas, and Riedmiller, Martin. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
    • Srivastava, Rupesh Kumar, Greff, Klaus, and Schmidhuber, Jurgen. Highway networks. arXiv preprint arXiv:1505.00387, 2015.
    • Stanley, Kenneth O. Compositional pattern producing networks: A novel abstraction of development. Genetic programming and evolvable machines, 8(2):131–162, 2007.
    • Stanley, Kenneth O and Miikkulainen, Risto. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99–127, 2002.
    • Stanley, Kenneth O, D’Ambrosio, David B, and Gauci, Jason. A hypercube-based encoding for evolving largescale neural networks. Artificial Life, 15(2):185–212, 2009.
    • Sutskever, Ilya, Martens, James, Dahl, George E, and Hinton, Geoffrey E. On the importance of initialization and momentum in deep learning. ICML (3), 28:1139–1147, 2013.
    • Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott, Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, and Rabinovich, Andrew. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
    • Tuson, Andrew and Ross, Peter. Adapting operator settings in genetic algorithms. Evolutionary computation, 6(2): 161–184, 1998.
    • Verbancsics, Phillip and Harguess, Josh. Generative neuroevolution for deep learning. arXiv preprint arXiv:1312.5355, 2013.
    • Weinreich, Daniel M and Chao, Lin. Rapid evolutionary escape by large populations from local fitness peaks is likely in nature. Evolution, 59(6):1175–1182, 2005.
    • Wu, Yonghui, Schuster, Mike, Chen, Zhifeng, Le, Quoc V., Norouzi, Mohammad, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
    • Zagoruyko, Sergey and Komodakis, Nikos. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
    • Zaremba, Wojciech. An empirical exploration of recurrent network architectures. 2015.
    • Zoph, Barret and Le, Quoc V. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.