FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search

Ruijun Xu
Jixiang Li
Keywords:
neural architecture search, real strength, strict fairness constraint, architecture search, weight sharing

Abstract:

The ability to rank models by their real strength is the key to Neural Architecture Search. Traditional approaches adopt incomplete training for this purpose, which is still very costly. One-shot methods are thus devised to cut the expense by reusing the same set of weights. However, it is uncertain whether shared weights are truly effective…

Introduction
  • The advent of neural architecture search (NAS) has brought deep learning into an era of automation [29].
  • As the conventional NAS approaches evaluate an enormous number of models based on resource-devouring training, recent attention is drawn to improving the estimation efficiency via parameter sharing [3, 11, 15, 27].
  • The target architecture is induced from the set of operations based on their mixing probabilities.
  • All possible models are optimized with the same set of weights, remarkably cutting down the training cost (a minimal sketch of such a single-path weight-sharing supernet follows below).
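The bullets above can be made concrete with a minimal PyTorch sketch of a single-path weight-sharing supernet: every layer holds all candidate operations (choice blocks), and a sampled path, i.e. one child model, exercises only the corresponding subset of shared weights. The op set (3×3/5×5/7×7 convolutions), channel width, and depth are illustrative assumptions, not the paper's search space.

```python
# Minimal single-path weight-sharing supernet sketch (PyTorch).
# The op set and layer widths are illustrative assumptions only.
import random
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SuperLayer(nn.Module):
    """One searchable layer holding all candidate operations (choice blocks)."""
    def __init__(self, c_in, c_out, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.choices = nn.ModuleList(conv_block(c_in, c_out, k) for k in kernel_sizes)

    def forward(self, x, choice):
        # Only the sampled choice block runs; its weights are shared by every
        # child model that picks this operation at this layer.
        return self.choices[choice](x)

class Supernet(nn.Module):
    def __init__(self, num_layers=4, channels=16, num_classes=10):
        super().__init__()
        self.stem = conv_block(3, channels, 3)
        self.layers = nn.ModuleList(SuperLayer(channels, channels) for _ in range(num_layers))
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, path):
        x = self.stem(x)
        for layer, choice in zip(self.layers, path):
            x = layer(x, choice)
        x = x.mean(dim=(2, 3))          # global average pooling
        return self.head(x)

net = Supernet()
path = [random.randrange(3) for _ in net.layers]   # one child model = one path
logits = net(torch.randn(2, 3, 32, 32), path)
print(logits.shape)  # torch.Size([2, 10])
```

Sampling a different path at every training step is what lets a single set of weights serve all candidate models; how fairly those paths are sampled is exactly the question the paper raises.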
Highlights
  • The advent of neural architecture search (NAS) has brought deep learning into an era of automation [29]
  • As the conventional NAS approaches evaluate an enormous number of models based on resource-devouring training, recent attention is drawn to improving the estimation efficiency via parameter sharing [3, 11, 15, 27]
  • For instance, in differentiable architecture search [11], a super network is built with categorically parameterized operations, whose output is a mixture of all operations (see the sketch after this list)
  • We present Fair Neural Architecture Search, which clears up two fundamental unanswered problems
  • Here we only examine NAS methods tailored for multi-objective optimization
  • We have thoroughly investigated the previously undiscussed fairness problem in weight-sharing neural architecture search approaches
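For the differentiable case referenced above, here is a minimal PyTorch sketch of a DARTS-style mixed operation, assuming a toy set of candidate convolutions: the layer output is a softmax-weighted sum over all operations, parameterized by architecture logits, which is why the whole supernet must be kept in memory.

```python
# Sketch of a DARTS-style mixed operation: the output is a softmax-weighted
# sum over all candidate operations, parameterized by architecture logits.
# Shapes and the candidate op set are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        )
        # One architecture parameter (logit) per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Every operation runs on every step, so the whole supernet must stay
        # in memory (cf. "A whole supernet" in Table 1).
        return sum(w * op(x) for w, op in zip(weights, self.ops))

mixed = MixedOp(channels=8)
y = mixed(torch.randn(1, 8, 16, 16))
print(y.shape)  # torch.Size([1, 8, 16, 16])

# After search, the target architecture keeps argmax(alpha) at each layer.
print(int(mixed.alpha.argmax()))
```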
Methods
  • Table 1 compares state-of-the-art NAS methods on a fairness basis (EF: Expectation Fairness, SF: Strict Fairness). The method, type, and supernet memory-consumption columns are summarized below; supernet train cost, search cost (GPU days), EF, and SF are listed in Table 1.

    Method                   | Type                | Memory Consumption (Supernet)
    ENAS [15]                | RL + Fine Tune      | Single path
    DARTS [11]               | Gradient-based      | A whole supernet
    One-Shot [2]             | Supernet            | A whole supernet
    FBNet [27]               | Gradient-based      | A whole supernet
    ProxylessNAS [4]         | Gradient-based/RL   | Two paths
    Single Path One-Shot [7] | Supernet+EA         | Single path
    Single-Path NAS [22]     | Supernet            | Single path with super kernels
    FairNAS                  | Fair Supernet+EA+RL | Single path
  • The authors' approach has proved effective: it generated a set of new state-of-the-art architectures on the ImageNet dataset, with FairNAS-A achieving 75.34% top-1 validation accuracy at a size comparable to other NAS-generated models.
  • The authors still need to address transferability to larger search spaces, as well as to various datasets.
  • Another open question that remains to be resolved is scalability.
  • One-shot approaches generally suffer from a fixed-depth search space, which poses a size limit on child networks.
Results
  • The strategy that adheres to the strict fairness requirement boosts the supernet accuracy steadily and rapidly, reaching up to 60% after 40 epochs.
Conclusion
  • The authors have thoroughly investigated the previously undiscussed fairness problem in weight-sharing neural architecture search approaches.
  • The authors have discovered that biased weight-sharing methods like [2, 4, 7] either underestimate or overestimate the performance of the chosen model.
  • For this reason, the authors have enforced a strict fairness constraint that helps to equitably train each possible operation (a minimal sketch of this sampling strategy follows after this list).
  • With a stable ranking at hand, the authors exploited the power of a multi-objective reinforced evolutionary method and innovatively utilized the fairly trained supernet as its evaluator.
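A minimal sketch of the strict fairness training step referenced above, assuming a toy supernet of linear choice blocks: in every step each layer's m choice blocks are drawn without replacement (a uniform permutation), the m resulting single-path models are back-propagated on the same batch, and the shared weights are updated once, so every block receives exactly one gradient contribution per step.

```python
# Strict fairness sampling sketch: permutation-based choice without replacement,
# gradient accumulation over the m single-path models, then a single update.
# The toy model (linear choice blocks) is an illustrative assumption.
import torch
import torch.nn as nn

num_layers, num_choices, dim = 4, 3, 8

layers = nn.ModuleList(
    nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_choices))
    for _ in range(num_layers)
)
head = nn.Linear(dim, 10)
params = list(layers.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    optimizer.zero_grad()
    # One permutation of choice indices per layer for this step.
    perms = [torch.randperm(num_choices).tolist() for _ in range(num_layers)]
    for i in range(num_choices):
        h = x
        for layer_choices, perm in zip(layers, perms):
            h = torch.relu(layer_choices[perm[i]](h))
        loss = criterion(head(h), y)
        loss.backward()          # accumulate gradients over the m models
    optimizer.step()             # single update: every block trained once per step

train_step(torch.randn(16, dim), torch.randint(0, 10, (16,)))
```

Under uniform sampling with replacement, by contrast, the per-step training counts of the choice blocks drift apart; removing that drift is what the strict fairness constraint is for.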
Tables
  • Table 1: Comparison of state-of-the-art NAS methods as per fairness basis. EF: Expectation Fairness, SF: Strict Fairness, †: searched on CIFAR-10, ‡: TPU, : reported by [7]
  • Table 2: Comparison of mobile models on ImageNet. The input size is set to 224×224. †: Based on its published code; note that the latencies within the parentheses are reported by its authors. Mobile latencies are measured on a Google Pixel 1 using a single large core of the CPU with a batch size of 1 (via TensorFlow Lite shipped with TensorFlow v1.14.0-rc0). ‡: Time measured by its author on a Samsung Galaxy S8
  • Table 3: Hyperparameters for the whole pipeline
Related work
  • The most time-consuming part of common neural architecture search is that, in order to rank thousands of sampled architectures, they have to be trained from scratch for a certain number of epochs. Reasoning on whether it is feasible to train a single set of weights for many heterogeneous architectures has led to a new paradigm called one-shot model architecture search. The key to these approaches is that the performance of candidate models must be highly predictable.

    SMASH [3] devised a stand-alone hypernetwork that generates weights for all possible architectures in the search space. It is trained to reduce the loss of each sampled model on mini-batches of the dataset. A single training of the hypernetwork then suffices once and for all; any candidate network can be evaluated directly with generated weights. However, designing such a hypernetwork requires delicate expertise to obtain a strong correlation between the true performance of a sampled model and its performance with generated weights.
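A minimal sketch of the hypernetwork idea, assuming a toy architecture encoding (a one-hot over two hidden widths) and a two-layer MLP as the candidate model; SMASH's actual memory-bank encoding and convolutional targets are considerably more elaborate.

```python
# Hypernetwork sketch in the spirit of SMASH [3]: a hypernet maps an
# architecture encoding to the weights of a candidate model, which is then
# evaluated directly with those generated weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

in_dim, hidden_choices, num_classes = 8, (16, 32), 10

class HyperNet(nn.Module):
    """Generates (W1, b1, W2, b2) for a 2-layer MLP from a one-hot arch code."""
    def __init__(self):
        super().__init__()
        max_h = max(hidden_choices)
        out = max_h * in_dim + max_h + num_classes * max_h + num_classes
        self.gen = nn.Linear(len(hidden_choices), out)

    def forward(self, arch_code, hidden):
        flat = self.gen(arch_code)
        max_h = max(hidden_choices)
        w1 = flat[: max_h * in_dim].view(max_h, in_dim)[:hidden]
        b1 = flat[max_h * in_dim: max_h * in_dim + max_h][:hidden]
        rest = flat[max_h * in_dim + max_h:]
        w2 = rest[: num_classes * max_h].view(num_classes, max_h)[:, :hidden]
        b2 = rest[num_classes * max_h:]
        return w1, b1, w2, b2

def evaluate(hypernet, x, choice):
    # Encode the sampled architecture and run it with generated weights.
    arch_code = F.one_hot(torch.tensor(choice), len(hidden_choices)).float()
    w1, b1, w2, b2 = hypernet(arch_code, hidden_choices[choice])
    h = F.relu(F.linear(x, w1, b1))
    return F.linear(h, w2, b2)

hyper = HyperNet()
logits = evaluate(hyper, torch.randn(4, in_dim), choice=1)
print(logits.shape)  # torch.Size([4, 10])
```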
Study subjects and analysis
Dataset. We perform all experiments on ImageNet [17] and randomly select 50,000 images from the training set as our validation set (50 samples from each class); a sketch of this split follows. The remaining images are used as our training set, while the original ImageNet validation set is taken as the test set to measure the final performance of each model.
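A sketch of such a stratified hold-out split; train_labels (a list mapping each training-image index to its class label) and the helper name stratified_holdout are assumptions for illustration.

```python
# Stratified hold-out: keep `per_class` images of every class for validation
# (50 per class, i.e. 50,000 total for ImageNet's 1,000 classes).
import random
from collections import defaultdict

def stratified_holdout(train_labels, per_class=50, seed=0):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(train_labels):
        by_class[label].append(idx)
    val_idx = []
    for label, indices in by_class.items():
        val_idx.extend(rng.sample(indices, per_class))
    val_set = set(val_idx)
    train_idx = [i for i in range(len(train_labels)) if i not in val_set]
    return train_idx, sorted(val_idx)

# Toy usage with 3 classes of 10 images each, holding out 2 per class.
labels = [c for c in range(3) for _ in range(10)]
train_idx, val_idx = stratified_holdout(labels, per_class=2)
print(len(train_idx), len(val_idx))  # 24 6
```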

Reference
  • George Adam and Jonathan Lorraine. Understanding neural architecture search techniques. arXiv preprint arXiv:1904.00438, 2019. 2
  • Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018. 1, 2, 5, 6, 8, 9
  • Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: One-shot model architecture search through hypernetworks. In International Conference on Learning Representations, 2018. 1, 2, 6
  • Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations, 2019. 2, 7, 8, 9
  • Xiangxiang Chu, Bo Zhang, Ruijun Xu, and Hailong Ma. Multi-objective reinforced evolution in mobile neural architecture search. arXiv preprint arXiv:1901.01074, 2019. 2, 5, 6, 7
  • Terrance DeVries and Graham W. Taylor. Improved regularization of convolutional neural networks with Cutout. arXiv preprint arXiv:1708.04552, 2017. 7
  • Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun. Single path one-shot neural architecture search with uniform sampling. arXiv preprint arXiv:1904.00420, 2019. 1, 2, 5, 6, 8, 9
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015. 8
  • Maurice G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938. 6
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019. 1, 2, 9
  • Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations, 2017. 7
  • Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, and Wolfgang Banzhaf. NSGA-Net: A multi-objective genetic algorithm for neural architecture search. arXiv preprint arXiv:1810.03522, 2018. 1
  • Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization. In Advances in Neural Information Processing Systems, pages 7816–7827, 2018. 6
  • Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In International Conference on Machine Learning, 2018. 1, 2, 6, 9
  • Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In International Conference on Machine Learning, AutoML Workshop, 2018. 1, 7
  • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015. 7
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018. 7, 9
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. 6
  • Christian Sciuto, Kaicheng Yu, Martin Jaggi, Claudiu Musat, and Mathieu Salzmann. Evaluating the search phase of neural architecture search. arXiv preprint arXiv:1902.08142, 2019. 6
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014. 7
  • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. Single-Path NAS: Designing hardware-efficient ConvNets in less than 4 hours. arXiv preprint arXiv:1904.02877, 2019. 2, 9
  • Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning, pages 1139–1147, 2013. 7
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 1, 7, 9
  • Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, 2019. 1
  • Ian Tweddle. James Stirling's Methodus Differentialis: An annotated translation of Stirling's text. Springer Science & Business Media, 2012. 4, 10
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 1, 2, 9
  • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations, 2018. 7
  • Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017. 1
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 1, 7, 9