ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search

Li Jixiang
Li Qingyuan
Xu Ruijun
Keywords:
neural architecture search, scalable supernet, architecture search, Linearly Equivalent Transformation, weight sharing

Abstract:

One-shot neural architecture search features fast training of a supernet in a single run. A pivotal issue for this weight-sharing approach is its lack of scalability. A simple adjustment with identity blocks renders a scalable supernet, but it causes unstable training, which makes the subsequent model ranking unreliable. In this paper...

Introduction
  • Neural architecture search has recently been dominated by one-shot methods (Brock et al. 2018; Bender et al. 2018; Stamoulis et al. 2019; Guo et al. 2019; Cai, Zhu, and Han 2019).
  • Evaluating the performance of models by picking a single path from the supernet becomes handy (a minimal sketch follows this list).
  • In pure reinforcement-learning or evolutionary approaches, each model is trained independently for evaluation, so shallower models can stand out whenever they exhibit good performance.
  • This step is very beneficial as it achieves automatic architectural compression.
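    As a concrete reading of this single-path evaluation, the following is a minimal PyTorch-style sketch; the names ChoiceSupernet, sample_path, and evaluate_path are illustrative assumptions, not the authors' code.

      import random
      import torch
      import torch.nn as nn

      class ChoiceSupernet(nn.Module):
          """A layer-wise choice-block supernet: each layer holds several candidate blocks."""
          def __init__(self, choice_blocks_per_layer):
              super().__init__()
              self.layers = nn.ModuleList(
                  nn.ModuleList(blocks) for blocks in choice_blocks_per_layer)

          def forward(self, x, path):
              # path[i] selects which candidate block to run at layer i.
              for layer, choice in zip(self.layers, path):
                  x = layer[choice](x)
              return x

      def sample_path(supernet):
          # Uniformly sample one candidate per layer (a single path).
          return [random.randrange(len(layer)) for layer in supernet.layers]

      @torch.no_grad()
      def evaluate_path(supernet, path, loader):
          # Score a sampled path with the weights it inherits from the supernet.
          supernet.eval()
          correct = total = 0
          for images, labels in loader:
              preds = supernet(images, path).argmax(dim=1)
              correct += (preds == labels).sum().item()
              total += labels.numel()
          return correct / total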
Highlights
  • Neural architecture search has recently been dominated by one-shot methods (Brock et al. 2018; Bender et al. 2018; Stamoulis et al. 2019; Guo et al. 2019; Cai, Zhu, and Han 2019)
  • NAS results can benefit from good search spaces
  • We show that adding identity blocks introduces training instability
  • We prove and demonstrate that such a transformation is identical in terms of representational power (a sketch follows this list)
  • SCARLET-B illustrates that shallower models can perform well: it matches the 76.3% top-1 accuracy of EfficientNet-B0 with far fewer FLOPs
  • Upscaled SCARLET-A2 and SCARLET-A4 are comparable to EfficientNet-B2 and B4, respectively
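    A small sketch of how such a linearly equivalent identity block could be realized, assuming it amounts to a pointwise (1x1) convolution with no normalization and no activation, i.e. a purely linear map; the class name LinearlyEquivalentIdentity is an illustrative assumption, not the authors' code.

      import torch
      import torch.nn as nn

      class LinearlyEquivalentIdentity(nn.Module):
          """Identity choice block expressed as a trainable, purely linear 1x1 convolution."""
          def __init__(self, channels):
              super().__init__()
              self.proj = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
              nn.init.dirac_(self.proj.weight)  # start as an exact identity mapping

          def forward(self, x):
              return self.proj(x)

      x = torch.randn(2, 32, 16, 16)
      block = LinearlyEquivalentIdentity(32)
      assert torch.allclose(block(x), x, atol=1e-6)  # same function as a skip connection at init

    Because the block stays linear and introduces no extra nonlinearity, representational power can remain unchanged while the supernet gains a regular, trainable block at that position.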
Results
  • Evaluation of the Search Space

    NAS results can benefit from good search spaces. To dispel the doubt that the results come merely from such a design, the authors select the two extreme models of the space in terms of FLOPs (a selection sketch follows this list). The evaluation results are listed in Table 2.
  • It is a challenging task for ordinary search techniques to work on such a search space
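    As a toy illustration of how the two FLOPs extremes could be picked from a layer-wise choice space, here is a short sketch; flops_table and its layout (FLOPs of choice j at layer i) are assumptions for illustration, not the authors' tooling.

      def extreme_models(flops_table):
          # flops_table[i][j]: FLOPs of choice j at layer i.
          min_model = [min(range(len(row)), key=row.__getitem__) for row in flops_table]
          max_model = [max(range(len(row)), key=row.__getitem__) for row in flops_table]
          return min_model, max_model

      # Made-up 3-layer example with 3 choices per layer:
      toy_table = [[5, 9, 1], [7, 3, 1], [8, 6, 1]]
      print(extreme_models(toy_table))  # ([2, 2, 2], [1, 0, 0]): per-layer choice indices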
Conclusion
  • Discussion and Future Work

    Weight sharing is one of the most critical features for efficient neural architecture search.
  • Most one-shot approaches concentrate on finding useful networks by choosing among parallel choices.
  • This scheme hardly meets the requirement for flexibility, and it even causes conflicts inherently.
  • The authors' equivalent transformation can be regarded as a buffer for such operations.
  • How to perform flexible search efficiently remains open. In this paper, the authors unveil the overlooked scalability issue in one-shot neural architecture search approaches.
  • Upscaled SCARLET-A2 and SCARLET-A4 are comparable to EfficientNet-B2 and B4, respectively
Tables
  • Table 1: Each layer in our search space has 13 choices. Note that index 12 is the identity block with LET (a toy encoding sketch follows this list)
  • Table 2: Full-training results of the models with minimal and maximal FLOPs
  • Table 3: Comparison of neural models on the ImageNet validation set. The input size is set to 224×224. †: based on its published code
  • Table 4: Single-crop results of scaled architectures on the ImageNet validation set. ∗: retrained without fixed AutoAugment; values in parentheses are with fixed AutoAugment, as reported by their authors
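    A toy sketch of the encoding implied by Table 1, assuming an architecture is written as a list of per-layer choice indices in [0, 12] with index 12 denoting the identity block with LET; the helper below is illustrative only.

      NUM_CHOICES = 13   # choices per layer, per Table 1
      LET_IDENTITY = 12  # index of the identity block with LET

      def effective_depth(arch):
          # Layers taking the identity choice contribute no additional block,
          # so the encoding expresses architectures of variable depth.
          return sum(1 for c in arch if c != LET_IDENTITY)

      arch = [3, 12, 7, 12, 0]  # made-up 5-layer example
      assert all(0 <= c < NUM_CHOICES for c in arch)
      print(effective_depth(arch))  # 3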
Related work
  • One-Shot Neural Architecture Search

    In one-shot approaches, a supernet is constructed to represent the whole search space, within which each path is a stand-alone model. The supernet is trained only once; child models inherit its weights, so their performance is easier and faster to evaluate than with other incomplete-training techniques. Notable works include (Bender et al. 2018; Stamoulis et al. 2019; Guo et al. 2019; Cai, Zhu, and Han 2019). Recent advances are concerned with ranking ability in the search phase (Sciuto et al. 2019). FairNAS improves ranking by enforcing a strict fairness constraint; a sketch of that constraint follows. Nevertheless, the scalability of the supernet is not well investigated in these methods, which restricts their flexibility to discover potent candidate architectures.
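    A minimal sketch of such a strict-fairness update step in the spirit of FairNAS, reusing the illustrative ChoiceSupernet interface sketched earlier; fair_update and the uniform number of choices per layer are assumptions for illustration, not the published implementation.

      import random

      def fair_update(supernet, optimizer, criterion, images, labels):
          # Activate every choice block of every layer exactly once per update.
          num_layers = len(supernet.layers)
          num_choices = len(supernet.layers[0])  # assume the same count per layer
          perms = [random.sample(range(num_choices), num_choices)
                   for _ in range(num_layers)]   # one permutation per layer
          optimizer.zero_grad()
          for k in range(num_choices):
              path = [perms[i][k] for i in range(num_layers)]
              loss = criterion(supernet(images, path), labels)
              loss.backward()                    # accumulate gradients across paths
          optimizer.step()                       # a single shared-weight update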
Funding
  • With the linearly equivalent transformation, our method obtains about 20% higher top-1 accuracy on the training set than the baseline
Reference
  • Bender, G.; Kindermans, P.-J.; Zoph, B.; Vasudevan, V.; and Le, Q. 2018. Understanding and Simplifying One-Shot Architecture Search. In International Conference on Machine Learning, 549–558.
  • Brock, A.; Lim, T.; Ritchie, J. M.; and Weston, N. 2018. SMASH: One-Shot Model Architecture Search Through HyperNetworks. In International Conference on Learning Representations.
  • Cai, H.; Zhu, L.; and Han, S. 2019. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In International Conference on Learning Representations.
  • Chen, X.; Xie, L.; Wu, J.; and Tian, Q. 2019. Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation. arXiv preprint arXiv:1904.12760.
  • Chen, T.; Goodfellow, I.; and Shlens, J. 2016. Net2Net: Accelerating Learning via Knowledge Transfer. In International Conference on Learning Representations.
  • Chollet, F. 2017. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1251–1258.
  • Chu, X.; Zhang, B.; Xu, R.; and Li, J. 2019. FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. arXiv preprint arXiv:1907.01845.
  • Cubuk, E. D.; Zoph, B.; Mane, D.; Vasudevan, V.; and Le, Q. V. 2018. AutoAugment: Learning Augmentation Policies from Data. arXiv preprint arXiv:1805.09501.
  • Deb, K.; Pratap, A.; Agarwal, S.; and Meyarivan, T. 2002. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2):182–197.
  • Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 248–255. IEEE.
  • Friedrich, T.; Kroeger, T.; and Neumann, F. 2011. Weighted Preferences in Evolutionary Multi-Objective Optimization. In Australasian Joint Conference on Artificial Intelligence, 291–300. Springer.
  • Guo, Z.; Zhang, X.; Mu, H.; Heng, W.; Liu, Z.; Wei, Y.; and Sun, J. 2019. Single Path One-Shot Neural Architecture Search with Uniform Sampling. arXiv preprint arXiv:1904.00420.
  • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
  • Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. 2019. Searching for MobileNetV3. arXiv preprint arXiv:1905.02244.
  • Hu, J.; Shen, L.; and Sun, G. 2018. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141.
  • Huang, G.; Liu, Z.; Van Der Maaten, L.; and Weinberger, K. Q. 2017. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708.
  • Huang, Y.; Cheng, Y.; Chen, D.; Lee, H.; Ngiam, J.; Le, Q. V.; and Chen, Z. 2018. GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism. arXiv preprint arXiv:1811.06965.
  • Lu, Z.; Whalen, I.; Boddeti, V.; Dhebar, Y.; Deb, K.; Goodman, E.; and Banzhaf, W. 2019. NSGA-Net: A Multi-Objective Genetic Algorithm for Neural Architecture Search. In The Genetic and Evolutionary Computation Conference.
  • Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; and Chen, L.-C. 2018. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510–4520.
  • Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
  • Sciuto, C.; Yu, K.; Jaggi, M.; Musat, C.; and Salzmann, M. 2019. Evaluating the Search Phase of Neural Architecture Search. arXiv preprint arXiv:1902.08142.
  • Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. The Journal of Machine Learning Research 15(1):1929–1958.
  • Stamoulis, D.; Ding, R.; Wang, D.; Lymberopoulos, D.; Priyantha, B.; Liu, J.; and Marculescu, D. 2019. Single-Path NAS: Designing Hardware-Efficient ConvNets in Less Than 4 Hours. arXiv preprint arXiv:1904.02877.
  • Szegedy, C.; Ioffe, S.; Vanhoucke, V.; and Alemi, A. A. 2017. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Thirty-First AAAI Conference on Artificial Intelligence.
  • Tan, M., and Le, Q. V. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning.
  • Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; and Le, Q. V. 2019. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Wu, B.; Dai, X.; Zhang, P.; Wang, Y.; Sun, F.; Wu, Y.; Tian, Y.; Vajda, P.; Jia, Y.; and Keutzer, K. 2019. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. In The IEEE Conference on Computer Vision and Pattern Recognition.
  • Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; and He, K. 2017. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1492–1500.
  • Zagoruyko, S., and Komodakis, N. 2016. Wide Residual Networks. In Proceedings of the British Machine Vision Conference.
  • Zhang, X.; Li, Z.; Change Loy, C.; and Lin, D. 2017. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 718–726.
  • Zhang, X.; Zhou, X.; Lin, M.; and Sun, J. 2018. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In The IEEE Conference on Computer Vision and Pattern Recognition.
  • Zoph, B.; Vasudevan, V.; Shlens, J.; and Le, Q. V. 2018. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8697–8710.