MixPath: A Unified Approach for One-shot Neural Architecture Search

Li Xudong
Lu Yi
Li Jixiang
Keywords:
differentiable architecture search, Batch Normalization, neural architecture search, architecture search, weight sharing

Abstract:

The expressiveness of the search space is a key concern in neural architecture search (NAS). Previous block-level approaches mainly focus on searching networks that chain one operation after another. Incorporating a multi-path search space within the one-shot doctrine remains untackled. In this paper, we investigate the supernet behavior under ...

Introduction
  • Complete automation in neural network design is one of the most important research directions of automated machine learning [26,37].
  • Multi-path design poses a challenge to its one-shot counterpart, i.e., training a one-shot supernet that can accurately predict the performance of its multi-path submodels.
  • Although FairNAS [6] resolves the ranking difficulty in the single-path case with a fairness strategy, it is inherently difficult to apply the same method in the multi-path scenario.
  • Vanilla training of a multi-path supernet cannot provide a confident ranking.
  • The authors dive into its real causes and undertake a unified approach.
Highlights
  • Complete automation in neural network design is one of the most important research directions of automated machine learning [26,37]
  • One-shot approaches [1, 6, 12, 21] make use of a weight-sharing mechanism that greatly reduces computational cost, but these approaches mainly focus on searching for single-path networks
  • We disclose why vanilla multi-path training can fail, and we propose a novel yet lightweight mechanism, shadow batch normalization (SBN, see Fig. 1), to stabilize the supernet at negligible cost; a minimal sketch follows this list
  • To prove that SBN can stabilize supernet training and improve its ranking, we score our method on a subset of the common benchmark NAS-Bench-101 [47], where each model is a stack of 9 cells and each cell has at most 5 internal nodes
  • We propose a unified approach for one-shot neural architecture search, which bridges the gap between one-shot methodology and multi-path search space
  • Our future work is to further improve the evaluation performance of the supernet and to provide a deeper theoretical analysis of the relationship between the weight-sharing mechanism and ranking ability
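
To make the SBN highlight concrete, the following is a minimal PyTorch sketch of a shadow batch normalization layer that keeps one set of BN statistics per possible number of activated paths; the class name, constructor arguments, and indexing convention are illustrative assumptions rather than the authors' released implementation.

    import torch
    import torch.nn as nn

    class ShadowBatchNorm2d(nn.Module):
        """Hypothetical SBN sketch: one BatchNorm2d per possible count of
        activated paths (1..m), so each combination size tracks its own
        running statistics."""

        def __init__(self, channels: int, m: int):
            super().__init__()
            self.bns = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(m))

        def forward(self, x: torch.Tensor, num_active: int) -> torch.Tensor:
            # Route the summed multi-path feature to the BN that matches how
            # many branches were active for this forward pass.
            return self.bns[num_active - 1](x)

Because only batch-normalization parameters are duplicated, the extra cost stays negligible compared with the convolutional weights that all branch combinations share.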
Methods
  • 4.1 Confirmatory Experiments on NAS-Bench-101

    To prove that SBN can stabilize supernet training and improve its ranking, the authors score the method on a subset of the common benchmark NAS-Bench-101 [47], where each model is a stack of 9 cells and each cell has at most 5 internal nodes.
  • The outputs of the selected paths are summed to form the input of the fifth node, after which the proposed SBNs are applied; a sketch of one sampled training step follows.
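
One supernet training step for this setup might look like the sketch below, in which a branch combination of size at most m is sampled uniformly, the supernet sums the selected paths and applies the matching SBN internally, and only the activated weights receive gradients. The function name, the supernet's forward signature, and the uniform sampling scheme are assumptions made for illustration, not the authors' exact procedure.

    import random
    import torch.nn.functional as F

    def train_step(supernet, optimizer, images, labels, num_branches: int, m: int):
        """Hypothetical single step: sample an active branch combination,
        run the corresponding sub-model, and update the shared weights."""
        k = random.randint(1, m)                                # how many paths to activate
        active = sorted(random.sample(range(num_branches), k))  # which paths to activate
        optimizer.zero_grad()
        logits = supernet(images, active)  # assumed to sum the selected paths, then apply SBN
        loss = F.cross_entropy(logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()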
Results
  • With SBN enabled, the best model found obtains 97.35% top-1 accuracy, higher than the 97.12% of the best model found when SBN is disabled.
Conclusion
  • The authors propose a unified approach for one-shot neural architecture search, which bridges the gap between one-shot methodology and multi-path search space.
  • Existing single-path approaches can be regarded as a special case of ours.
  • The proposed method uses SBNs to capture the changing feature statistics from different branch combinations, which resolves two difficulties of the vanilla multi-path adaptation: unstable supernet training and weak model ranking.
  • The authors' future work is to further improve the evaluation performance of the supernet and to provide a deeper theoretical analysis of the relationship between the weight-sharing mechanism and ranking ability
Summary
  • Objectives:

    The authors' objectives are to maximize the classification accuracy while minimizing the FLOPs.
  • The goal is to find the best models under 500M FLOPs; a sketch of this constrained selection follows
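
As a rough illustration of this constrained selection (not the paper's actual search pipeline), supernet-evaluated candidates could be filtered by the FLOPs budget and reduced to the non-dominated ones; the candidate tuples and the helper below are hypothetical.

    # Each candidate is (architecture_id, supernet_accuracy, flops).
    def select_candidates(candidates, flops_budget=500e6):
        """Keep architectures within the FLOPs budget, then return the
        Pareto-optimal ones (higher accuracy, lower FLOPs), best first."""
        feasible = [c for c in candidates if c[2] <= flops_budget]
        pareto = []
        for arch, acc, flops in feasible:
            dominated = any(a2 >= acc and f2 <= flops and (a2 > acc or f2 < flops)
                            for _, a2, f2 in feasible)
            if not dominated:
                pareto.append((arch, acc, flops))
        return sorted(pareto, key=lambda c: c[1], reverse=True)

    # "b" dominates "a"; "c" exceeds the 500M budget and is filtered out.
    print(select_candidates([("a", 0.755, 480e6), ("b", 0.760, 470e6), ("c", 0.770, 520e6)]))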
Tables
  • Table 1: Comparison of Kendall's tau between MixPath supernets (m = 4) trained with SBNs and with vanilla BNs, based on 70 sampled models from NAS-Bench-101 [44]. Each control group is repeated 3 times with different seeds. †: after BN calibration
  • Table 2: Comparison of architectures on CIFAR-10. †: MultAdds computed using the genotypes provided by the authors. : transferred from ImageNet
  • Table 3: Comparison with state-of-the-art models on ImageNet
  • Table 4: Object detection results of various drop-in backbones on the COCO dataset
  • Table 5: Comparison of Kendall's tau between MixPath supernets (m = 3) trained with SBNs and with vanilla BNs, based on 70 sampled models from NAS-Bench-101 [44]. Each control group is repeated 3 times with different seeds
Related work
  • Model Ranking Correlation. The most difficult and costly procedure in neural architecture search is the evaluation of candidate models. To this end, various proxies [37, 50, 51] and explicit or implicit performance predictors [25, 27] have been developed to avoid intractable evaluation. Recent one-shot approaches [1, 6, 12] utilize a supernet in which each submodel can be rapidly assessed with inherited weights. It should be emphasized that ranking ability is of the utmost importance for this family of algorithms [1], whose sole purpose is to evaluate networks. To quantitatively analyze ranking ability, previous works such as [6, 22, 47, 49] have applied the Kendall Tau measure [18]; a small example of computing it follows.
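
For reference, Kendall's tau can be computed directly from paired scores, e.g. supernet-predicted accuracies versus stand-alone training accuracies of the same sampled models; the numbers below are made up solely to demonstrate the call.

    from scipy.stats import kendalltau

    # Illustrative scores for the same five sampled architectures.
    supernet_acc = [0.61, 0.58, 0.64, 0.55, 0.60]         # accuracy with inherited supernet weights
    standalone_acc = [0.935, 0.921, 0.941, 0.915, 0.930]  # accuracy after training from scratch

    tau, p_value = kendalltau(supernet_acc, standalone_acc)
    print(f"Kendall tau = {tau:.3f} (p = {p_value:.3g})")  # tau = 1.0 here: perfectly consistent rankings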
Funding
  • We also obtain state-of-the-art architectures, MixPath-A (76.9% top-1 accuracy) and MixPath-B (77.2%), on ImageNet
Reference
  • Bender, G., Kindermans, P.J., Zoph, B., Vasudevan, V., Le, Q.: Understanding and Simplifying One-Shot Architecture Search. In: ICML. pp. 549–558 (2018)
  • Cai, H., Zhu, L., Han, S.: ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In: ICLR (2019)
  • Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., et al.: MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv preprint arXiv:1906.07155 (2019)
  • Chen, X., Xie, L., Wu, J., Tian, Q.: Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation. In: ICCV (2019)
  • Chu, X., Zhang, B., Li, J., Li, Q., Xu, R.: ScarletNAS: Bridging the Gap between Scalability and Fairness in Neural Architecture Search. arXiv preprint arXiv:1908.06022 (2019)
  • Chu, X., Zhang, B., Xu, R., Li, J.: FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. arXiv preprint arXiv:1907.01845 (2019)
  • Chu, X., Zhou, T., Zhang, B., Li, J.: Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search. arXiv preprint arXiv:1911.12126 (2019)
  • Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: Learning Augmentation Policies from Data. In: CVPR (2019)
  • Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: CVPR. pp. 248–255. IEEE (2009)
  • Dong, X., Yang, Y.: Searching for a Robust Neural Architecture in Four GPU Hours. In: CVPR. pp. 1761–1770 (2019)
  • Guo, Z., Zhang, X., Mu, H., Heng, W., Liu, Z., Wei, Y., Sun, J.: Single Path One-Shot Neural Architecture Search with Uniform Sampling. arXiv preprint arXiv:1904.00420 (2019)
  • Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: More Features from Cheap Operations. arXiv preprint arXiv:1911.11907 (2019)
  • He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: CVPR. pp. 770–778 (2016)
  • Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al.: Searching for MobileNetV3. In: ICCV (2019)
  • Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q.V., Chen, Z.: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In: NeurIPS (2019)
  • Ioffe, S., Szegedy, C.: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: ICML. pp. 448–456 (2015), http://proceedings.mlr.press/v37/ioffe15.html
  • Kendall, M.G.: A New Measure of Rank Correlation. Biometrika 30(1/2), 81–93 (1938)
  • Kornblith, S., Shlens, J., Le, Q.V.: Do Better ImageNet Models Transfer Better? In: CVPR. pp. 2661–2671 (2019)
  • Krizhevsky, A., Hinton, G., et al.: Learning Multiple Layers of Features from Tiny Images. Tech. rep., Citeseer (2009)
  • Li, C., Peng, J., Yuan, L., Wang, G., Liang, X., Lin, L., Chang, X.: Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. arXiv preprint arXiv:1911.13053 (2019)
  • Li, G., Qian, G., Delgadillo, I.C., Muller, M., Thabet, A., Ghanem, B.: SGAS: Sequential Greedy Architecture Search. arXiv preprint arXiv:1912.00195 (2019)
  • Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal Loss for Dense Object Detection. In: ICCV. pp. 2980–2988 (2017)
  • Lin, T.Y., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: Common Objects in Context. In: ECCV (2014)
  • Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive Neural Architecture Search. In: ECCV. pp. 19–34 (2018)
  • Liu, H., Simonyan, K., Yang, Y.: DARTS: Differentiable Architecture Search. In: ICLR (2019)
  • Luo, R., Tian, F., Qin, T., Chen, E., Liu, T.Y.: Neural Architecture Optimization. In: NIPS. pp. 7816–7827 (2018)
  • Ma, N., Zhang, X., Zheng, H.T., Sun, J.: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In: ECCV. pp. 116–131 (2018)
  • Mei, J., Li, Y., Lian, X., Jin, X., Yang, L., Yuille, A., Jianchao, Y.: AtomNAS: Fine-Grained End-to-End Neural Architecture Search. In: ICLR (2020)
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: NIPS. pp. 8024–8035 (2019)
  • Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient Neural Architecture Search via Parameter Sharing. In: ICML (2018)
  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted Residuals and Linear Bottlenecks. In: CVPR. pp. 4510–4520 (2018)
  • Stamoulis, D., Ding, R., Wang, D., Lymberopoulos, D., Priyantha, B., Liu, J., Marculescu, D.: Single-Path NAS: Designing Hardware-Efficient ConvNets in Less Than 4 Hours. In: ECML PKDD (2019)
  • Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In: AAAI (2017)
  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going Deeper with Convolutions. In: CVPR. pp. 1–9 (2015)
  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception Architecture for Computer Vision. In: CVPR. pp. 2818–2826 (2016)
  • Tan, M., Chen, B., Pang, R., Vasudevan, V., Le, Q.V.: MnasNet: Platform-Aware Neural Architecture Search for Mobile. In: CVPR (2019)
  • Tan, M., Le, Q.V.: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: ICML (2019)
  • Tan, M., Le, Q.V.: MixConv: Mixed Depthwise Convolutional Kernels. In: BMVC (2019)
  • Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., Tian, Y., Vajda, P., Jia, Y., Keutzer, K.: FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. In: CVPR (2019)
  • Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K.: Aggregated Residual Transformations for Deep Neural Networks. In: CVPR. pp. 1492–1500 (2017)
  • Xie, S., Zheng, H., Liu, C., Lin, L.: SNAS: Stochastic Neural Architecture Search. In: ICLR (2019)
  • Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G.J., Tian, Q., Xiong, H.: PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. In: ICLR (2020), https://openreview.net/forum?id=BJlS634tPr
  • Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: NAS-Bench-101: Towards Reproducible Neural Architecture Search. In: ICML. pp. 7105–7114 (2019)
  • Yu, J., Huang, T.: Universally Slimmable Networks and Improved Training Techniques. In: ICCV (2019)
  • Yu, J., Yang, L., Xu, N., Yang, J., Huang, T.: Slimmable Neural Networks. In: ICLR (2019), https://openreview.net/forum?id=H1gMCsAqY7
  • Yu, K., Sciuto, C., Jaggi, M., Musat, C., Salzmann, M.: Evaluating the Search Phase of Neural Architecture Search. In: ICLR (2020)
  • Zela, A., Elsken, T., Saikia, T., Marrakchi, Y., Brox, T., Hutter, F.: Understanding and Robustifying Differentiable Architecture Search. In: ICLR (2020), https://openreview.net/forum?id=H1gDNyrKDS
  • Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., Tian, Q.: Multinomial Distribution Learning for Effective Neural Architecture Search. In: ICCV. pp. 1304–1313 (2019)
  • Zoph, B., Le, Q.V.: Neural Architecture Search with Reinforcement Learning. In: ICLR (2017)
  • Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning Transferable Architectures for Scalable Image Recognition. In: CVPR. vol. 2 (2018)