MixPath: A Unified Approach for One-shot Neural Architecture Search
Keywords:
differentiable architecture search, Batch Normalization, neural architecture search, architecture search, weight sharing
Abstract:
The expressiveness of search space is a key concern in neural architecture search (NAS). Previous block-level approaches mainly focus on searching networks that chain one operation after another. Incorporating multi-path search space with the one-shot doctrine remains untackled. In this paper, we investigate the supernet behavior under ...
Introduction
- Complete automation in neural network design is one of the most important research directions of automated machine learning [26,37].
- Multi-path structures pose a challenge for the one-shot counterpart, i.e., training a one-shot supernet that can accurately predict the performance of its multi-path submodels.
- While FairNAS [6] resolves the ranking difficulty in the single-path case with a fairness strategy, it is inherently difficult to apply the same method in the multi-path scenario.
- The vanilla training of a multi-path supernet cannot provide a confident ranking.
- The authors dive into its real causes and undertake a unified approach to address them.
Highlights
- Complete automation in neural network design is one of the most important research directions of automated machine learning [26,37]
- One-shot approaches [1, 6, 12, 21] make use of a weight-sharing mechanism that greatly reduces computational cost, but these approaches mainly focus on searching for single-path networks
- We disclose why vanilla multi-path training can fail, for which we propose a novel yet lightweight mechanism, shadow batch normalization (SBN, see Fig. 1), to stabilize the supernet at a negligible cost (a minimal sketch follows this list)
- To prove that SBN can stabilize the supernet training and improve its ranking, we evaluate our method on a subset of the common benchmark NAS-Bench-101 [47], where the model is a stack of 9 cells and each cell has at most 5 internal nodes
- We propose a unified approach for one-shot neural architecture search, which bridges the gap between one-shot methodology and multi-path search space
- Our future work will focus on further improving the evaluation performance of the supernet and on a deeper theoretical analysis of the relationship between the weight-sharing mechanism and ranking ability
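As a minimal sketch of the shadow batch normalization idea (assuming a PyTorch-style implementation; the class name ShadowBN and its interface are illustrative, not the authors' released code), one BN layer can be kept per possible number of activated paths, since the statistics of a sum of k branch outputs drift with k:

```python
# Minimal sketch of shadow batch normalization (SBN): one BatchNorm2d per
# possible number of activated paths, so each normalizer tracks statistics
# for a fixed number of summed branches. Names here are illustrative only.
import torch
import torch.nn as nn


class ShadowBN(nn.Module):
    def __init__(self, channels: int, max_paths: int):
        super().__init__()
        # bns[k-1] normalizes features produced by summing k active paths.
        self.bns = nn.ModuleList([nn.BatchNorm2d(channels) for _ in range(max_paths)])

    def forward(self, x: torch.Tensor, num_active_paths: int) -> torch.Tensor:
        return self.bns[num_active_paths - 1](x)


# Usage: normalize the sum of two sampled paths with the matching shadow BN.
sbn = ShadowBN(channels=16, max_paths=4)
out = sbn(torch.randn(8, 16, 32, 32), num_active_paths=2)
```

The only change relative to a vanilla BN is the index num_active_paths, which is known at sampling time, so the extra cost is a handful of additional BN parameters.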
Methods
- 4.1 Confirmatory Experiments on NAS-Bench-101
- To prove that SBN can stabilize the supernet training and improve its ranking, the authors evaluate the method on a subset of the common benchmark NAS-Bench-101 [47], where the model is a stack of 9 cells and each cell has at most 5 internal nodes.
- The outputs of the selected paths are summed to form the input to the fifth node, after which the proposed SBNs are applied.
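To make the summation-then-normalization step concrete, below is a hedged sketch of one multi-path block: it samples which candidate operations are active, sums their outputs, and applies the shadow BN matching the sampled path count. The candidate operation set, the uniform sampling routine, and the class name MultiPathBlock are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch of a multi-path block: sample a subset of candidate operations,
# sum their outputs, then normalize with the shadow BN matching the path count.
import random
import torch
import torch.nn as nn


class MultiPathBlock(nn.Module):
    def __init__(self, channels: int, m: int = 4):
        super().__init__()
        self.m = m  # maximum number of simultaneously active paths
        # Candidate operations; all preserve the spatial and channel shape.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
            for k in (1, 3, 5, 7)
        ])
        # One shadow BN per possible number of activated paths (1..m).
        self.sbns = nn.ModuleList([nn.BatchNorm2d(channels) for _ in range(m)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = random.randint(1, self.m)                    # how many paths
        active = random.sample(range(len(self.ops)), k)  # which paths
        summed = sum(self.ops[i](x) for i in active)     # merge by summation
        return self.sbns[k - 1](summed)                  # matching shadow BN


# Smoke test on a random feature map.
block = MultiPathBlock(channels=16, m=4)
y = block(torch.randn(2, 16, 32, 32))
```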
Results
- With SBN enabled, the best model found obtains 97.35% top-1 accuracy, higher than the best of 97.12% when SBN is disabled.
- The searched architectures MixPath-A (76.9%) and MixPath-B (77.2%) achieve state-of-the-art results on ImageNet.
- Future work will focus on further improving the evaluation performance of the supernet and on a deeper theoretical analysis of the relationship between the weight-sharing mechanism and ranking ability.
Conclusion
- The authors propose a unified approach for one-shot neural architecture search, which bridges the gap between the one-shot methodology and multi-path search spaces.
- Existing single-path approaches can be regarded as a special case of ours.
- The proposed method uses SBNs to capture the changing feature statistics across different branch combinations, which resolves two difficulties of vanilla multi-path adaptation: the unstable training of the supernet and its poor model ranking.
- The authors' future work will focus on further improving the evaluation performance of the supernet and on a deeper theoretical analysis of the relationship between the weight-sharing mechanism and ranking ability.
Objectives
- The objective is to maximize classification accuracy while minimizing FLOPS; the goal is to find the best models under 500M FLOPS.
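A rough sketch of how such a FLOPS-constrained selection could look is given below; estimate_flops and predict_accuracy are placeholder stubs standing in for a real multiply-add counter and for scoring submodels with inherited supernet weights, and the toy cost model is made up.

```python
# Hedged sketch of constraint-aware candidate selection under a 500M FLOPS
# budget. `estimate_flops` and `predict_accuracy` are placeholder stubs, not
# the paper's actual tooling; the cost model and numbers are invented.
import random

FLOPS_BUDGET = 500e6  # 500M multiply-adds


def sample_architecture(num_blocks: int = 12, max_paths: int = 4) -> list:
    # Each block activates between 1 and max_paths of its candidate paths.
    return [random.sample(range(max_paths), random.randint(1, max_paths))
            for _ in range(num_blocks)]


def estimate_flops(arch: list) -> float:
    # Toy cost model: pretend every active path costs roughly 12M FLOPS.
    return sum(len(block) for block in arch) * 12e6


def predict_accuracy(arch: list) -> float:
    # Stand-in for scoring the submodel with inherited supernet weights.
    return random.uniform(0.70, 0.78)


candidates = [sample_architecture() for _ in range(1000)]
feasible = [a for a in candidates if estimate_flops(a) <= FLOPS_BUDGET]
best = max(feasible, key=predict_accuracy)
print(f"{len(feasible)} feasible candidates, best ~{estimate_flops(best) / 1e6:.0f}M FLOPS")
```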
Tables
- Table 1: Comparison of Kendall Taus between MixPath supernets (m = 4) trained with SBNs and vanilla BNs, based on 70 sampled models from NAS-Bench-101 [44]. Each control group is repeated 3 times on different seeds. †: after BN calibration
- Table 2: Comparison of architectures on CIFAR-10. †: MultAdds computed using the genotypes provided by the authors; models transferred from ImageNet are marked accordingly
- Table 3: Comparison with state-of-the-art models on ImageNet
- Table 4: Object detection results of various drop-in backbones on the COCO dataset
- Table 5: Comparison of Kendall Taus between MixPath supernets (m = 3) trained with SBNs and vanilla BNs, based on 70 sampled models from NAS-Bench-101 [44]. Each control group is repeated 3 times on different seeds
Related work
- Model Ranking Correlation. The most difficult and costly procedure in neural architecture search is the evaluation of candidate models. To this end, various proxies [37, 50, 51] and explicit or implicit performance predictors [25, 27] have been developed to avoid intractable evaluation. Recent one-shot approaches [1, 6, 12] utilize a supernet in which each submodel can be rapidly assessed with inherited weights. The ranking ability of this family of algorithms, whose sole purpose is to evaluate networks, is of the utmost importance [1]. To quantitatively analyze ranking ability, previous works such as [6, 22, 47, 49] apply the Kendall Tau measure [18].
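As a concrete illustration of the measure, the Kendall Tau between ground-truth rankings and supernet-predicted rankings can be computed as below; the accuracy lists are made-up numbers and the use of scipy.stats is an implementation choice, not a claim about the cited works' code.

```python
# Toy example of the Kendall Tau ranking measure [18]: compare stand-alone
# (ground-truth) accuracies with accuracies predicted via inherited supernet
# weights. The numbers below are illustrative only.
from scipy.stats import kendalltau

standalone_acc = [93.1, 92.4, 94.0, 91.8, 93.6]  # fully trained models
supernet_acc = [71.2, 70.5, 71.4, 69.8, 71.9]    # one-shot proxy scores

tau, p_value = kendalltau(standalone_acc, supernet_acc)
print(f"Kendall tau = {tau:.2f}")  # 1.0 would mean identical rankings
```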
Reference
- Bender, G., Kindermans, P.J., Zoph, B., Vasudevan, V., Le, Q.: Understanding and Simplifying One-Shot Architecture Search. In: ICML. pp. 549–558 (2018)
- Cai, H., Zhu, L., Han, S.: ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In: ICLR (2019)
- Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., et al.: MMDetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
- Chen, X., Xie, L., Wu, J., Tian, Q.: Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation. In: ICCV (2019)
- Chu, X., Zhang, B., Li, J., Li, Q., Xu, R.: Scarletnas: Bridging the gap between scalability and fairness in neural architecture search. arXiv preprint arXiv:1908.06022 (2019)
- Chu, X., Zhang, B., Xu, R., Li, J.: FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. arXiv preprint arXiv:1907.01845 (2019)
- Chu, X., Zhou, T., Zhang, B., Li, J.: Fair darts: Eliminating unfair advantages in differentiable architecture search. arXiv preprint arXiv:1911.12126 (2019)
- Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: Learning Augmentation Policies from Data. CVPR (2019)
- Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
- Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A LargeScale Hierarchical Image Database. In: CVPR. pp. 248–255. IEEE (2009)
- Dong, X., Yang, Y.: Searching for a Robust Neural Architecture in Four GPU Hours. In: CVPR. pp. 1761–1770 (2019)
- Guo, Z., Zhang, X., Mu, H., Heng, W., Liu, Z., Wei, Y., Sun, J.: Single Path One-Shot Neural Architecture Search with Uniform Sampling. arXiv preprint arXiv:1904.00420 (2019)
- Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: More Features from Cheap Operations. arXiv preprint arXiv:1911.11907 (2019)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: CVPR. pp. 770–778 (2016)
- Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al.: Searching for MobileNetV3. ICCV (2019)
- Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q.V., Chen, Z.: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. NeurIPS (2019)
- Ioffe, S., Szegedy, C.: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: ICML. pp. 448–456 (2015), http://proceedings.mlr.press/v37/ioffe15.html
- Kendall, M.G.: A New Measure of Rank Correlation. Biometrika 30(1/2), 81–93 (1938)
- Kornblith, S., Shlens, J., Le, Q.V.: Do Better Imagenet Models Transfer Better? In: CVPR. pp. 2661–2671 (2019)
- Krizhevsky, A., Hinton, G., et al.: Learning Multiple Layers of Features from Tiny Images. Tech. rep., Citeseer (2009)
- Li, C., Peng, J., Yuan, L., Wang, G., Liang, X., Lin, L., Chang, X.: Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. arXiv preprint arXiv:1911.13053 (2019)
- Li, G., Qian, G., Delgadillo, I.C., Muller, M., Thabet, A., Ghanem, B.: SGAS: Sequential Greedy Architecture Search. arXiv preprint arXiv:1912.00195 (2019)
- Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal Loss for Dense Object Detection. In: ICCV. pp. 2980–2988 (2017)
- Lin, T.Y., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common Objects in Context. In: ECCV (2014)
- Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive Neural Architecture Search. In: ECCV. pp. 19–34 (2018)
- Liu, H., Simonyan, K., Yang, Y.: DARTS: Differentiable Architecture Search. In: ICLR (2019)
- Luo, R., Tian, F., Qin, T., Chen, E., Liu, T.Y.: Neural Architecture Optimization. In: NIPS. pp. 7816–7827 (2018)
- Ma, N., Zhang, X., Zheng, H.T., Sun, J.: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In: ECCV. pp. 116–131 (2018)
- Mei, J., Li, Y., Lian, X., Jin, X., Yang, L., Yuille, A., Jianchao, Y.: AtomNAS: Fine-Grained End-to-End Neural Architecture Search. ICLR (2020)
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An imperative style, high-performance deep learning library. In: NIPS. pp. 8024–8035 (2019)
- Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient Neural Architecture Search via Parameter Sharing. In: ICML (2018)
- Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted Residuals and Linear Bottlenecks. In: CVPR. pp. 4510–4520 (2018)
- Stamoulis, D., Ding, R., Wang, D., Lymberopoulos, D., Priyantha, B., Liu, J., Marculescu, D.: Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours. ECML PKDD (2019)
- Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In: AAAI (2017)
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going Deeper with Convolutions. In: CVPR. pp. 1– 9 (2015)
- Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception Architecture for Computer Vision. In: CVPR. pp. 2818–2826 (2016)
- Tan, M., Chen, B., Pang, R., Vasudevan, V., Le, Q.V.: Mnasnet: Platform-Aware Neural Architecture Search for Mobile. In: CVPR (2019)
- Tan, M., Le, Q.V.: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: ICML (2019)
- Tan, M., Le., Q.V.: MixConv: Mixed Depthwise Convolutional Kernels. BMVC (2019)
- Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., Tian, Y., Vajda, P., Jia, Y., Keutzer, K.: FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. CVPR (2019)
- Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K.: Aggregated Residual Transformations for Deep Neural Networks. In: CVPR. pp. 1492–1500 (2017)
- Xie, S., Zheng, H., Liu, C., Lin, L.: SNAS: Stochastic Neural Architecture Search. ICLR (2019)
- Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G.J., Tian, Q., Xiong, H.: PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. In: ICLR (2020), https://openreview.net/forum?id=BJlS634tPr
- Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: NAS-Bench-101: Towards Reproducible Neural Architecture Search. In: ICML. pp. 7105–7114 (2019)
- Yu, J., Huang, T.: Universally Slimmable Networks and Improved Training Techniques. In: ICCV (2019)
- Yu, J., Yang, L., Xu, N., Yang, J., Huang, T.: Slimmable Neural Networks. In: ICLR (2019), https://openreview.net/forum?id=H1gMCsAqY7
- Yu, K., Sciuto, C., Jaggi, M., Musat, C., Salzmann, M.: Evaluating the search phase of neural architecture search. In: ICLR (2020), https://openreview.net/
- Zela, A., Elsken, T., Saikia, T., Marrakchi, Y., Brox, T., Hutter, F.: Understanding and robustifying differentiable architecture search. In: ICLR (2020), https://openreview.net/forum?id=H1gDNyrKDS
- Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., Tian, Q.: Multinomial Distribution Learning for Effective Neural Architecture Search. In: ICCV. pp. 1304–1313 (2019)
- Zoph, B., Le, Q.V.: Neural Architecture Search with Reinforcement Learning. In: ICLR (2017)
- Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning Transferable Architectures for Scalable Image Recognition. In: CVPR. vol. 2 (2018)