PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

ICLR, 2020.

Keywords:
Neural Architecture Search, DARTS, Regularization, Normalization

Abstract:

Differentiable architecture search (DARTS) provided a fast solution in finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-net and searching for an optimal architecture. In this paper, we present a novel approach, namely Partially-Connected DARTS, by sampling a small...
Introduction
  • Neural architecture search (NAS) has emerged as an important branch of automatic machine learning (AutoML), and has been attracting increasing attention from both academia and industry.
  • DARTS (Liu et al., 2019) converts operation selection into learning a weighted combination over a fixed set of candidate operations (see the mixed-operation sketch after this list)
  • This makes the entire framework differentiable with respect to the architecture hyper-parameters, so the network search can be accomplished efficiently in an end-to-end fashion.
  • DARTS is still subject to a large yet redundant space of network architectures and suffers from heavy memory and computation overheads
  • This prevents the search process from using larger batch sizes for either speedup or higher stability.
  • Prior work (Chen et al., 2019) proposed to reduce the search space, which leads to an approximation that may sacrifice the optimality of the discovered architecture
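To make the weighting scheme above concrete, here is a minimal PyTorch-style sketch of a DARTS-like mixed operation: a softmax over architecture hyper-parameters weights the outputs of all candidate operations, which makes operation selection differentiable but also requires every candidate's output to be kept in memory. The candidate set, channel count, and class name are illustrative assumptions, not the paper's exact search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # A toy candidate set; DARTS uses a larger set (separable/dilated convolutions,
        # pooling, skip connection, zero).
        self.ops = nn.ModuleList([
            nn.Identity(),                                            # skip connection
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # 3x3 convolution
            nn.MaxPool2d(3, stride=1, padding=1),                     # 3x3 max pooling
        ])
        # Architecture hyper-parameters: one weight per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        # Softmax-weighted sum over all candidates: this is what makes operation
        # selection differentiable, but every candidate's output stays in memory.
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```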
Highlights
  • Neural architecture search (NAS) has emerged as an important branch of automatic machine learning (AutoML), and has been attracting increasing attention from both academia and industry
  • ProxylessNAS (Cai et al., 2019), another approach that searches directly on ImageNet, required almost twice the search time to produce 24.9%/7.5% top-1/top-5 errors, which verifies that our strategy of reducing memory consumption is both efficient and effective
  • Compared to the ‘Lite’ versions of the Single Shot Detector (SSD), our result enjoys significant advantages in AP, surpassing the most powerful one (SSDLiteV3) by 6.9% AP. All these results suggest that the advantages obtained by PC-DARTS on image classification can transfer well to object detection, a more challenging task, and we believe these architectures would benefit an even wider range of application scenarios
  • We proposed a simple and effective approach named partially-connected differentiable architecture search (PC-DARTS)
  • Differentiable architecture search seems to suffer even more significant instability than conventional neural network training, and so it can largely benefit from both (i) regularization and (ii) a larger batch size
  • Going one step further, our work reveals the redundancy of super-network optimization in neural architecture search: experiments expose a gap between improving super-network optimization and finding a better architecture, and regularization plays an effective role in shrinking this gap
Methods
  • Sampling 1/8 of the channels, while further reducing the search time, causes a dramatic accuracy drop
  • These experiments not only justify the tradeoff between accuracy and efficiency of architecture search, but also reveal the redundancy of super-network optimization in the context of NAS.
  • This reflects the gap between search and evaluation, i.e., a better-optimized super-network does not guarantee a better searched architecture – in other words, differentiable NAS approaches tend to over-fit the super-network.
  • Channel sampling plays the role of regularization, which shrinks the gap between search and evaluation (a minimal sketch of this sampling follows this list)
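As referenced in the last bullet, the sketch below illustrates partial channel connections, reusing the MixedOp from the Introduction sketch: only 1/K of the channels are routed through the candidate operations while the remainder bypass them, and a channel shuffle mixes the two groups afterwards. The class name, the fixed front-slice selection, and K = 4 are simplifying assumptions for illustration rather than the released implementation.

```python
import torch
import torch.nn as nn

class PartialChannelMixedOp(nn.Module):
    def __init__(self, channels, k=4):
        super().__init__()
        assert channels % k == 0
        self.k = k
        # Only 1/k of the channels pass through the candidate operations, so only
        # 1/k of the intermediate feature maps of the mixed op must be stored.
        self.mixed_op = MixedOp(channels // k)

    def forward(self, x):
        c = x.size(1) // self.k
        x_selected, x_bypass = x[:, :c], x[:, c:]   # split channels: 1/k vs. (k-1)/k
        out = self.mixed_op(x_selected)             # mixed operation on the sampled subset
        out = torch.cat([out, x_bypass], dim=1)     # bypass the remaining channels unchanged
        # Channel shuffle so that, over iterations, different channels interact
        # with the candidate operations.
        n, ch, h, w = out.shape
        out = out.view(n, self.k, ch // self.k, h, w).transpose(1, 2).reshape(n, ch, h, w)
        return out
```

Because only 1/K of the mixed operation's intermediate feature maps need to be stored, the memory cost of the search drops accordingly, which is what permits the larger batch sizes discussed in the highlights.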
Results
  • We randomly sample two subsets from the 1.3M-image training set of ImageNet, containing 10% and 2.5% of the images, respectively
  • The former is used for training the network weights and the latter for updating the architecture hyper-parameters (see the training sketch after this list).
  • Compared to the ‘Lite’ versions of SSD, our result enjoys significant advantages in AP, surpassing the most powerful one (SSDLiteV3) by 6.9% AP
  • All these results suggest that the advantages obtained by PC-DARTS on image classification can transfer well to object detection, a more challenging task, and we believe these architectures would benefit an even wider range of application scenarios
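The sketch below illustrates the alternating update scheme implied by the two subsets: one optimizer is registered only with the network weights and is fed by the first subset, while a second optimizer is registered only with the architecture hyper-parameters and is fed by the second subset. The function, loader, and optimizer names are placeholders, and this first-order scheme is an assumption for illustration, not the paper's released training code.

```python
import torch.nn.functional as F

def search_epoch(model, weight_loader, arch_loader, w_optimizer, a_optimizer, device="cpu"):
    """One epoch of alternating weight / architecture updates (first-order)."""
    model.train()
    # zip stops at the shorter of the two loaders.
    for (x_w, y_w), (x_a, y_a) in zip(weight_loader, arch_loader):
        # Step 1: update architecture hyper-parameters on the architecture subset.
        a_optimizer.zero_grad()
        loss_arch = F.cross_entropy(model(x_a.to(device)), y_a.to(device))
        loss_arch.backward()
        a_optimizer.step()      # only alphas/betas are registered in a_optimizer

        # Step 2: update network weights on the weight subset.
        w_optimizer.zero_grad()
        loss_weight = F.cross_entropy(model(x_w.to(device)), y_w.to(device))
        loss_weight.backward()
        w_optimizer.step()      # only convolution/BN weights are registered here
```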
Conclusion
  • We proposed a simple and effective approach named partially-connected differentiable architecture search (PC-DARTS).
  • Going one step further, our work reveals the redundancy of super-network optimization in NAS: experiments expose a gap between improving super-network optimization and finding a better architecture, and regularization plays an effective role in shrinking this gap
  • We believe these insights can inspire researchers in this field, and we will follow this path towards designing stable yet efficient algorithms for differentiable architecture search
Tables
  • Table1: Comparison with state-of-the-art network architectures on CIFAR10
  • Table2: Comparison with state-of-the-art architectures on ImageNet (mobile setting)
  • Table3: Ablation study on CIFAR10 and ImageNet. PC and EN denote partial channel connections and edge normalization, respectively (a sketch of edge normalization follows this list). All architectures on ImageNet are re-trained for 100 epochs (the 25.8% error corresponds to the best entry, 24.2%, reported in Table 2 with 250 epochs)
  • Table4: Experiments on the stability of DARTS and PC-DARTS. Left: evaluations of architectures found in five independent search runs. Middle: architectures searched with different numbers of epochs. Right: architectures searched with different numbers of nodes
  • Table5: Detection results, in terms of average precisions, on the MS-COCO dataset (test-dev 2015)
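As referenced in the Table 3 caption, the sketch below illustrates edge normalization: every edge entering a cell node carries an extra parameter beta, and the edge outputs are combined with softmax-normalized betas, which stabilizes edge selection when channel sampling makes the operation-level weights noisier. It reuses the PartialChannelMixedOp from the Methods sketch; the class and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedNode(nn.Module):
    """A cell node whose incoming edges are weighted by softmax-normalized betas."""
    def __init__(self, num_inputs, channels, k=4):
        super().__init__()
        # One partially-connected mixed operation per incoming edge
        # (PartialChannelMixedOp is defined in the Methods sketch).
        self.edges = nn.ModuleList(
            [PartialChannelMixedOp(channels, k) for _ in range(num_inputs)]
        )
        # Edge-level architecture hyper-parameters: one beta per incoming edge.
        self.beta = nn.Parameter(torch.zeros(num_inputs))

    def forward(self, inputs):
        # inputs: list of feature maps, one from each predecessor node.
        edge_weights = F.softmax(self.beta, dim=0)
        return sum(w * edge(x) for w, edge, x in zip(edge_weights, self.edges, inputs))
```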
Related work
  • Thanks to the rapid development of deep learning, significant performance gains have been brought to a wide range of computer vision problems, most of which are owed to manually designed network architectures (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016; Huang et al., 2017). Recently, a new research field named neural architecture search (NAS) has been attracting increasing attention. Its goal is to find automatic ways of designing neural architectures to replace conventional handcrafted ones. According to the heuristics used to explore the large architecture space, existing NAS approaches can be roughly divided into three categories, namely, evolution-based approaches, reinforcement-learning-based approaches and one-shot approaches.

    The first type of architecture search methods (Liu et al., 2018b; Xie & Yuille, 2017; Real et al., 2017; Elsken et al., 2019; Real et al., 2019; Miikkulainen et al., 2019) adopted evolutionary algorithms, which assume that genetic operations can drive a single architecture or a family of architectures to evolve towards better performance. Among them, Liu et al. (2018b) introduced a hierarchical representation for describing a network architecture, and Xie & Yuille (2017) decomposed each architecture into a representation of ‘genes’. Real et al. (2019) proposed aging evolution, which improved upon standard tournament selection and surpassed the best manually designed architectures for the first time. Another line of heuristics turns to reinforcement learning (RL) (Zoph & Le, 2017; Baker et al., 2017; Zoph et al., 2018; Zhong et al., 2018; Liu et al., 2018a), which trains a meta-controller to guide the search process. Zoph & Le (2017) first proposed using a controller-based recurrent neural network to generate the hyper-parameters of neural networks. To reduce the computation cost, researchers started to search for blocks or cells (Zhong et al., 2018; Zoph et al., 2018) instead of the entire network, and consequently managed to reduce the overall computational cost by a factor of 7. Other kinds of approximation, such as greedy search (Liu et al., 2018a), were also applied to further accelerate the search. Nevertheless, the computational costs of these approaches, based on either evolution or RL, remain prohibitively high.
Funding
  • This work was supported in part by the National NSFC under Grant Nos. 61971285, 61425011, 61529101, 61622112, 61720106001, 61932022, and in part by the Program of Shanghai Academic Research Leader under Grant 17XD1401900
References
  • Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. In ICLR, 2017.
  • Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. SMASH: One-shot model architecture search through hypernetworks. In ICLR, 2018.
  • Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In AAAI, 2018.
  • Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR, 2019.
  • Francesco Paolo Casale, Jonathan Gordon, and Nicolo Fusi. Probabilistic neural architecture search. arXiv preprint arXiv:1902.05116, 2019.
  • Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In ICCV, 2019.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with Cutout. arXiv preprint arXiv:1708.04552, 2017.
  • Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Efficient multi-objective neural architecture search via Lamarckian evolution. In ICLR, 2019.
  • David Ha, Andrew Dai, and Quoc V Le. HyperNetworks. In ICLR, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, 2017.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. In ICLR, 2017.
  • Liam Li and Ameet Talwalkar. Random search and reproducibility for neural architecture search. In UAI, 2019.
  • Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In ECCV, 2018a.
  • Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical representations for efficient architecture search. In ICLR, 2018b.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In ICLR, 2019.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
  • Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization. In NeurIPS, 2018.
  • Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV, 2018.
  • Jieru Mei, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Yingwei Li, Alan Yuille, and Jianchao Yang. AtomNAS: Fine-grained end-to-end neural architecture search. In ICLR, 2020.
  • Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In ICML, 2018.
  • Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V Le, and Alexey Kurakin. Large-scale evolution of image classifiers. In ICML, 2017.
  • Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
  • Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In CVPR, 2017.
  • Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
  • Christian Sciuto, Kaicheng Yu, Martin Jaggi, Claudiu Musat, and Mathieu Salzmann. Evaluating the search phase of neural architecture search. arXiv preprint arXiv:1902.08142, 2019.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V Le. MnasNet: Platform-aware neural architecture search for mobile. In CVPR, 2019.
  • Robert J. Wang, Xiang Li, Shuang Ao, and Charles X. Ling. Pelee: A real-time object detection system on mobile devices. In NeurIPS, 2018.
  • Lingxi Xie and Alan Yuille. Genetic CNN. In ICCV, 2017.
  • Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: Stochastic neural architecture search. In ICLR, 2019.
  • Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
  • Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. Practical block-wise neural network architecture generation. In CVPR, 2018.
  • Hongpeng Zhou, Minghao Yang, Jun Wang, and Wei Pan. BayesNAS: A Bayesian approach for neural architecture search. In ICML, 2019.
  • Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In CVPR, 2018.