Noisy Differentiable Architecture Search

arXiv, 2020.

Keywords:
neural architecture search, differentiable architecture search, skip connection, performance collapse, optimization process

Abstract:

Simplicity is the ultimate sophistication. Differentiable Architecture Search (DARTS) has now become one of the mainstream paradigms of neural architecture search. However, it suffers from several disturbing factors in the optimization process that make its results unstable and hard to reproduce. FairDARTS points out that skip connections native...

Introduction
  • Performance collapse caused by an excessive number of skip connections in the derived model is a fatal drawback of differentiable architecture search approaches (Liu et al [2019], Chen et al [2019a], Zela et al [2020], Chu et al [2019a]).
  • FairDARTS (Chu et al [2019a]) attributes this collapse to the unfair advantage that skip connections enjoy in an exclusively competitive environment.
  • Under this perspective, it summarizes several currently effective approaches as different ways of avoiding the unfair advantage.
  • Inspired by these empirical observations, the authors adopt a different and straightforward approach: injecting unbiased noise into the skip connections' output.
  • The underlying philosophy is simple: the injected noise perturbs the gradient flow through the skip connections, so that their unfair advantage cannot comfortably take effect (see the sketch below).
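The idea can be made concrete with a minimal PyTorch-style sketch, assuming zero-mean Gaussian noise with a hand-picked standard deviation; the module name `NoisyIdentity` and the default `sigma` are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn


class NoisyIdentity(nn.Module):
    """Skip connection that adds unbiased (zero-mean) Gaussian noise to its
    output during the search phase, leaving evaluation untouched.

    The default sigma is a placeholder; the paper ablates several standard
    deviations (see Table 5).
    """

    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Zero-mean noise keeps the expected output unchanged while
            # perturbing the gradient that flows back through the skip path.
            return x + torch.randn_like(x) * self.sigma
        return x  # plain identity once the search/training phase is over
```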
Highlights
  • Performance collapse caused by an excessive number of skip connections in the derived model is a fatal drawback of differentiable architecture search approaches (Liu et al [2019], Chen et al [2019a], Zela et al [2020], Chu et al [2019a])
  • We propose a simple but effective approach to address the performance collapse issue in differentiable architecture search: injecting noise into the gradient flow of skip connections
  • We propose a novel differentiable architecture search approach, NoisyDARTS
  • By injecting unbiased Gaussian noise into the skip connections' output, we make the optimization process aware of the perturbed gradient flow
  • Experiments show that NoisyDARTS works both effectively and robustly
Methods
  • To verify the validity of the method, the authors adopt two widely used search spaces: the DARTS search space (Liu et al [2019]) and the MobileNetV2-like search space as in Cai et al [2019]
  • The former consists of a stack of duplicated normal cells and reduction cells, each represented by a DAG of 7 nodes in which every edge between intermediate nodes offers 7 candidate operations.
  • An example of the evolution of the architectural weights during the search phase is shown in Figure 1 (a simplified sketch of such a mixed edge is given below)
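The sketch below illustrates such a mixed edge; the candidate operations and their number are placeholders, and only the softmax-weighted mixture plus the noisy skip branch (reusing the `NoisyIdentity` module sketched earlier) follow the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedEdge(nn.Module):
    """A DARTS-style edge: a softmax-weighted sum over candidate operations.

    The skip branch uses the NoisyIdentity sketch above; the remaining
    candidates are simple stand-ins, not the full DARTS operation set.
    """

    def __init__(self, C: int, sigma: float = 0.1):
        super().__init__()
        self.ops = nn.ModuleList([
            NoisyIdentity(sigma),                       # noisy skip connection
            nn.Conv2d(C, C, 3, padding=1, bias=False),  # stand-in for a conv candidate
            nn.MaxPool2d(3, stride=1, padding=1),       # pooling candidates
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        # One architecture weight (alpha) per candidate operation.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=-1)  # exclusive competition among candidates
        return sum(w * op(x) for w, op in zip(weights, self.ops))


# Usage: architecture weights (alpha) and operation weights are trained jointly.
edge = MixedEdge(C=16)
out = edge(torch.randn(2, 16, 32, 32))
```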
Results
  • Transferred Results on Object Detection

    The authors further evaluate the transferability of the searched models on the COCO object detection task (Lin et al [2014]).
  • The authors use the MMDetection toolbox, since it provides solid implementations of various detection algorithms (Chen et al [2019b]).
  • Following the same training setting as Lin et al [2017], all models in Table 3 are trained and evaluated on the COCO dataset for 12 epochs (an illustrative configuration sketch is given below).
  • As shown in Table 3, the model obtains better transferability than the other models under the mobile setting
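As a rough illustration of what a drop-in backbone means in practice, the fragment below sketches an MMDetection-style config that starts from the standard RetinaNet 1x recipe (12 epochs on COCO) and swaps the backbone. The base config path, the backbone type name, and all channel numbers are assumptions for illustration, not the authors' released configuration; the custom backbone would still have to be implemented and registered with MMDetection separately.

```python
# Hypothetical MMDetection-style config fragment (names and numbers are placeholders).
_base_ = './retinanet_r50_fpn_1x_coco.py'   # standard RetinaNet 1x schedule: 12 epochs on COCO

model = dict(
    backbone=dict(
        _delete_=True,                 # discard the ResNet-50 settings inherited from the base config
        type='NoisyDARTSBackbone',     # placeholder: a searched backbone registered as an MMDetection module
        out_indices=(1, 2, 3),         # placeholder: which feature stages feed the FPN
    ),
    neck=dict(
        in_channels=[32, 96, 320],     # placeholder: must match the channels of the backbone's feature maps
    ),
)
```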
Conclusion
  • The authors proposed a novel differentiable architecture search approach, NoisyDARTS.
  • By injecting unbiased Gaussian noise into the skip connections' output, the authors make the optimization process aware of the perturbed gradient flow.
  • In this way, the unfair advantage is largely attenuated.
  • NoisyDARTS-a and NoisyDARTS-b confirm that the proposed method can tolerate many skip connections, as long as they substantially contribute to the performance of the derived model
Tables
  • Table 1: Results on CIFAR-10. Notes: †: MultAdds computed using the genotypes provided by the authors; accuracy averaged over training the best model several times; GD: gradient-based, TF: transferred from ImageNet
  • Table 2: Classification results on ImageNet. Notes: based on its published code; †: searched on CIFAR-10; ††: searched on CIFAR-100; ‡: searched on ImageNet; w/ SE and Swish
  • Table 3: Object detection with various drop-in backbones. †: w/ SE and Swish
  • Table 4: NoisyDARTS robustly escapes from the performance collapse across different search spaces and datasets
  • Table 5: Ablation experiments on Gaussian noise with different standard deviations
  • Table 6: NoisyDARTS architecture genotypes searched on CIFAR-10
  • Table 7: NoisyDARTS architecture genotypes searched on CIFAR-10 under biased noise
Related work
  • Performance collapse in DARTS. The notorious performance collapse of DARTS has been confirmed by many works (Chen et al [2019a], Zela et al [2020], Chu et al [2019a]). To remedy this failure, Chen et al [2019a] set a hard constraint that limits the number of skip connections. This is a strong prior, since architectures within the regularized search space generally perform well, as indicated by Chu et al [2019a]. Meanwhile, Zela et al [2020] devised several search spaces to show that DARTS leads to degenerate models dominated by skip connections. To robustify the search process, they proposed monitoring the sharpness of the validation loss curvature, which correlates with the induced model's performance. This, however, adds considerable extra computation, as it requires calculating the eigenspectrum of the Hessian matrix. Chu et al [2019a] instead relax the search space to avoid exclusive competition: each operation receives an independent architecture weight through a sigmoid that mimics a multi-hot encoding, rather than the original softmax, which yields a one-hot encoding (a small numerical sketch of this contrast is given below). This modification enlarges the search space, as it allows multiple choices between every two nodes, while the intrinsic collapse in the original DARTS search space still calls for a better solution.
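For concreteness, here is a small, hedged PyTorch sketch (not FairDARTS' actual code) contrasting the softmax coupling in DARTS with FairDARTS' independent sigmoid weights; the alpha values are arbitrary illustrations.

```python
import torch
import torch.nn.functional as F

# Architecture weights (alphas) for three candidate operations on one edge;
# the values are arbitrary, for illustration only.
alpha = torch.tensor([1.2, -0.3, 0.4])

# DARTS: softmax couples the candidates into an exclusive competition --
# raising the skip connection's weight necessarily suppresses the others.
softmax_weights = F.softmax(alpha, dim=-1)   # non-negative, sums to 1

# FairDARTS: an element-wise sigmoid gives every operation an independent
# weight in (0, 1), mimicking a multi-hot rather than a one-hot encoding.
sigmoid_weights = torch.sigmoid(alpha)       # each entry is independent of the others

print(softmax_weights.sum())   # tensor(1.) -- exclusive competition
print(sigmoid_weights)         # independent per-operation weights
```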
References
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable Architecture Search. In ICLR, 2019.
  • Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian. Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation. In ICCV, 2019a.
  • Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, and Frank Hutter. Understanding and Robustifying Differentiable Architecture Search. In ICLR, 2020. URL https://openreview.net/forum?id=H1gDNyrKDS.
  • Xiangxiang Chu, Tianbao Zhou, Bo Zhang, and Jixiang Li. Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search. arXiv preprint arXiv:1911.12126, 2019a.
  • Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and Composing Robust Features with Denoising Autoencoders. In ICML, pages 1096–1103, 2008.
  • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, and Shane Legg. Noisy Networks for Exploration. In ICLR, 2018. URL https://openreview.net/forum?id=rywHCPkAW.
  • Tianqi Chen, Ian Goodfellow, and Jonathon Shlens. Net2Net: Accelerating Learning via Knowledge Transfer. In ICLR, 2015.
  • Arvind Neelakantan, Luke Vilnis, Quoc V Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding Gradient Noise Improves Learning for Very Deep Networks. arXiv preprint arXiv:1511.06807, 2015.
  • Baochang Zhang, Chen Chen, Qixiang Ye, Jianzhuang Liu, David Doermann, et al. Calibrated Stochastic Gradient Descent for Convolutional Neural Networks. In AAAI, volume 33, pages 9348–9355, 2019.
  • Qizhe Xie, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. Self-training with Noisy Student Improves ImageNet Classification. arXiv preprint arXiv:1911.04252, 2019a.
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V Le. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In CVPR, 2019.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS, pages 8024–8035, 2019.
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning Transferable Architectures for Scalable Image Recognition. In CVPR, 2018.
  • Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean. Efficient Neural Architecture Search via Parameter Sharing. In ICML, 2018.
  • Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, and Qi Tian. Multinomial Distribution Learning for Effective Neural Architecture Search. In ICCV, pages 1304–1313, 2019.
  • Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: Stochastic Neural Architecture Search. In ICLR, 2019b.
  • Xuanyi Dong and Yi Yang. Searching for a Robust Neural Architecture in Four GPU Hours. In CVPR, pages 1761–1770, 2019.
  • Guohao Li, Guocheng Qian, Itzel C Delgadillo, Matthias Müller, Ali Thabet, and Bernard Ghanem. SGAS: Sequential Greedy Architecture Search. arXiv preprint arXiv:1912.00195, 2019.
  • Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and Hongkai Xiong. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. In ICLR, 2020. URL https://openreview.net/forum?id=BJlS634tPr.
  • Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In ICLR, 2019.
  • Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu. ScarletNAS: Bridging the Gap between Scalability and Fairness in Neural Architecture Search. arXiv preprint arXiv:1908.06022, 2019b.
  • Mingxing Tan and Quoc V Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In ICML, 2019.
  • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive Neural Architecture Search. In ECCV, pages 19–34, 2018.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-Excitation Networks. In CVPR, pages 7132–7141, 2018.
  • Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. AutoAugment: Learning Augmentation Policies from Data. In CVPR, 2019.
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In CVPR, pages 4510–4520, 2018.
  • Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized Evolution for Image Classifier Architecture Search. In ICML AutoML Workshop, 2018.
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. In CVPR, 2019.
  • Xiangxiang Chu, Bo Zhang, Ruijun Xu, and Jixiang Li. FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. arXiv preprint arXiv:1907.01845, 2019c.
  • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for MobileNetV3. In ICCV, 2019.
  • Xiangxiang Chu, Bo Zhang, and Ruijun Xu. MoGA: Searching beyond MobileNetV3. In ICASSP, pages 4042–4046, 2020.
  • Mingxing Tan and Quoc V Le. MixConv: Mixed Depthwise Convolutional Kernels. In BMVC, 2019.
  • Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
  • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal Loss for Dense Object Detection. In ICCV, pages 2980–2988, 2017.
  • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv preprint arXiv:1906.07155, 2019b.
  • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. Single-Path NAS: Designing Hardware-Efficient ConvNets in Less than 4 Hours. In ECML PKDD, 2019.