# Designing Network Design Spaces

CVPR, pp. 10425-10433, 2020.

Keywords: deep neural network, neural architecture search, empirical distribution function, wide range, efficient convolutional neural network

Abstract:

In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall proce...

Introduction

- Deep convolutional neural networks are the engine of visual recognition. Over the past several years better architectures have resulted in considerable progress in a wide range of visual recognition tasks.
- Examples include LeNet [15], AlexNet [13], VGG [26], and ResNet [8]
- This body of work advanced both the effectiveness of neural networks as well as the understanding of network design.
- The above sequence of works demonstrated the importance of convolution, network and data size, depth, and residuals, respectively.
- The outcome of these works is not just particular network instantiations, but also design principles that can be generalized and applied to numerous settings

Highlights

- Deep convolutional neural networks are the engine of visual recognition
- We present a new network design paradigm that combines the advantages of manual design and neural architecture search
- We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design
- Our results suggest that designing network design spaces is a promising avenue for future research

Methods

**Design Space Design**

The authors' goal is to design better networks for visual recognition. Rather than designing or searching for a single best model under specific settings, the authors study the behavior of populations of models.

- The authors aim to discover general design principles that can apply to and improve an entire model population
- Such design principles can provide insights into network design and are more likely to generalize to new settings.
- The core insight from [21] is that the authors can sample models from a design space, giving rise to a model distribution, and turn to tools from classical statistics to analyze the design space
- The authors note that this differs from architecture search, where the goal is to find the single best model from the space.
- The 5-stage results show the regular structure of RegNet can generalize to more stages, where AnyNetXA has even more degrees of freedom
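
The "tools from classical statistics" mentioned above include the empirical distribution function listed in the keywords. The following is a minimal sketch of how an error EDF could be computed for a population of sampled models. The function names (`error_edf`, `edf_curve`), the thresholds, and the synthetic error values are assumptions made for illustration; they are not taken from the paper or from any released code.

```python
import random


def error_edf(errors, threshold):
    """Fraction of sampled models whose error is strictly below `threshold`."""
    if not errors:
        raise ValueError("errors must be non-empty")
    return sum(1 for e in errors if e < threshold) / len(errors)


def edf_curve(errors, thresholds):
    """Evaluate the EDF at each threshold, giving one (threshold, fraction) pair."""
    return [(t, error_edf(errors, t)) for t in thresholds]


# Synthetic stand-ins for the errors of models sampled from two design spaces.
# These are random numbers, NOT results from the paper.
rng = random.Random(0)
space_a = [rng.uniform(40.0, 60.0) for _ in range(500)]
space_b = [rng.uniform(38.0, 55.0) for _ in range(500)]

thresholds = [40.0, 45.0, 50.0, 55.0, 60.0]
curve_a = edf_curve(space_a, thresholds)
curve_b = edf_curve(space_b, thresholds)

# A design space whose curve lies higher has a larger share of good models.
for (t, frac_a), (_, frac_b) in zip(curve_a, curve_b):
    print(f"error < {t:.0f}: A={frac_a:.2f}  B={frac_b:.2f}")
```

Comparing whole curves rather than single best models is what distinguishes this kind of analysis from architecture search, which only looks for the best point in the space.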

Results

Results are shown in Figure 18 and Table 4.

- At low flops, EFFICIENTNET outperforms REGNETY.
- At intermediate flops, REGNETY outperforms EFFICIENTNET, and at higher flops both REGNETX and REGNETY perform better.
- The authors observe that for EFFICIENTNET, activations scale linearly with flops, whereas for REGNETs activations scale with the square root of flops
- This leads to slow GPU training and inference times for EFFICIENTNET.
- E.g., REGNETX-8000 is 5× faster than EFFICIENTNET-B5, while having lower error
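
To make the linear versus square-root scaling concrete, the sketch below fits a power-law exponent to (flops, activations) pairs with a least-squares fit in log-log space. The helper name `fit_power_law_exponent` and all measurements are made up for illustration; they are not numbers from the paper.

```python
import math


def fit_power_law_exponent(flops, activations):
    """Least-squares slope of log(activations) against log(flops)."""
    xs = [math.log(f) for f in flops]
    ys = [math.log(a) for a in activations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical model families, NOT measurements from the paper.
flops = [0.4, 0.8, 1.6, 3.2, 6.4]
linear_family = [10.0 * f for f in flops]           # activations grow like flops
sqrt_family = [10.0 * math.sqrt(f) for f in flops]  # activations grow like sqrt(flops)

linear_exp = fit_power_law_exponent(flops, linear_family)
sqrt_exp = fit_power_law_exponent(flops, sqrt_family)
print(f"linear family exponent: {linear_exp:.2f}")
print(f"sqrt family exponent:   {sqrt_exp:.2f}")

# Growing flops by a factor k multiplies activations by k ** exponent.
k = 16
print(f"{k}x flops -> {k ** linear_exp:.1f}x vs {k ** sqrt_exp:.1f}x activations")
```

An exponent near 1 means activations grow in step with flops, while an exponent near 0.5 means they grow far more slowly as models get larger, which is the contrast the bullets above draw.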

Conclusion

- The authors present a new network design paradigm. The authors' results suggest that designing network design spaces is a promising avenue for future research.
- Table fragment, error (top-1): RESNET-50 35.0±0.20, RESNEXT-50 33.5±0.10, 33.2±0.20, RESNET-101 33.2±0.24, RESNEXT-101 32.1±0.30.

- Table 1: Design space summary. See text for details
- Table 2: Mobile regime. We compare existing models using originally reported errors to RegNet models trained in a basic setup. Our simple RegNet models achieve surprisingly good results given the effort focused on this regime in the past few years
- Table 3: RESNE(X)T comparisons. (a) Grouped by activations, REGNETX show considerable gains (note that for each group GPU inference and training times are similar). (b) REGNETX models outperform RESNE(X)T models under fixed flops as well
- Table 4: EFFICIENTNET comparisons using our standard training schedule. Under comparable training settings, REGNETY outperforms EFFICIENTNET for most flop regimes. Moreover, REGNET models are considerably faster, e.g., REGNETX-F8000 is about 5× faster than EFFICIENTNET-B5. Note that originally reported errors for EFFICIENTNET (shown grayed out) are much lower but use longer and enhanced training schedules, see Table 7
- Table 5: RESNE(X)T comparisons on ImageNetV2
- Table 6: EFFICIENTNET comparisons on ImageNetV2
- Table 7: Training enhancements to EFFICIENTNET-B0. Our EFFICIENTNET-B0 reproduction with DropPath [14] and a 250 epoch training schedule (third row) achieves results slightly inferior to original results (bottom row), which additionally used RMSProp [30], AutoAugment [2], etc. Without these enhancements to the training setup, results are ∼2% lower (top row), highlighting the importance of carefully controlling the training setup

Related work

- Manual network design. The introduction of AlexNet [13] catapulted network design into a thriving research area. In the following years, improved network designs were proposed; examples include VGG [26], Inception [27, 28], ResNet [8], ResNeXt [31], DenseNet [11], and MobileNet [9, 25]. The design process behind these networks was largely manual and focused on discovering new design choices that improve accuracy, e.g., the use of deeper models or residuals. We likewise share the goal of discovering new design principles. In fact, our methodology is analogous to manual design but performed at the design space level.

Contributions

- Explores the structure aspect of network design and arrives at a low-dimensional design space consisting of simple, regular networks that the authors call RegNet
- Proposes to design network design spaces, where a design space is a parametrized set of possible model architectures
- Presents a new network design paradigm that combines the advantages of manual design and NAS
- Shows that the RegNet design space generalizes to larger compute regimes, schedule lengths, and network block types
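
As a rough illustration of "a design space is a parametrized set of possible model architectures", the sketch below represents a space by the per-stage choices it allows and samples models from it. The class names (`StageConfig`, `DesignSpace`), the depth and width options, and the non-decreasing-width constraint are all hypothetical choices for this example; the paper's actual design spaces and parametrization are not reproduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class StageConfig:
    depth: int  # number of blocks in the stage
    width: int  # number of channels in the stage


@dataclass(frozen=True)
class DesignSpace:
    """A parametrized set of architectures: each stage picks a depth and a width."""
    num_stages: int
    depths: Tuple[int, ...]
    widths: Tuple[int, ...]

    def size(self) -> int:
        """Number of distinct architectures in the space."""
        return (len(self.depths) * len(self.widths)) ** self.num_stages

    def sample(self, rng: random.Random) -> List[StageConfig]:
        """Draw one architecture uniformly at random."""
        return [
            StageConfig(rng.choice(self.depths), rng.choice(self.widths))
            for _ in range(self.num_stages)
        ]


def sample_where(
    space: DesignSpace,
    keep: Callable[[List[StageConfig]], bool],
    rng: random.Random,
    max_tries: int = 10_000,
) -> List[StageConfig]:
    """Rejection-sample a model that satisfies an extra constraint `keep`."""
    for _ in range(max_tries):
        model = space.sample(rng)
        if keep(model):
            return model
    raise RuntimeError("no model satisfying the constraint was found")


def widths_non_decreasing(model: List[StageConfig]) -> bool:
    """Hypothetical constraint used only to show how a space can be narrowed."""
    return all(a.width <= b.width for a, b in zip(model, model[1:]))


rng = random.Random(0)
space = DesignSpace(
    num_stages=4,
    depths=tuple(range(1, 17)),
    widths=(8, 16, 32, 64, 128, 256, 512, 1024),
)
model = space.sample(rng)
constrained = sample_where(space, widths_non_decreasing, rng)
print("unconstrained space size:", space.size())
print("sampled model:", model)
print("constrained sample:", constrained)
```

Adding a constraint such as `widths_non_decreasing` yields a smaller space that is itself a design space; comparing the populations before and after such a change is the kind of step the design-space-level methodology relies on.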

Reference

- [1] F. Chollet. Xception: Deep learning with depthwise separable convolutions. In CVPR, 2017.
- [2] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. AutoAugment: Learning augmentation policies from data. arXiv:1805.09501, 2018.
- [3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
- [4] T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552, 2017.
- [5] B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC press, 1994.
- [6] P. Goyal, P. Dollar, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv:1706.02677, 2017.
- [7] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.
- [8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- [9] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
- [10] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In CVPR, 2018.
- [11] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
- [12] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
- [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
- [14] G. Larsson, M. Maire, and G. Shakhnarovich. Fractalnet: Ultra-deep neural networks without residuals. In ICLR, 2017.
- [15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1989.
- [16] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.
- [17] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In ECCV, 2018.
- [18] H. Liu, K. Simonyan, and Y. Yang. Darts: Differentiable architecture search. In ICLR, 2019.
- [19] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In ECCV, 2018.
- [20] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. In ICML, 2018.
- [21] I. Radosavovic, J. Johnson, S. Xie, W.-Y. Lo, and P. Dollar. On network design spaces for visual recognition. In ICCV, 2019.
- [22] P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions. arXiv:1710.05941, 2017.
- [23] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
- [24] B. Recht, R. Roelofs, L. Schmidt, and V. Shankar. Do imagenet classifiers generalize to imagenet? arXiv:1902.10811, 2019.
- [25] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
- [26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- [27] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
- [28] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
- [29] M. Tan and Q. V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, 2019.
- [30] T. Tieleman and G. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. Coursera: Neural networks for machine learning, 2012.
- [31] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.
- [32] S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, 2016.
- [33] X. Zhang, X. Zhou, M. Lin, and J. Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
- [34] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
- [35] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In CVPR, 2018.
