EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

    International Conference on Machine Learning, pp. 6105-6114, 2019.

    Keywords: ICML, effective compound, good accuracy, compound scaling method, neural architecture search

    Abstract:

    Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a simple yet highly effective compound scaling method that uniformly scales network width, depth, and resolution using a single compound coefficient.

    Introduction
    • Scaling up ConvNets is widely used to achieve better accuracy.
    • ResNet (He et al, 2016) can be scaled up from ResNet-18 to ResNet-200 by using more layers; recently, GPipe (Huang et al, 2018) achieved 84.3% ImageNet top-1 accuracy by scaling up a baseline model four times larger.
    • The process of scaling up ConvNets has never been well understood, and there are currently many ways to do it.
    • [Figure 1: model size vs. ImageNet top-1 accuracy, comparing EfficientNet-B0 through B7 with models such as ResNet, NASNet-A, AmoebaNet-A, and GPipe.]
    Highlights
    • Scaling up ConvNets is widely used to achieve better accuracy
    • We propose a simple yet effective compound scaling method
    • We propose a new compound scaling method, which uses a compound coefficient φ to uniformly scale network width, depth, and resolution in a principled way: depth d = α^φ, width w = β^φ, resolution r = γ^φ, subject to α · β² · γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1, where α, β, γ are constants that can be determined by a small grid search (see the sketch after this list)
    • We will evaluate our scaling method using existing ConvNets, but in order to better demonstrate the effectiveness of our scaling method, we have developed a new mobile-size baseline, called EfficientNet
    • We propose a simple and highly effective compound scaling method, which enables us to scale up a baseline ConvNet to any target resource constraints in a more principled way, while maintaining model efficiency
    • Powered by this compound scaling method, we demonstrate that a mobile-size EfficientNet model can be scaled up very effectively, surpassing state-of-the-art accuracy with an order of magnitude fewer parameters and FLOPS on both ImageNet and five commonly used transfer learning datasets
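    The compound scaling rule above is simple enough to express directly in code. The following Python sketch (ours, not the authors' released code) computes the depth, width, and resolution multipliers for a given compound coefficient φ, using the constants α = 1.2, β = 1.1, γ = 1.15 that the paper reports from its small grid search; the loop over integer φ values is illustrative only.

```python
# Minimal sketch of compound scaling: depth = alpha**phi, width = beta**phi,
# resolution = gamma**phi, with alpha * beta**2 * gamma**2 ~= 2 so that total
# FLOPS grow roughly 2**phi. Constants below are the paper's grid-search result.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scaling(phi: float):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

if __name__ == "__main__":
    flops_base = ALPHA * BETA ** 2 * GAMMA ** 2  # ~1.92, i.e. roughly 2
    for phi in range(8):  # integer phi for illustration; the real B1-B7 variants tune phi further
        d, w, r = compound_scaling(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution x{r:.2f}, FLOPS ~x{flops_base ** phi:.1f}")
```

    Because α · β² · γ² ≈ 2, each unit increase of φ roughly doubles the FLOPS, which is what lets a target resource budget map cleanly onto a choice of φ.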
    Methods
    • The authors first evaluate the scaling method on existing ConvNets and then on the newly proposed EfficientNets.
    • As a proof of concept, the authors first apply the scaling method to the widely-used MobileNets (Howard et al, 2017; Sandler et al, 2018) and ResNet (He et al, 2016).
    • Compared to single-dimension scaling, the compound scaling method improves accuracy on all of these models, suggesting that it generalizes well to existing ConvNets (a minimal sketch of applying such multipliers to a baseline architecture follows this list).
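    To illustrate how such multipliers could be applied to an existing baseline, here is a minimal, hypothetical sketch that scales a per-stage (channels, layer-repeats) specification by width and depth multipliers. The stage numbers, the rounding-to-multiples-of-8 convention, and the helper names are our assumptions for illustration, not the exact procedure used for MobileNets or ResNet in the paper.

```python
import math

# Hypothetical baseline: (output_channels, num_layers) per stage.
BASELINE_STAGES = [(16, 1), (24, 2), (40, 2), (80, 3), (112, 3), (192, 4), (320, 1)]

def round_channels(channels: float, divisor: int = 8) -> int:
    """Round a channel count to a multiple of `divisor`, never dropping more than ~10%."""
    rounded = max(divisor, int(channels + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * channels:
        rounded += divisor
    return rounded

def scale_stages(stages, width_mult: float, depth_mult: float):
    """Apply width/depth multipliers to a baseline (channels, repeats) specification."""
    return [(round_channels(c * width_mult), int(math.ceil(n * depth_mult)))
            for c, n in stages]

# Example: multipliers corresponding to phi = 1 with alpha = 1.2, beta = 1.1.
print(scale_stages(BASELINE_STAGES, width_mult=1.1, depth_mult=1.2))
```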
    Results
    • The authors train the EfficientNet models on ImageNet using settings similar to (Tan et al, 2019): RMSProp optimizer with decay 0.9 and momentum 0.9; batch norm momentum 0.99; weight decay 1e-5; and an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs (a hedged configuration sketch follows this list).
    • Selected transfer results (accuracy, #params): CIFAR-10: NASNet-A 98.0% (85M) vs. EfficientNet-B0 98.1% (4M, 21x fewer); GPipe 99.0% (556M) vs. EfficientNet-B7 98.9% (64M, 8.7x fewer). CIFAR-100: NASNet-A 87.5% (85M) vs. EfficientNet-B0 88.1% (4M, 21x fewer); GPipe 91.3% (556M) vs. EfficientNet-B7 91.7% (64M, 8.7x fewer). Comparable comparisons for Birdsnap and the remaining transfer datasets are reported in Table 5.
    • Transfer Learning Results for EfficientNet.
    • The authors have evaluated the EfficientNet on a list of commonly used transfer learning datasets, as shown in Table 6.
    • The authors borrow the same training settings from (Kornblith et al, 2019) and (Huang et al, 2018), which take ImageNet-pretrained checkpoints and finetune them on the new datasets (a minimal finetuning sketch also follows this list).
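    The training hyperparameters listed above map directly onto a standard RMSProp setup. The sketch below expresses them with tf.keras; the STEPS_PER_EPOCH value and the way weight decay is wired in are illustrative assumptions, not the authors' released training code.

```python
import tensorflow as tf

STEPS_PER_EPOCH = 1000  # assumption: depends on dataset size and batch size

# Initial learning rate 0.256, decayed by 0.97 every 2.4 epochs (staircase schedule).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.256,
    decay_steps=int(2.4 * STEPS_PER_EPOCH),
    decay_rate=0.97,
    staircase=True,
)

# RMSProp with decay (rho) 0.9 and momentum 0.9, as in the bullet above.
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=lr_schedule,
    rho=0.9,
    momentum=0.9,
)

# Batch-norm momentum 0.99 is set per BatchNormalization layer:
bn = tf.keras.layers.BatchNormalization(momentum=0.99)

# Weight decay 1e-5 would typically enter as an L2-style penalty on the weights;
# exactly how it is applied alongside RMSProp is not specified in the summary above.
```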
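    For the transfer setup (ImageNet-pretrained checkpoint, new classification head, finetune on the target dataset), a minimal sketch using the Keras EfficientNet-B0 application could look like the following. The head, optimizer, and input size are illustrative assumptions and do not reproduce the exact settings of Kornblith et al. (2019) or Huang et al. (2018).

```python
import tensorflow as tf

NUM_CLASSES = 100  # e.g. CIFAR-100; assumption for illustration

# ImageNet-pretrained EfficientNet-B0 backbone without its original classifier head.
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg")

# New head for the target dataset; the whole network is finetuned end to end.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),  # assumption
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # assumption
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(train_ds, validation_data=val_ds, epochs=...)  # data pipeline omitted
```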
    Conclusion
    • To disentangle the contribution of the proposed scaling method from the EfficientNet architecture, Figure 8 compares the ImageNet performance of different scaling methods for the same EfficientNet-B0 baseline network.
    • As shown in the figure, the model with compound scaling tends to focus on more relevant regions with more object details, while the other models either lack object details or fail to capture all objects in the images.
    • In this paper, the authors systematically study ConvNet scaling and identify that carefully balancing network width, depth, and resolution is an important but missing piece that has prevented better accuracy and efficiency.
    • To address this issue, the authors propose a simple and highly effective compound scaling method, which enables them to scale up a baseline ConvNet to any target resource constraints in a more principled way, while maintaining model efficiency.
    • Powered by this compound scaling method, the authors demonstrate that a mobile-size EfficientNet model can be scaled up very effectively, surpassing state-of-the-art accuracy with an order of magnitude fewer parameters and FLOPS on both ImageNet and five commonly used transfer learning datasets.
    Tables
    • Table1: EfficientNet-B0 baseline network – each row describes a stage i with L_i layers, input resolution (H_i, W_i), and output channels C_i. Notation is adopted from Equation 2
    • Table2: EfficientNet Performance Results on ImageNet (Russakovsky et al, 2015). All EfficientNet models are scaled from our baseline EfficientNet-B0 using different compound coefficients φ in Equation 3. ConvNets with similar top-1/top-5 accuracy are grouped together for efficiency comparison. Our scaled EfficientNet models consistently reduce parameters and FLOPS by an order of magnitude (up to 8.4x parameter reduction and up to 16x FLOPS reduction) compared with existing ConvNets
    • Table3: Scaling Up MobileNets and ResNet
    • Table4: Inference Latency Comparison – Latency is measured with batch size 1 on a single core of Intel Xeon CPU E5-2690
    • Table5: EfficientNet Performance Results on Transfer Learning Datasets. Our scaled EfficientNet models achieve new state-of-the-art accuracy on 5 out of 8 datasets, with 9.6x fewer parameters on average
    • Table6: Transfer Learning Datasets
    Related work
    • ConvNet Accuracy: Since AlexNet (Krizhevsky et al, 2012) won the 2012 ImageNet competition, ConvNets have become increasingly more accurate by going bigger: while the 2014 ImageNet winner GoogleNet (Szegedy et al, 2015) achieves 74.8% top-1 accuracy with about 6.8M parameters, the 2017 ImageNet winner SENet (Hu et al, 2018) achieves 82.7% top-1 accuracy with 145M parameters. Recently, GPipe (Huang et al, 2018) further pushes the state-of-the-art ImageNet top-1 validation accuracy to 84.3% using 557M parameters: it is so big that it can only be trained with a specialized pipeline parallelism library by partitioning the network and spreading each part to a different accelerator. While these models are mainly designed for ImageNet, recent studies have shown better ImageNet models also perform better across a variety of transfer learning datasets (Kornblith et al, 2019), and other computer vision tasks such as object detection (He et al, 2016; Tan et al, 2019). Although higher accuracy is critical for many applications, we have already hit the hardware memory limit, and thus further accuracy gain needs better efficiency.

      ConvNet Efficiency: Deep ConvNets are often over-parameterized. Model compression (Han et al, 2016; He et al, 2018; Yang et al, 2018) is a common way to reduce model size by trading accuracy for efficiency. As mobile phones become ubiquitous, it is also common to hand-craft efficient mobile-size ConvNets, such as SqueezeNets (Iandola et al, 2016; Gholami et al, 2018), MobileNets (Howard et al, 2017; Sandler et al, 2018), and ShuffleNets (Zhang et al, 2018; Ma et al, 2018). Recently, neural architecture search has become increasingly popular for designing efficient mobile-size ConvNets (Tan et al, 2019; Cai et al, 2019), and it achieves even better efficiency than hand-crafted mobile ConvNets by extensively tuning network width, depth, and convolution kernel types and sizes. However, it is unclear how to apply these techniques to larger models, which have a much larger design space and much higher tuning cost. In this paper, we aim to study model efficiency for very large ConvNets that surpass state-of-the-art accuracy. To achieve this goal, we resort to model scaling.
    Reference
    • Berg, T., Liu, J., Woo Lee, S., Alexander, M. L., Jacobs, D. W., and Belhumeur, P. N. Birdsnap: Large-scale fine-grained visual categorization of birds. CVPR, pp. 2011–2018, 2014.
    • Bossard, L., Guillaumin, M., and Van Gool, L. Food-101 – mining discriminative components with random forests. ECCV, pp. 446–461, 2014.
    • Cai, H., Zhu, L., and Han, S. ProxylessNAS: Direct neural architecture search on target task and hardware. ICLR, 2019.
    • Chollet, F. Xception: Deep learning with depthwise separable convolutions. CVPR, 2017.
    • Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. AutoAugment: Learning augmentation policies from data. CVPR, 2019.
    • Elfwing, S., Uchibe, E., and Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, 2018.
    • Gholami, A., Kwon, K., Wu, B., Tai, Z., Yue, X., Jin, P., Zhao, S., and Keutzer, K. SqueezeNext: Hardware-aware neural network design. ECV Workshop at CVPR'18, 2018.
    • Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. ICLR, 2016.
    • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. CVPR, pp. 770–778, 2016.
    • Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. CVPR, 2018.
    • Huang, G., Sun, Y., Liu, Z., Sedra, D., and Weinberger, K. Q. Deep networks with stochastic depth. ECCV, pp. 646–661, 2016.
    • Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. CVPR, 2017.
    • Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q. V., and Chen, Z. GPipe: Efficient training of giant neural networks using pipeline parallelism. arXiv preprint arXiv:1808.07233, 2018.
    • Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., and Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360, 2016.
    • Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML, pp. 448–456, 2015.
    • Kornblith, S., Shlens, J., and Le, Q. V. Do better ImageNet models transfer better? CVPR, 2019.
    • Krause, J., Deng, J., Stark, M., and Fei-Fei, L. Collecting a large-scale dataset of fine-grained cars. Second Workshop on Fine-Grained Visual Categorization, 2013.
    • Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical Report, 2009.
    • Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. NIPS, pp. 1097–1105, 2012.
    • Lin, H. and Jegelka, S. ResNet with one-neuron hidden layers is a universal approximator. NeurIPS, pp. 6172–6181, 2018.
    • Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. Feature pyramid networks for object detection. CVPR, 2017.
    • He, K., Gkioxari, G., Dollar, P., and Girshick, R. Mask R-CNN. ICCV, pp. 2980–2988, 2017.
    • He, Y., Lin, J., Liu, Z., Wang, H., Li, L.-J., and Han, S. AMC: AutoML for model compression and acceleration on mobile devices. ECCV, 2018.
    • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
    • Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., and Murphy, K. Progressive neural architecture search. ECCV, 2018.
    • Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The expressive power of neural networks: A view from the width. NeurIPS, 2018.
    • Ma, N., Zhang, X., Zheng, H.-T., and Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. ECCV, 2018.
    • Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and van der Maaten, L. Exploring the limits of weakly supervised pretraining. arXiv preprint arXiv:1805.00932, 2018.
    • Maji, S., Rahtu, E., Kannala, J., Blaschko, M., and Vedaldi, A. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013.
    • Ngiam, J., Peng, D., Vasudevan, V., Kornblith, S., Le, Q. V., and Pang, R. Domain adaptive transfer learning with specialist models. arXiv preprint arXiv:1811.07056, 2018.
    • Nilsback, M.-E. and Zisserman, A. Automated flower classification over a large number of classes. ICVGIP, pp. 722–729, 2008.
    • Parkhi, O. M., Vedaldi, A., Zisserman, A., and Jawahar, C. Cats and dogs. CVPR, pp. 3498–3505, 2012.
    • Raghu, M., Poole, B., Kleinberg, J., Ganguli, S., and Sohl-Dickstein, J. On the expressive power of deep neural networks. ICML, 2017.
    • Ramachandran, P., Zoph, B., and Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2018.
    • Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. Regularized evolution for image classifier architecture search. AAAI, 2019.
    • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
    • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. CVPR, 2018.
    • Sharir, O. and Shashua, A. On the expressive power of overlapping architectures of deep learning. ICLR, 2018.
    • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
    • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. CVPR, pp. 1–9, 2015.
    • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the Inception architecture for computer vision. CVPR, pp. 2818–2826, 2016.
    • Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. AAAI, 2017.
    • Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. MnasNet: Platform-aware neural architecture search for mobile. CVPR, 2019.
    • Xie, S., Girshick, R., Dollar, P., Tu, Z., and He, K. Aggregated residual transformations for deep neural networks. CVPR, pp. 5987–5995, 2017.
    • Yang, T.-J., Howard, A., Chen, B., Zhang, X., Go, A., Sze, V., and Adam, H. NetAdapt: Platform-aware neural network adaptation for mobile applications. ECCV, 2018.
    • Zagoruyko, S. and Komodakis, N. Wide residual networks. BMVC, 2016.
    • Zhang, X., Li, Z., Loy, C. C., and Lin, D. PolyNet: A pursuit of structural diversity in very deep networks. CVPR, pp. 3900–3908, 2017.
    • Zhang, X., Zhou, X., Lin, M., and Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. CVPR, 2018.
    • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. Learning deep features for discriminative localization. CVPR, pp. 2921–2929, 2016.
    • Zoph, B. and Le, Q. V. Neural architecture search with reinforcement learning. ICLR, 2017.
    • Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. V. Learning transferable architectures for scalable image recognition. CVPR, 2018.