Identity Mappings in Deep Residual Networks

ECCV, 2016.

Cited by 3656 · Viewed 363 · EI
Keywords
ablation experiment, propagation formulation, deep residual network, deep network, identity mapping (7 more)

Abstract

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block when identity mappings are used as the skip connections and as the after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new, pre-activation residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10/100 and a 200-layer ResNet on ImageNet.

Introduction
  • Deep residual networks (ResNets) [1] consist of many stacked “Residual Units”.
  • The central idea of ResNets is to learn the additive residual function F with respect to h(x_l), with a key choice of using an identity mapping h(x_l) = x_l.
  • This is realized by attaching an identity skip connection (“shortcut”); a minimal sketch follows this list.
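    A minimal sketch of this generic Residual Unit form, x_{l+1} = f(h(x_l) + F(x_l, W_l)), written in Python. The linear residual branch in the toy usage is an illustrative placeholder, not the paper's architecture:

```python
import torch

def residual_unit(x_l, residual_branch, h=lambda x: x, f=lambda x: x):
    """Generic Residual Unit: x_{l+1} = f(h(x_l) + F(x_l, W_l)).

    ResNets [1] choose h(x_l) = x_l (the identity skip connection / shortcut).
    This paper additionally argues for f = identity (no after-addition
    activation), so the unit reduces to x_{l+1} = x_l + F(x_l, W_l).
    `residual_branch` stands in for the residual function F, which in practice
    is a small stack of conv/BN/ReLU layers.
    """
    return f(h(x_l) + residual_branch(x_l))

# Toy usage with a linear residual branch (illustrative only).
x = torch.randn(8)
W = torch.randn(8, 8)
x_next = residual_unit(x, lambda z: W @ z)
```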
Highlights
  • Deep residual networks (ResNets) [1] consist of many stacked “Residual Units”
  • We have done preliminary experiments using the skip connections studied in Figs. 2 and 3 on ImageNet with ResNet-101 [1], and observed similar optimization difficulties
  • This paper investigates the propagation formulations behind the connection mechanisms of deep residual networks
  • Our derivations imply that identity shortcut connections and identity after-addition activation are essential for making information propagation smooth; the key formulas are sketched after this list
  • Ablation experiments demonstrate phenomena that are consistent with our derivations
  • We present 1000-layer deep networks that can be trained and achieve improved accuracy
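    The propagation formulation behind these claims, in the paper's notation (x_l is the input to the l-th Residual Unit, F its residual function with weights W_l, L > l any deeper unit, and E the loss):

```latex
\begin{align*}
x_{l+1} &= x_l + \mathcal{F}(x_l, \mathcal{W}_l)
  && \text{(identity shortcut, identity after-addition activation)} \\
x_L &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)
  && \text{(forward signal reaches any deeper unit directly)} \\
\frac{\partial \mathcal{E}}{\partial x_l}
  &= \frac{\partial \mathcal{E}}{\partial x_L}
     \left( 1 + \frac{\partial}{\partial x_l}
       \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
  && \text{(gradient contains a directly propagated term)}
\end{align*}
```

    Because of the additive "1", the gradient with respect to x_l does not vanish even when the summed residual gradients are small, which is the sense in which identity mappings make propagation smooth.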
Methods
  • ImageNet comparison (single-crop error on the ILSVRC 2012 validation set; see Table 5): ResNet-152 and ResNet-200 with the original Residual Unit [1] versus their pre-activation counterparts, all trained on 224×224 crops with scale (or scale + aspect ratio) augmentation and evaluated with 224×224 or 320×320 test crops. With 320×320 testing, the pre-activation ResNet-200 reaches 20.7% top-1 / 5.3% top-5 under scale augmentation and 20.1%† / 4.8%† under scale + aspect ratio augmentation; Inception v3 [19] is the external baseline. A sketch of the two unit types follows.
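    The "original" and "pre-act" (full pre-activation, Fig. 4(e)) units differ only in where BN/ReLU sit relative to the convolutions and the addition. A minimal sketch in Python, assuming a two-convolution residual branch for illustration (the ImageNet models actually use bottleneck branches):

```python
import torch.nn as nn
import torch.nn.functional as F

class OriginalUnit(nn.Module):
    """Original Residual Unit [1]: conv-BN-ReLU-conv-BN, addition, then ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(x + out)  # after-addition ReLU acts on the shortcut path

class PreActUnit(nn.Module):
    """Full pre-activation unit (Fig. 4(e)): BN-ReLU-conv-BN-ReLU-conv, then add."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out  # the after-addition path stays a clean identity
```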
Results
  • The authors note that they do not specially tailor the network width or filter sizes, nor use regularization techniques that are very effective on these small datasets
  • The authors obtain these results via a simple but essential concept—going deeper.
  • The authors also trained a “BN after addition” version (Fig. 4(b)) of ResNet-101 on ImageNet and observed higher training loss and validation error; a sketch of this variant follows this list.
  • This is in line with the results on CIFAR in Fig. 6
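    A hedged sketch of that "BN after addition" variant (Fig. 4(b)). The two-convolution branch and layer sizes are illustrative assumptions; the essential change, per the paper, is that the summed signal, shortcut included, passes through BN and ReLU:

```python
import torch.nn as nn
import torch.nn.functional as F

class BNAfterAdditionUnit(nn.Module):
    """The "BN after addition" variant (Fig. 4(b)): BN and ReLU are applied to
    the sum, so the shortcut signal is normalized too, which the paper found
    impedes propagation (higher training loss than the original unit)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn_after = nn.BatchNorm2d(channels)  # moved to after the addition

    def forward(self, x):
        out = self.conv2(F.relu(self.bn1(self.conv1(x))))
        return F.relu(self.bn_after(x + out))  # f = BN + ReLU on the sum
```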
Conclusion
  • This paper investigates the propagation formulations behind the connection mechanisms of deep residual networks.
  • The authors' derivations imply that identity shortcut connections and identity after-addition activation are essential for making information propagation smooth.
  • Ablation experiments demonstrate phenomena that are consistent with the derivations.
  • The authors present 1000-layer deep networks that can be trained and achieve improved accuracy
Tables
  • Table 1: Classification error on the CIFAR-10 test set using ResNet-110 [1], with different types of shortcut connections applied to all Residual Units (the variant forms are sketched after this list). We report “fail” when the test error is higher than 20%
  • Table 2: Classification error (%) on the CIFAR-10 test set using different activation functions
  • Table 3: Classification error (%) on the CIFAR-10/100 test set using the original Residual Units and our pre-activation Residual Units
  • Table 4: Comparisons with state-of-the-art methods on CIFAR-10 and CIFAR-100 using “moderate data augmentation” (flip/translation), except for ELU [12] with no augmentation. Better results of [13, 14] have been reported using stronger data augmentation and ensembling. For the ResNets we also report the number of parameters. Our results are the median of 5 runs with mean±std in the brackets. All ResNets results are obtained with a mini-batch size of 128 except † with a mini-batch size of 64 (code available at https://github.com/KaimingHe/resnet-1k-layers)
  • Table 5: Comparisons of single-crop error on the ILSVRC 2012 validation set. All ResNets are trained using the same hyper-parameters and implementations as [1]. Our Residual Units are the full pre-activation version (Fig. 4(e)). †: code/model available at https://github.com/facebook/fb.resnet.torch/tree/master/pretrained, using scale and aspect ratio augmentation in [20]
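    For reference, the shortcut types compared in Table 1 replace the identity mapping h(x_l) = x_l with variants of the following forms (a paraphrase of the paper's shortcut-ablation setup; λ is a scaling constant and g(x) = σ(W_g x + b_g) a gating function):

```latex
\begin{align*}
\text{constant scaling:} \quad
  & x_{l+1} = \lambda\, x_l + \mathcal{F}(x_l, \mathcal{W}_l) \\
\text{exclusive gating:} \quad
  & x_{l+1} = \bigl(1 - g(x_l)\bigr) \cdot x_l + g(x_l) \cdot \mathcal{F}(x_l, \mathcal{W}_l) \\
\text{shortcut-only gating:} \quad
  & x_{l+1} = \bigl(1 - g(x_l)\bigr) \cdot x_l + \mathcal{F}(x_l, \mathcal{W}_l) \\
\text{$1\times1$ conv / dropout shortcut:} \quad
  & h(x_l) = \mathrm{conv}_{1\times1}(x_l) \ \text{or} \ \mathrm{dropout}(x_l)
\end{align*}
```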
Funding
  • Reports improved results using a 1001-layer ResNet on CIFAR-10 and CIFAR-100, and a 200-layer ResNet on ImageNet
  • Presents competitive results on CIFAR-10/100 with a 1001-layer ResNet, which is much easier to train and generalizes better than the original ResNet in [1]
  • Reports improved results on ImageNet using a 200-layer ResNet, for which the counterpart of [1] starts to overfit
  • Expects that these do not have the exponential impact presented in the propagation analysis
References
  • 1. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • 2. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
  • 3. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. IJCV 115, 211–252 (2015)
  • 4. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 740–755. Springer, Heidelberg (2014)
  • 5. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
  • 6. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. In: ICML Workshop (2015)
  • 7. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: NIPS (2015)
  • 8. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
  • 9. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989)
  • 10. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009)
  • 11. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). arXiv:1207.0580
  • 12. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: ICLR (2016)
  • 13. Graham, B.: Fractional max-pooling (2014). arXiv:1412.6071
  • 14. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net (2014). arXiv:1412.6806
  • 15. Lin, M., Chen, Q., Yan, S.: Network in network. In: ICLR (2014)
  • 16. Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: AISTATS (2015)
  • 17. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets. In: ICLR (2015)
  • 18. Mishkin, D., Matas, J.: All you need is a good init. In: ICLR (2016)
  • 19. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)
  • 20. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
  • 21. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, Inception-ResNet and the impact of residual connections on learning (2016). arXiv:1602.07261
  • 22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
  • 23. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)