# How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework

CVPR, pp. 279-287, 2020.

Keywords:

neural stochastic differential equation, test accuracy, differential equation, good generalization, adversarial example

Abstract:

Neural Ordinary Differential Equation (Neural ODE) has been proposed as a continuous approximation to the ResNet architecture. Some commonly used regularization mechanisms in discrete neural networks (e.g., dropout, Gaussian noise) are missing in current Neural ODE networks. In this paper, we propose a new continuous neural network framework ...

Introduction

- Despite the superhuman performance in many computer vision tasks, recent findings [2, 8, 28] demonstrate that deep neural networks remain more fragile than humans and even shallow models.
- Existing work supports this phenomenon from different perspectives; for instance, on CIFAR-10 and ImageNet, [25] shows that the test accuracy drops by 5%−15% if the authors replace the original test set with a new one.
- It is very interesting to see whether there is a unified way to mitigate all these problems, and whether the authors can find a theoretical explanation for it.

Highlights

- Despite the superhuman performance in many computer vision tasks, recent findings [2, 8, 28] demonstrate that deep neural networks remain more fragile than humans and even shallow models. Existing work supports this phenomenon from different perspectives; for instance, on CIFAR-10 and ImageNet, [25] shows that the test accuracy drops by 5%−15% if we replace the original test set with a new one.
- Even on the same test set, unnoticeable adversarial perturbations crafted by specific algorithms [19] can drive the test accuracy close to zero.
- To study and understand how randomness stabilizes neural networks, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE), which models the continuous limit of ResNet based on the recently proposed Neural ODE model [3] and adds stochastic diffusion and jump terms to cover various commonly used regularization mechanisms based on random noise, including dropout, stochastic depth, and Gaussian smoothing.
- While [30] deals with adversarial robustness, their method is still based on adversarial training.
- We introduce the Neural SDE model, which can stabilize the prediction of Neural ODE by injecting stochastic noise.
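The diffusion idea above can be made concrete with a toy integrator. Below is a minimal sketch (not the authors' code) of Euler–Maruyama integration of dh = f(h, t) dt + σ dB_t, where the drift `drift` stands in for the learned Neural ODE vector field and σ = 0 recovers the deterministic Neural ODE trajectory; all names and the tanh dynamics are illustrative assumptions.

```python
import numpy as np

def drift(h, t, w):
    # Stand-in for the learned vector field f(h, t; theta) of a Neural ODE.
    return np.tanh(w @ h)

def euler_maruyama(h0, w, sigma, t0=0.0, t1=1.0, n_steps=100, seed=0):
    # Integrate dh = f(h, t) dt + sigma dB_t with the Euler-Maruyama scheme;
    # sigma = 0 recovers the deterministic Neural ODE trajectory.
    rng = np.random.default_rng(seed)
    h = np.array(h0, dtype=float)
    dt = (t1 - t0) / n_steps
    for k in range(n_steps):
        t = t0 + k * dt
        dB = rng.standard_normal(h.shape) * np.sqrt(dt)  # Brownian increment
        h = h + drift(h, t, w) * dt + sigma * dB
    return h

rng = np.random.default_rng(42)
w = 0.1 * rng.standard_normal((4, 4))
h0 = rng.standard_normal(4)
h_ode = euler_maruyama(h0, w, sigma=0.0)   # deterministic limit (Neural ODE)
h_sde = euler_maruyama(h0, w, sigma=0.1)   # with additive Gaussian diffusion
```

In this reading, dropout, stochastic depth, and Gaussian smoothing correspond to different choices of the diffusion term, as the paper's Table 1 explores.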

Conclusion

- The authors introduce the Neural SDE model, which can stabilize the prediction of Neural ODE by injecting stochastic noise.
- The authors' model can achieve better generalization and improve robustness under both adversarial and non-adversarial noise.

Summary

## Introduction:

- Despite the superhuman performance in many computer vision tasks, recent findings [2, 8, 28] demonstrate that deep neural networks remain more fragile than humans and even shallow models.
- Existing work supports this phenomenon from different perspectives; for instance, on CIFAR-10 and ImageNet, [25] shows that the test accuracy drops by 5%−15% if the authors replace the original test set with a new one.
- It is very interesting to see whether there is a unified way to mitigate all these problems, and whether the authors can find a theoretical explanation for it.
## Conclusion:

- The authors introduce the Neural SDE model, which can stabilize the prediction of Neural ODE by injecting stochastic noise.
- The authors' model can achieve better generalization and improve robustness under both adversarial and non-adversarial noise.

- Table 1: Evaluating model generalization under different choices of the diffusion matrix G(ht, t; v) introduced above. For the three noise types, we search for a suitable parameter σt for each so that the diffusion matrix G properly regularizes the model. TTN means testing-time noise. We observe that adding noise can improve the test accuracy over Neural ODE, and furthermore, noise at testing time is beneficial.
- Table 2: Testing accuracy under different levels of non-adversarial perturbations.
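One common way to use noise at test time, as in random self-ensemble [17] and randomized smoothing [4], is to average predictions over several stochastic forward passes. The sketch below illustrates that idea only; the tiny network, the noise placement, and all names are assumptions, not the paper's architecture.

```python
import numpy as np

def noisy_forward(x, w, sigma, rng):
    # One stochastic pass: Gaussian noise injected into the hidden state,
    # playing the role of the diffusion term at test time.
    h = np.tanh(w @ x) + sigma * rng.standard_normal(w.shape[0])
    return h  # identity "classification head" for the sketch

def predict_ttn(x, w, sigma=0.1, n_samples=32, seed=0):
    # Testing-time noise (TTN): average the outputs of several noisy passes.
    rng = np.random.default_rng(seed)
    return np.mean([noisy_forward(x, w, sigma, rng) for _ in range(n_samples)],
                   axis=0)

rng = np.random.default_rng(1)
w = 0.1 * rng.standard_normal((3, 5))
x = rng.standard_normal(5)
y = predict_ttn(x, w)  # smoothed prediction; sigma=0 reduces to a single pass
```

Averaging over noisy passes smooths the decision function, which is one intuition for why testing-time noise helps in Table 1.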

Related work

- Our work is inspired by the success of the Neural ODE network, and we seek to improve the generalization and robustness of Neural ODE by adding noise in the dynamic system. Regularization mechanisms such as dropout cannot be easily incorporated in the original Neural ODE due to its deterministic nature.

Neural ODE. The idea of formulating ResNet as a dynamical system was discussed in [5]. A framework was proposed to link existing deep architectures with discretized numerical ODE solvers [18], and was shown to be parameter efficient. These networks adopt a layer-wise architecture: each layer is parameterized by different, independent weights. The Neural ODE model [3] computes hidden states differently: it directly models the dynamics of hidden states with an ODE solver, with the dynamics parameterized by a shared model. A memory-efficient approach to computing gradients by the adjoint method was developed, making it possible to train large, multi-scale generative networks [1, 9]. Our work can be regarded as an extension of this framework, with the purpose of incorporating a variety of noise-injection-based regularization mechanisms.

Stochastic differential equations (SDEs) in the context of neural networks have been studied recently, focusing either on understanding how dropout shapes the loss landscape [27], or on using SDEs as a universal function approximation tool to learn the solutions of high-dimensional PDEs [23]. Instead, we aim to explain why adding random noise boosts the stability of deep networks, and to demonstrate the improved generalization and robustness.
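The ResNet–ODE link discussed above ([5, 18]) comes from reading a residual block h_{k+1} = h_k + f(h_k) as one forward-Euler step of dh/dt = f(h). A minimal sketch under a shared, Neural-ODE-style parameterization (the tanh block and all names are illustrative assumptions):

```python
import numpy as np

def residual_block(h, w, dt=1.0):
    # h_{k+1} = h_k + dt * f(h_k): a residual block is one forward-Euler step.
    return h + dt * np.tanh(w @ h)

def resnet_forward(h, w, n_layers, dt=1.0):
    # Stacking blocks with shared weights integrates dh/dt = f(h)
    # from t = 0 to t = n_layers * dt.
    for _ in range(n_layers):
        h = residual_block(h, w, dt)
    return h

rng = np.random.default_rng(3)
w = 0.1 * rng.standard_normal((4, 4))
h0 = rng.standard_normal(4)
# Halving the step size while doubling the depth approximates the same
# underlying ODE solution over the same time horizon.
coarse = resnet_forward(h0, w, n_layers=10, dt=0.1)
fine = resnet_forward(h0, w, n_layers=20, dt=0.05)
```

In this view, the Neural ODE replaces the fixed-depth Euler iteration with an adaptive ODE solver, and the Neural SDE adds a diffusion term on top of the same drift.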

Funding

- This work is partially supported by NSF under IIS1719097

References

- Lynton Ardizzone, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W Pellegrini, Ralf S Klessen, Lena MaierHein, Carsten Rother, and Ullrich Kothe. Analyzing inverse problems with invertible neural networks. arXiv preprint arXiv:1808.04730, 2018. 2
- Anurag Arnab, Ondrej Miksik, and Philip H. S. Torr. On the robustness of semantic segmentation models to adversarial attacks. In CVPR, 2018. 1
- Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6572–6583, 2018. 1, 2, 4, 5
- Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019. 1, 4
- Weinan E. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 5(1):1–11, 2017. 2
- Xavier Gastaldi. Shake-shake regularization. arXiv preprint arXiv:1705.07485, 2017. 2
- Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V Le. Dropblock: A regularization method for convolutional networks. In Advances in Neural Information Processing Systems, pages 10727–10737, 2018. 1, 2
- Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014. 1
- Will Grathwohl, Ricky TQ Chen, Jesse Betterncourt, Ilya Sutskever, and David Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018. 2
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 2
- Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019. 1, 7
- Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. In European conference on computer vision, pages 646–661. Springer, 2016. 1, 2
- Junteng Jia and Austin R. Benson. Neural jump stochastic differential equations, 2019. 2
- Ioannis Karatzas and Steven E Shreve. Brownian motion. In Brownian Motion and Stochastic Calculus, pages 47–127. Springer, 1998. 3
- Peter E Kloeden and Eckhard Platen. Numerical solution of stochastic differential equations, volume 23. Springer Science & Business Media, 2013. 4
- Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018. 2, 4, 5
- Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random selfensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018. 2, 4, 5
- Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. In International Conference on Machine Learning, pages 3282–3291, 2018.
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017. 1, 8
- Xuerong Mao. Stochastic differential equations and applications. Elsevier, 2007. 6
- GN Mil’shtein. Approximate integration of stochastic differential equations. Theory of Probability & Its Applications, 19(3):557–562, 1975. 4
- Bernt Øksendal. Stochastic differential equations. In Stochastic differential equations, pages 65–84. Springer, 2003. 3, 5
- Maziar Raissi. Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. arXiv preprint arXiv:1804.07010, 2018. 2
- Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do cifar-10 classifiers generalize to cifar-10? arXiv preprint arXiv:1806.00451, 2018. 2
- Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do imagenet classifiers generalize to imagenet? arXiv preprint arXiv:1902.10811, 2019. 1
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014. 1, 2
- Qi Sun, Yunzhe Tao, and Qiang Du. Stochastic training of residual networks: a differential equation viewpoint. arXiv preprint arXiv:1812.00174, 2018. 2
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013. 1
- Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent gaussian models in the diffusion limit, 2019. 2
- Bao Wang, Binjie Yuan, Zuoqiang Shi, and Stanley J Osher. Enresnet: Resnet ensemble via the feynman-kac formalism. In Neural Information Processing Systems, 2019. 2
