# MagNet: A Two-Pronged Defense against Adversarial Examples

CCS 2017, pages 135–147


## Abstract

Deep learning has shown impressive performance on hard perceptual problems. However, researchers found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to mis-classify adversarial examples, with potentially disastrous consequences.


## Introduction

- Deep learning demonstrated impressive performance on many tasks, such as image classification [9] and natural language processing [16].
- Researchers showed that it was possible to generate adversarial examples that fool classifiers [34, 5, 24, 19].
- Their algorithms perturbed normal examples by a small amount that did not affect human recognition but caused mis-classification by the learning system.
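The small-perturbation attacks cited above can be illustrated with a minimal FGSM-style sketch in the spirit of [5]. This is a hypothetical toy, not the paper's code: plain Python lists stand in for image tensors, and the loss gradient is assumed to be given.

```python
def fgsm_perturb(x, grad, eps=0.25):
    """Perturb each component of x by a small step eps in the direction
    that increases the classifier's loss (the sign of the loss gradient),
    as in the fast gradient sign method [5]."""
    sign = lambda g: (g > 0) - (g < 0)  # -1, 0, or +1
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Example: a 2-pixel "image" nudged along the gradient signs.
x_adv = fgsm_perturb([0.5, 0.5], [1.0, -2.0], eps=0.25)
# x_adv == [0.75, 0.25]
```

The perturbation is bounded by `eps` per component, which is why it can stay imperceptible to humans while still flipping the classifier's decision.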

## Highlights

- In recent years, deep learning demonstrated impressive performance on many tasks, such as image classification [9] and natural language processing [16].
- Current defenses against adversarial examples follow three approaches: (1) training the target classifier with adversarial examples, called adversarial training [34, 5]; (2) training a classifier to distinguish between normal and adversarial examples [20]; and (3) making target classifiers hard to attack by blocking the gradient pathway, e.g., defensive distillation [25].
- In Section 1.2 the authors give two reasons why a classifier mis-classifies an adversarial example: (1) the example is far from the boundary of the manifold of normal examples, but the classifier has no option to reject it; (2) the example is close to the boundary of the manifold, but the classifier generalizes poorly off the manifold in the vicinity of the example.
- The authors propose MagNet, a framework for defending neural networks against adversarial perturbation of examples.
- By using autoencoders as detector networks, MagNet learns to detect adversarial examples without requiring either adversarial examples or knowledge of the process for generating them, which leads to better generalization.
- Experiments show that MagNet defends effectively against state-of-the-art attacks.

## Methods

- Training setup: the optimization method was SGD for both datasets; the learning-rate, batch-size, and epoch values appear in the training-parameter tables (Tables 2 and 5).

- The authors divide attacks using adversarial examples into two types. In an untargeted attack, the attacker does not care which class the victim classifier outputs, as long as it differs from the correct class.
- MagNet trades off between reconstruction error and autoencoder diversity: encouraging autoencoder diversity increases reconstruction error.
- The authors evaluate this approach in Section 5.4.
- The authors found that they needed only the reconstruction-error-based detector and the reformer to become highly accurate against adversarial examples generated from MNIST.
- The authors selected the threshold of reconstruction error such that the false positive rate of the detector on the validation set is at most 0.001, i.e., each detector mistakenly rejects no more than 0.1% of examples in the validation set.
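The threshold-selection rule above can be sketched in a few lines. This is a hypothetical illustration (reconstruction errors as plain floats, function names invented), not the authors' implementation:

```python
def select_threshold(val_errors, max_fpr=0.001):
    """Choose a reconstruction-error threshold so that at most a fraction
    max_fpr of normal validation examples are rejected (i.e., have
    reconstruction error strictly above the threshold)."""
    errs = sorted(val_errors)
    n_reject = int(len(errs) * max_fpr)    # how many normals we may reject
    return errs[len(errs) - n_reject - 1]  # largest error still accepted

# With 1000 validation errors 0..999 and max_fpr = 0.001, only the single
# largest error (999) falls above the chosen threshold.
threshold = select_threshold(list(range(1000)))
# threshold == 998
```

Because the threshold is set purely from normal validation data, the detector needs no adversarial examples at training time, which is the source of the generalization claim in the Highlights.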

## Results

- MagNet achieved more than 99% classification accuracy on adversarial examples generated from MNIST.
- The authors trained a classifier to achieve an accuracy of 90.6%, which is close to the state of the art.

## Conclusion

- The effectiveness of MagNet against adversarial examples depends on the following assumptions:
  - There exist detector functions that measure the distance between their input and the manifold of normal examples.
  - There exist reformer functions that output an example x′ that is perceptibly close to the input x, with x′ closer to the manifold than x.
- The authors chose autoencoders for both the reformer and the two types of detectors in MagNet.
- MagNet handles untrusted input using two methods: it detects adversarial examples with large perturbation using detector networks, and pushes examples with small perturbation towards the manifold of normal examples.
- These two methods work jointly to enhance classification accuracy.
- In case the attacker knows the training examples of MagNet, the authors describe a new graybox threat model and use diversity to defend against this attack effectively.
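The two-pronged pipeline plus the diversity defense can be sketched as follows. This is a hedged toy illustration: the autoencoders, classifier, and threshold are stand-in callables and values, not the paper's trained models.

```python
import random

def magnet_predict(x, autoencoders, classify, threshold):
    """Sketch of MagNet's pipeline:
    1. pick one autoencoder at random (the graybox diversity defense);
    2. detector prong: reject x if its reconstruction error (a proxy for
       distance to the normal-data manifold) exceeds the threshold;
    3. reformer prong: otherwise classify the reconstruction, which pulls
       small perturbations back toward the manifold."""
    ae = random.choice(autoencoders)
    x_rec = ae(x)
    err = sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)
    if err > threshold:
        return None                 # detected as adversarial, rejected
    return classify(x_rec)          # classify the reformed example

# Toy usage: an identity "autoencoder" and a constant classifier.
label = magnet_predict([0.2, 0.4], [lambda v: v], lambda v: "digit-7", 0.1)
# label == "digit-7"
```

Randomly choosing among several independently trained autoencoders at test time is what makes the graybox attack harder: an adversarial example optimized against one autoencoder need not transfer to the one actually used.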

- Table 1: Architecture of the classifiers to be protected
- Table 2: Training parameters of classifiers to be protected
- Table 3: Defensive device architectures used for MNIST, including both encoders and decoders
- Table 4: Defensive device architecture used for CIFAR-10, including both encoders and decoders
- Table 5: Training parameters for defensive devices
- Table 6: Classification accuracy of MagNet on adversarial examples generated by different attack methods. Some of these attacks use different parameters on MNIST and CIFAR-10 because they must adjust their parameters to the dataset
- Table 7: Classification accuracy in percentage on adversarial examples generated by the graybox attack on CIFAR-10. We name each autoencoder A through H. Each column corresponds to the autoencoder that the attack is trained on, and each row corresponds to the autoencoder used during testing. The last row, random, means that MagNet picks a random one of its eight autoencoders
- Table 8: Classification accuracy in percentage on the test set for CIFAR-10. Each column corresponds to a different autoencoder chosen during testing. "Rand" means that MagNet randomly chooses an autoencoder during testing

## References

- [1] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
- [2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
- [3] Shreyansh Daftry, J. Andrew Bagnell, and Martial Hebert. Learning transferable policies for monocular reactive MAV control. arXiv preprint arXiv:1608.00627, 2016.
- [4] Chelsea Finn and Sergey Levine. Deep visual foresight for planning robot motion. arXiv preprint arXiv:1610.00696, 2016.
- [5] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
- [6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
- [7] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.
- [8] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435, 2016.
- [9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
- [10] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- [11] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
- [12] Wookhyun Jung, Sangwon Kim, and Sangyong Choi. Poster: deep learning for zero-day flash malware detection. In 36th IEEE Symposium on Security and Privacy, 2015.
- [13] Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, and Sergey Levine. Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint arXiv:1702.01182, 2017.
- [14] Jernej Kos, Ian Fischer, and Dawn Song. Adversarial examples for generative models. arXiv preprint arXiv:1702.06832, 2017.
- [15] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
- [16] Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. Ask me anything: dynamic memory networks for natural language processing. In International Conference on Machine Learning, pages 1378–1387, 2016.
- [17] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
- [18] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits, 1998.
- [19] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations (ICLR), 2017.
- [20] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations (ICLR), 2017.
- [21] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401, 2016.
- [22] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015.
- [23] H. Narayanan and S. Mitter. Sample complexity of testing the manifold hypothesis. In NIPS, 2010.
- [24] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P), 2016.
- [25] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.
- [26] Nicolas Papernot, Ian Goodfellow, Ryan Sheatsley, Reuben Feinman, and Patrick McDaniel. cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2016.
- [27] Nicolas Papernot, Patrick D. McDaniel, Ananthram Swami, and Richard E. Harang. Crafting adversarial input sequences for recurrent neural networks. CoRR, abs/1604.08275, 2016.
- [28] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519, 2017.
- [29] Razvan Pascanu, Jack W. Stokes, Hermineh Sanossian, Mady Marinescu, and Anil Thomas. Malware classification with recurrent networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1916–1920, 2015.
- [30] Uri Shaham, Yutaro Yamada, and Sahand Negahban. Understanding adversarial training: increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432, 2015.
- [31] Dinggang Shen, Guorong Wu, and Heung-Il Suk. Deep learning in medical image analysis. Annual Review of Biomedical Engineering, 2017.
- [32] Justin Sirignano, Apaar Sadhwani, and Kay Giesecke. Deep learning for mortgage risk, 2016.
- [33] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- [34] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
- [35] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, 2008.
- [36] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
- [37] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. arXiv preprint arXiv:1703.08603, 2017.
