Intriguing Properties of Adversarial Examples

    International Conference on Learning Representations, 2018.

    Keywords:
    neural network, intriguing property, deep learning model, adversarial robustness, projected gradient descent

    Abstract:

    It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate on high dimensional data, they overfit, or they are too linear. Here we show that dis...

    Introduction
    • An intriguing aspect of deep learning models in computer vision is that while they can classify images with high accuracy, they fail catastrophically when those same images are perturbed slightly in an adversarial fashion (Szegedy et al, 2013; Goodfellow et al, 2014).
    • It has been shown that using stronger adversarial attacks in adversarial training can increase the robustness to stronger attacks, but at the cost of a decrease in clean accuracy (Madry et al, 2017); a minimal sketch of such an attack follows this list.
    • Defensive distillation (Papernot et al, 2016b), feature squeezing (Xu et al, 2017), and Parseval training (Cisse et al, 2017) have been shown to make models more robust against adversarial attacks
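    The iterative attack referenced above is projected gradient descent (PGD). As a point of reference, here is a minimal PyTorch sketch of an L∞ PGD attack; this is not the authors' code, and model, eps, alpha, and steps are illustrative placeholders.

    import torch

    def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        """Return adversarial examples within an L-infinity ball of radius eps."""
        x_adv = x.clone().detach()
        # Random start inside the eps-ball, as in Madry et al. (2017).
        x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = torch.nn.functional.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                # Take a signed gradient step, then project back into the eps-ball.
                x_adv = x_adv + alpha * grad.sign()
                x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
                x_adv = torch.clamp(x_adv, 0.0, 1.0)
        return x_adv.detach()

    With steps=1 and alpha=eps this reduces to a single-step FGSM-style attack; the iterated version is the "stronger" attack used in adversarial training.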
    Highlights
    • An intriguing aspect of deep learning models in computer vision is that while they can classify images with high accuracy, they fail catastrophically when those same images are perturbed slightly in an adversarial fashion (Szegedy et al, 2013; Goodfellow et al, 2014)
    • We study the functional form of adversarial error and logit differences across several models and datasets, which turn out to be universal
    • In light of the surprising commonality in adversarial error at small ε, we investigate whether there is any way to obtain a different functional form for the adversarial error
    • We investigate whether the increased adversarial robustness is due to increased logit differences
    • In this paper we studied common properties of adversarial examples across different models and datasets
    • We showed that architecture plays an important role in adversarial robustness, which correlates strongly with clean accuracy
    Results
    • The authors observe that the vast majority of the time, it is ∆_{1,2} that goes to zero before any of the other ∆_{1,j} (a sketch of this check follows this list).
    • Despite the increase in adversarial accuracy, the permutation invariant MNIST model has 0.8% lower clean accuracy when trained with the entropy penalty.
    • It reaches a 17% higher adversarial accuracy on PGD examples.
    • Against other white-box and black-box attacks the model is more robust, and its clean accuracy is 5.9% higher.
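    To make the ∆_{1,2} observation concrete, one can track every logit difference ∆_{1,j} (the largest clean logit minus the j-th logit) while the perturbation budget grows, and record which difference is the first to reach zero. A minimal sketch, assuming a trained model, a single input x with label y, and the hypothetical pgd_attack helper sketched earlier; none of these names come from the paper.

    import torch

    def first_logit_to_cross(model, x, y, eps_grid):
        """Return the class j whose logit difference Delta_{1,j} is the first
        to reach zero as the perturbation budget eps grows (x has shape [1, ...])."""
        with torch.no_grad():
            clean_logits = model(x)[0]
        top1 = int(clean_logits.argmax())             # index of the largest clean logit
        for eps in eps_grid:                          # increasing perturbation budgets
            x_adv = pgd_attack(model, x, y, eps=eps, alpha=eps / 4, steps=10)
            with torch.no_grad():
                logits = model(x_adv)[0]
            deltas = logits[top1] - logits            # Delta_{1,j} for every class j
            deltas[top1] = float("inf")               # ignore j = top1 itself
            j = int(deltas.argmin())
            if deltas[j] <= 0:                        # some other logit has caught up
                return j
        return None                                   # the attack never flipped the prediction

    Running this over a test set and tallying the returned class indices is one way to test the observation that the second most likely clean class is usually the first to cross.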
    Conclusion
    • In this paper the authors studied common properties of adversarial examples across different models and datasets.
    • The authors theoretically derived a universality in logit differences and adversarial error of machine learning models.
    • The authors showed that architecture plays an important role in adversarial robustness, which correlates strongly with clean accuracy
    Summary
    • Introduction:

      An intriguing aspect of deep learning models in computer vision is that while they can classify images with high accuracy, they fail catastrophically when those same images are perturbed slightly in an adversarial fashion (Szegedy et al, 2013; Goodfellow et al, 2014).
    • It has been shown that using stronger adversarial attacks in adversarial training can increase the robustness to stronger attacks, but at the cost of a decrease in clean accuracy (Madry et al, 2017).
    • Defensive distillation (Papernot et al, 2016b), feature squeezing (Xu et al, 2017), and Parseval training (Cisse et al, 2017) have been shown to make models more robust against adversarial attacks
    • Objectives:

      The goal of this work is to study the common properties of adversarial examples.
    • Results:

      The authors observe that the vast majority of the time, it is ∆_{1,2} that goes to zero before any of the other ∆_{1,j}.
    • Despite the increase in adversarial accuracy, the permutation invariant MNIST model has 0.8% lower clean accuracy when trained with the entropy penalty (a sketch of one plausible form of this penalty follows this summary).
    • It reaches a 17% higher adversarial accuracy on PGD examples.
    • Against other white-box and black-box attacks the model is more robust, and its clean accuracy is 5.9% higher.
    • Conclusion:

      In this paper the authors studied common properties of adversarial examples across different models and datasets.
    • The authors theoretically derived a universality in logit differences and adversarial error of machine learning models.
    • The authors showed that architecture plays an important role in adversarial robustness, which correlates strongly with clean accuracy
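    One plausible form of the entropy penalty mentioned above is an extra loss term that pushes the softmax distribution toward low entropy, i.e. toward more confident predictions and hence larger logit differences. The sketch below is an assumption about that form, not the paper's exact objective; the coefficient beta is a hypothetical hyperparameter.

    import torch.nn.functional as F

    def entropy_penalized_loss(logits, targets, beta=0.1):
        """Cross-entropy plus a penalty on prediction entropy.

        Minimizing the entropy term encourages confident outputs, i.e. a larger gap
        between the top logit and the rest; beta is an assumed coefficient.
        """
        ce = F.cross_entropy(logits, targets)
        log_probs = F.log_softmax(logits, dim=-1)
        entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
        return ce + beta * entropy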
    Tables
    • Table 1: Performance of our best architecture from Experiment 2 at ε = 8. Black-box attacks are sourced from a copy of the network independently initialized and trained.
    Funding
    • Shows that this universality holds for a broad range of datasets, models, and attacks
    • Studies the effects of reducing prediction entropy on adversarial robustness
    • Finds that the adversarial error caused by the one-step least-likely-class method scales as a power law in the perturbation size ε, with an exponent B between 1.8 and 2.5 for small ε
    • Shows how, at small ε, the success of an adversarial attack depends on the input-logit Jacobian of the model and on the logits of the network
    • Demonstrates that the susceptibility of a model to FGSM and PGD attacks is in large part dictated by the cumulative distribution of the difference between the largest and second-largest logits, as sketched below
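    The last two points admit a simple first-order reading: for an L∞ perturbation of size ε, the top-1/top-2 logit gap ∆_{1,2}(x) can shrink by at most roughly ε·‖∇_x ∆_{1,2}(x)‖_1, so a one-step attack is expected to succeed roughly when ∆_{1,2}(x) < ε·‖∇_x ∆_{1,2}(x)‖_1; aggregating this over a test set turns the cumulative distribution of ∆_{1,2} into a prediction of adversarial error as a function of ε. The sketch below implements this estimate under those assumptions; model and loader are placeholders, and the criterion is a linear approximation rather than the paper's exact derivation.

    import torch

    def estimated_adversarial_error(model, loader, eps):
        """First-order estimate of one-step (FGSM-like) attack success rate.

        For each correctly classified input, the attack is predicted to succeed when
        the top-1/top-2 logit gap Delta_{1,2} is smaller than
        eps * ||d(Delta_{1,2})/dx||_1 (a linear approximation, valid at small eps).
        """
        flips, total = 0, 0
        for x, y in loader:
            x = x.clone().requires_grad_(True)
            logits = model(x)
            top2 = logits.topk(2, dim=-1).values           # two largest logits per example
            delta12 = top2[:, 0] - top2[:, 1]               # Delta_{1,2}
            grad, = torch.autograd.grad(delta12.sum(), x)   # d(Delta_{1,2}) / dx
            grad_l1 = grad.flatten(1).abs().sum(dim=-1)     # per-example L1 norm
            correct = logits.argmax(dim=-1) == y
            flips += (correct & (delta12 < eps * grad_l1)).sum().item()
            total += correct.sum().item()
        return flips / max(total, 1)

    Sweeping eps over a grid and plotting the estimate against eps is one way to probe the small-ε power-law behavior described above.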
    Reference
    • Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In International Conference on Machine Learning, pp. 854–863, 2017.
    • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
    • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
    • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
    • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
    • Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016a.
    • Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016b.
    • Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
    • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
    • Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976, 2017.
    • Aran Nayebi and Surya Ganguli. Biologically inspired protection of deep networks from adversarial attacks. arXiv preprint arXiv:1703.09202, 2017.
    • Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697, 2016a.
    • Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016b.
    • Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey Hinton. Regularizing neural networks by penalizing confident output distributions. arXiv preprint arXiv:1701.06548, 2017.
    • Ben Poole, Subhaneil Lahiri, Maithreyi Raghu, Jascha Sohl-Dickstein, and Surya Ganguli. Exponential expressivity in deep neural networks through transient chaos. In Advances in Neural Information Processing Systems, pp. 3360–3368, 2016.
    • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
    • Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, and Jascha Sohl-Dickstein. Deep information propagation. arXiv preprint arXiv:1611.01232, 2016.
    • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
    • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
    • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826, 2016.
    • Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, pp. 4278–4284, 2017.
    • Florian Tramer, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
    • Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
    • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.
    • Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
    • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012, 2017.