A Self-supervised Approach for Adversarial Robustness

CVPR, pp. 259-268, 2020.

Keywords:
High-level representation Guided Denoiser, Fast Gradient Sign Method, vision system, mean Average Precision, deep convolutional

Abstract:

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems, e.g., for classification, segmentation and object detection. The vulnerability of DNNs against such attacks can prove a major roadblock towards their real-world deployment. Transferability of adversarial examples demands generalizable ...

Introduction
  • Adversarial training (AT) has shown great potential to safeguard neural networks from adversarial attacks [29, 35].
  • AT is performed in the model space, i.e., a model’s parameters are modified by minimizing the empirical risk over a given data distribution as well as over the perturbed images (a minimal training sketch follows this list).
  • Such an AT strategy results in the following challenges.
  • Input transformations (e.g., Gaussian smoothing and JPEG compression) can maximize the attack strength instead of minimizing it [32, 10].
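For concreteness, the sketch below shows what this model-space adversarial training looks like in practice: a standard PGD-based AT loop. This is an illustrative baseline (the kind of defense the paper argues against), not the authors' code; the budget, step size and iteration count are assumptions.

    # Illustrative sketch (not the authors' code): standard PGD-based adversarial
    # training, i.e. empirical risk minimization over perturbed inputs in model space.
    import torch
    import torch.nn.functional as F

    def pgd_perturb(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        """Craft an l_inf-bounded PGD adversary against the current model."""
        delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            loss.backward()
            delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
            delta.data = (x + delta.data).clamp(0, 1) - x   # stay in the valid image range
            delta.grad.zero_()
        return (x + delta).detach()

    def adversarial_training_step(model, optimizer, x, y):
        """One AT step: minimize the classification loss on perturbed inputs."""
        model.eval()                      # keep batch-norm statistics fixed while attacking
        x_adv = pgd_perturb(model, x, y)
        model.train()
        optimizer.zero_grad()             # clear gradients accumulated during the attack
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()

Because the attack must be re-run for every mini-batch, such a loop is roughly an order of magnitude more expensive than standard training, which is the computational-cost challenge listed below.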
Highlights
  • Adversarial training (AT) has shown great potential to safeguard neural networks from adversarial attacks [29, 35]
  • Such an Adversarial training strategy results in the following challenges. (a) Task dependency: Adversarial training is task-dependent, e.g., robust classification models cannot directly be incorporated into an object detection or a segmentation pipeline, since the overall system would still require further training with modified task-dependent loss functions. (b) Computational cost: Adversarial training is computationally expensive [29], which restricts its applicability to high-dimensional and large-scale datasets such as ImageNet [34]. (c) Accuracy drop: models trained with Adversarial training lose significant accuracy on the original distribution, e.g., ResNet50 [17] accuracy on the ImageNet validation set drops from 76% to 64% when robustified against the PGD attack [29] at a perturbation budget of only ε ≤ 2. (d) Label leakage: supervised Adversarial training suffers from label leakage [23], which allows the model to overfit on perturbations, affecting model generalization to unseen adversaries [50].
  • Quantitative analysis in Table 1 shows that, compared to previously broken defenses [10], Neural Representation Purifier achieves strong robustness against state-of-the-art attacks [47, 10], bringing down the effectiveness of the ensemble translation-invariant attack with input diversity (DIM_TI) [10] from 79.8% to 31.9%. (b) Neural Representation Purifier as Cross-task Defense: in order to measure the cross-task defense capabilities, we deploy Neural Representation Purifier against the cross-domain attack (CDA) [32], a state-of-the-art attack that generates diverse cross-domain adversarial perturbations.
  • Results in Table 2 demonstrate that Neural Representation Purifier successfully removes all unseen perturbations and proves to be a generic cross-task defense for classification, object detection and instance segmentation.
  • A Gaussian Noise Purifier (GNP) does not prove effective against translation-invariant attacks [10], and (v) training Neural Representation Purifier to stabilize Fast Gradient Sign Method adversaries (sketched after this list) performs relatively better than GNP. (d) What if the Attacker Knows About the Defense: we study this difficult scenario with the following criteria: (i) the attacker knows that the defense is deployed, has access to its training data and training mechanism, trains a local defense similar to Neural Representation Purifier, and uses BPDA [6] to bypass the defense.
  • Our defense is able to remove structured noise patterns where an adversarial image is maliciously embedded into the original image
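The single-step Fast Gradient Sign Method adversary referenced in point (v) above can be written in a few lines; the following is a minimal textbook sketch, assuming a differentiable classifier and images scaled to [0, 1], not code released with the paper.

    # Minimal FGSM sketch (illustrative; epsilon matches the l_inf <= 16/255 budget
    # used elsewhere in the paper summary).
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=16 / 255):
        """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + eps * x.grad.sign()    # move along the sign of the input gradient
        return x_adv.clamp(0, 1).detach()  # project back to the valid image range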
Methods
  • Attacks are evaluated at perturbation budgets of l∞ ≤ 8 and l∞ ≤ 16.
  • Detection: Defending Mask-RCNN [16] against CDA [32].
  • Segmentation: defending Mask-RCNN [16] against CDA [32].
  • Ablation variants compared in the figures: NRP (proposed), NRP without pixel loss, NRP without feature loss, NRP with GAN loss, FGSP, GNP, and clean images (a sketch of the pixel + feature purifier objective follows this list).
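The ablations above name a pixel loss and a feature loss for the purifier. Below is a hedged sketch of such a combined objective; the VGG layer choice, L1 distances and loss weights are assumptions for illustration, not the authors' released implementation.

    # Hedged sketch of a purifier objective combining a pixel loss and a feature loss.
    import torch
    import torch.nn as nn
    from torchvision.models import vgg16, VGG16_Weights   # torchvision >= 0.13

    class FeatureExtractor(nn.Module):
        """Frozen VGG-16 features used as the perceptual (feature) loss space."""
        def __init__(self, num_layers=16):
            super().__init__()
            self.features = vgg16(weights=VGG16_Weights.DEFAULT).features[:num_layers].eval()
            for p in self.features.parameters():
                p.requires_grad_(False)

        def forward(self, x):
            return self.features(x)

    def purifier_loss(purifier, feat, x_adv, x_clean, lambda_pix=1.0, lambda_feat=1.0):
        """Pixel-space + feature-space reconstruction loss for the purifier network."""
        x_pur = purifier(x_adv)
        pixel_loss = torch.mean(torch.abs(x_pur - x_clean))                  # L1 in image space
        feature_loss = torch.mean(torch.abs(feat(x_pur) - feat(x_clean)))    # L1 in feature space
        return lambda_pix * pixel_loss + lambda_feat * feature_loss

Dropping either term corresponds to the "NRP without pixel loss" and "NRP without feature loss" ablation baselines listed above.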
Results
  • Defense Results and Insights. (a) Generalizability Across Attacks: Figs. 6, 7 & 8 demonstrate the generalization ability of NRP to recover images from strong adversarial noise.
  • Quantitative analysis in Table 1 shows that, compared to previously broken defenses [10], NRP achieves strong robustness against state-of-the-art attacks [47, 10], bringing down the effectiveness of the ensemble translation-invariant attack with input diversity (DIM_TI) [10] from 79.8% to 31.9%.
  • Results in Table 2 demonstrate that NRP successfully removes all unseen perturbations and proves to be a generic cross-task defense for classification, object detection and instance segmentation.
Conclusion
  • The authors propose a novel defense approach that removes harmful perturbations using an adversarially trained purifier.
  • The authors' defense does not require large amounts of training data and is independent of the label space.
  • It exhibits high generalizability to unseen state-of-the-art attacks and successfully defends a variety of tasks including classification, segmentation and object detection (a deployment sketch follows this list).
  • The authors' defense is able to remove structured noise patterns where an adversarial image is maliciously embedded into the original image.
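As a rough illustration of this plug-and-play use, a purifier can simply be prepended to any existing task model at inference time. The sketch assumes `purifier` is a trained NRP-style image-to-image network and is not taken from the paper's code.

    # Minimal deployment sketch: purify first, then run the unmodified task model.
    import torch

    @torch.no_grad()
    def defended_predict(task_model, purifier, x):
        """Plug-and-play defense: purify the input, then run the task model as usual."""
        x_purified = purifier(x).clamp(0, 1)   # strip adversarial noise in image space
        return task_model(x_purified)          # classifier / detector / segmenter stays unchanged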
Tables
  • Table 1: Robustness of different defense methods against state-of-the-art black-box attacks (lower is better). IncRes-v2ens is used as the backbone model following [10]. NRP significantly reduces the attack success rate. Adversaries (ε ≤ 16) are created against Inc-v3, Inc-v4, IncRes-v2, Res-v2-152 and their ensemble.
  • Table 2: NRP generalizability across different adversarial attacks. The classification model is defended against CDA trained against Inc-v3, while the detection and segmentation models are defended against CDA trained against Res-v2-152 (higher is better). (q = quality, w = weights, win = window size)
  • Table 3: Success rate (lower is better) of BPDA [6] and DIM_TI [10] attacks against NRP. Res-v2-152 [18] is combined with other purifier networks (ResG [24], UNet [33]). Adversaries are then transferred to naturally and adversarially trained models. NRP protects the backbone network even when the attacker tries to bypass it using the BPDA technique. (attack iterations: 10, ε ≤ 16)
  • Table 4: Cross-task SSP attack: pixel-level accuracy is shown for SegNet-Basic [4] on the CamVid test set [5], while mAP (with IoU = 0.5) is reported for Mask-RCNN.
  • Table 5: SSP as an attack for classification. Top-1 (T-1) and Top-5 (T-5) accuracies are reported under untargeted l∞ adversarial attacks on ImageNet-NIPS with perturbation budget l∞ ≤ 16. '∗' indicates white-box attacks. An SSP-style attack is sketched below this list.
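The SSP attack referenced in Tables 4 and 5 perturbs images by maximizing distortion in a deep feature space rather than a classification loss, which is what makes it label-free and cross-task. The following is a hedged sketch of such a feature-distortion attack, assuming a frozen feature extractor `feat`, images in [0, 1], and hypothetical step-size/iteration settings.

    # Hedged sketch of a self-supervised, label-free feature-distortion attack in the
    # spirit of SSP: maximize feature-space distance to the clean image under an l_inf budget.
    import torch

    def ssp_style_attack(feat, x, eps=16 / 255, alpha=2 / 255, steps=10):
        """Push the features of x + delta away from the features of the clean x."""
        f_clean = feat(x).detach()
        delta = torch.zeros_like(x).requires_grad_(True)
        for _ in range(steps):
            distortion = torch.mean(torch.abs(feat(x + delta) - f_clean))
            distortion.backward()
            delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
            delta.data = (x + delta.data).clamp(0, 1) - x   # stay in the valid image range
            delta.grad.zero_()
        return (x + delta).detach()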
Related work
  • Defenses: A major class of adversarial defenses processes the input images to achieve robustness against adversarial patterns. For example, [14] used JPEG compression to remove high-frequency components that are less important to human vision via the discrete cosine transform. A compressed-sensing approach called Total Variation Minimization (TVM) was also proposed in [14] to remove the small localized changes caused by adversarial perturbations. Xie et al. [46] introduced Random Resizing and Padding (R&P) as a pre-processing step to mitigate the adversarial effect. A High-level representation Guided Denoiser (HGD) [26] framework was used as a pre-processing step to remove perturbations. The NeurIPS 2017 Defense Competition rank-3 (NeurIPS-r3) approach [42] introduced a two-step pre-processing pipeline in which the images first undergo a series of transformations (JPEG compression, rotation, zoom, shift and shear) and are then passed through an ensemble of adversarially trained models to obtain a weighted output response as the prediction. [36] proposed to recover adversaries using a GAN, and [31] super-resolves images to minimize the adversarial effect; a toy example of such an input-transformation defense is sketched below. In contrast to the above defenses, we design an input-processing model that derives a self-supervised signal from the deep feature space to adversarially train the defense model. Our results show significantly superior performance to all input-processing based defenses developed so far.
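For illustration, a minimal input-transformation defense of the kind discussed above (JPEG re-encoding as a pre-processing step) might look as follows; the quality factor is an arbitrary assumption, and this is not the defense proposed in this paper.

    # Toy sketch of a JPEG-compression pre-processing defense (illustrative only).
    import io
    import numpy as np
    from PIL import Image

    def jpeg_compress_defense(image: np.ndarray, quality: int = 75) -> np.ndarray:
        """Re-encode an HxWx3 uint8 image as JPEG to suppress high-frequency perturbations."""
        buffer = io.BytesIO()
        Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
        buffer.seek(0)
        return np.array(Image.open(buffer))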
Reference
  • [1] Martin Arjovsky, Soumith Chintala, and Leon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
  • [2] Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), 2018.
  • [3] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning (ICML), 2017.
  • [4] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:2481–2495, 2017.
  • [5] Gabriel J. Brostow, Julien Fauqueur, and Roberto Cipolla. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 30(2):88–97, 2009.
  • [6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • [7] NeurIPS Challenge. https://www.kaggle.com/c/nips-2017-defense-against-adversarial-attack/data. Kaggle, 2017.
  • [8] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
  • [9] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [10] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019.
  • [11] Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, and Aleksander Madry. Adversarial robustness as a prior for learned representations, 2019.
  • [12] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
  • [13] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2017.
  • [14] Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations (ICLR), 2017.
  • [15] Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
  • [16] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross B. Girshick. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, 2017.
  • [17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
  • [19] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv preprint arXiv:1602.07360, 2017.
  • [20] Alexia Jolicoeur-Martineau. The relativistic discriminator: A key element missing from standard GAN. arXiv preprint arXiv:1807.00734, 2018.
  • [21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60:84–90, 2012.
  • [22] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [23] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  • [24] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.
  • [25] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Jun Zhu, and Xiaolin Hu. Defense against adversarial attacks using high-level representation guided denoiser. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [26] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Jun Zhu, and Xiaolin Hu. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1778–1787, 2018.
  • [27] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • [28] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
  • [29] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • [30] Konda Reddy Mopuri, Utsav Garg, and R. Venkatesh Babu. Fast feature fool: A data independent approach to universal adversarial perturbations. In Proceedings of the British Machine Vision Conference (BMVC), 2017.
  • [31] Aamir Mustafa, Salman H. Khan, Munawar Hayat, Jianbing Shen, and Ling Shao. Image super-resolution as a defense against adversarial attacks. arXiv preprint arXiv:1901.01677, 2019.
  • [32] Muzammal Naseer, Salman H. Khan, Harris Khan, Fahad Shahbaz Khan, and Fatih Porikli. Cross-domain transferability of adversarial perturbations. In Advances in Neural Information Processing Systems, 2019.
  • [33] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
  • [34] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
  • [35] Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! arXiv preprint arXiv:1904.12843, 2019.
  • [36] Shiwei Shen, Guoqing Jin, Ke Gao, and Yongdong Zhang. APE-GAN: Adversarial perturbation elimination with GAN. arXiv preprint arXiv:1707.05474, 2017.
  • [37] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.
  • [38] Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In Computer Vision – ECCV 2018, pages 644–661. Springer International Publishing, 2018.
  • [39] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, volume 4, page 12, 2017.
  • [40] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
  • [41] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
  • [42] Anil Thomas and Oguz Elibol. Defense against adversarial attacks - 3rd place. https://github.com/anlthms/nips-2017/blob/master/poster/defense.pdf, 2017.
  • [43] Florian Tramer, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations (ICLR), 2018.
  • [44] Xintao Wang, Kelvin C.K. Chan, Ke Yu, Chao Dong, and Chen Change Loy. EDVR: Video restoration with enhanced deformable convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
  • [45] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
  • [46] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
  • [47] Cihang Xie, Zhishuai Zhang, Jianyu Wang, Yuyin Zhou, Zhou Ren, and Alan Yuille. Improving transferability of adversarial examples with input diversity. arXiv preprint arXiv:1803.06978, 2018.
  • [48] Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.
  • [49] Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient defenses against adversarial attacks. arXiv preprint arXiv:1707.06728, 2017.
  • [50] Haichao Zhang and Jianyu Wang. Defense against adversarial attacks using feature scattering-based adversarial training. arXiv preprint arXiv:1907.10764, 2019.
  • [51] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.