Improved Adversarial Training via Learned Optimizer

Keywords:
backtracking line search, physical world, Learning to Learn, minimax optimization problem, recurrent neural networks
Weibo:
We empirically demonstrate that the commonly used Projected Gradient Descent attack may not be optimal for the inner maximization, and that an improved inner optimizer can lead to a more robust model.

Abstract:

Adversarial attack has recently become a tremendous threat to deep learning models. To improve the robustness of machine learning models, adversarial training, formulated as a minimax optimization problem, has been recognized as one of the most effective defense mechanisms. However, the non-convex and non-concave property poses a great ...

Introduction
  • It has been widely acknowledged that deep neural networks (DNNs) have made tremendous breakthroughs benefiting both academia and industry.
  • Many DNN models trained on benign inputs are vulnerable to small, nearly imperceptible perturbations added to the original data and tend to make wrong predictions under such threats.
  • Those perturbed examples, known as adversarial examples, can be constructed by algorithms such as DeepFool [23], the Fast Gradient Sign Method (FGSM) [11], and the Carlini-Wagner (C&W) attack [4]; FGSM is sketched after this list.
  • How to train a model resistant to adversarial inputs has therefore become an important topic.
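For reference, FGSM [11] crafts an adversarial example with a single signed-gradient step; in standard notation (loss L, classifier f_θ, label y, and perturbation budget ε, not necessarily this paper's exact symbols):

    x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(f_\theta(x), y)\big)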
Highlights
  • It has been widely acknowledged that deep neural networks (DNNs) have made tremendous breakthroughs benefiting both academia and industry.
  • Many DNN models trained on benign inputs are vulnerable to small, nearly imperceptible perturbations added to the original data and tend to make wrong predictions under such threats.
  • Comprehensive experimental results show that the proposed method noticeably improves the robust accuracy of both adversarial training [21] and TRADES [39].
  • We found that the performance of adversarial training crucially depends on the optimization algorithm used for the inner maximization, and that the currently widely used Projected Gradient Descent (PGD) algorithm may not be the optimal choice.
  • We show that it is possible and practical to learn an optimizer for the inner maximization in adversarial training.
  • In practice, we found that considering intermediate iterations can further improve performance, since it helps the maximizer converge faster even after only one or a few iterations.
  • For defense mechanisms that can be formulated as a minimax optimization problem (sketched below), we propose to replace the inner PGD-based maximizer with an automatically learned recurrent neural network (RNN) maximizer, and show that jointly training the RNN maximizer and the classifier can significantly improve the defense performance.
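For reference, adversarial training [21] is commonly written as the following minimax problem, where the inner maximization searches for the worst-case perturbation δ within an ℓ∞ ball of radius ε (standard notation, not necessarily this paper's exact symbols):

    \min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big]

PGD approximates the inner maximization with hand-crafted signed-gradient steps; the proposal here replaces that hand-crafted solver with a learned RNN optimizer m_φ.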
Methods
  • Prior approaches [13,14] use a CNN generator to produce perturbations in adversarial training.
  • A CNN-based generator has a larger number of trainable parameters, which makes it hard to train.
  • In Table 2, detailed properties, including the number of parameters and the training time per epoch, are provided for the different learning-to-learn based methods.
  • The authors observe that the proposed RNN approach stands out with the fewest parameters as well as the most efficient training (a coordinate-wise sketch of such an RNN optimizer follows this list).
  • A comparison of the variants with the original adversarial training methods can be found in Appendix B.
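A minimal sketch of why such an RNN optimizer can stay small: in the learning-to-learn style of [1], an LSTM can be applied coordinate-wise to the gradient of the inner loss with its weights shared across all coordinates. The class name, hidden size, and single linear output head below are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class RNNMaximizer(nn.Module):
        # Coordinate-wise LSTM optimizer: maps the gradient of the inner loss
        # w.r.t. the perturbation to an additive update. Weights are shared
        # across coordinates, so the parameter count is independent of image size.
        def __init__(self, hidden_size=20):
            super().__init__()
            self.cell = nn.LSTMCell(1, hidden_size)
            self.head = nn.Linear(hidden_size, 1)

        def forward(self, grad, state=None):
            g = grad.reshape(-1, 1)               # treat every pixel as one coordinate
            h, c = self.cell(g, state)
            update = self.head(h).reshape(grad.shape)
            return update, (h, c)

Because the weights are shared across coordinates, the number of trainable parameters does not grow with the input resolution, which is consistent with the Table 2 observation that the RNN approach has the fewest parameters and the shortest training time per epoch.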
Results
  • The authors present experimental results of the proposed RNN-based adversarial training.
  • Algorithm 1: RNN-based adversarial training (a code sketch of this loop follows the listing).
    1: Input: clean data {(x, y)}, batch size B, step sizes α1 and α2, number of inner iterations T, classifier parameterized by θ, RNN optimizer parameterized by φ
    2: Output: robust classifier f_θ, learned optimizer m_φ
    3: Randomly initialize f_θ and m_φ, or initialize them with pre-trained configurations
    4: repeat
    5:   Sample a mini-batch M from the clean data
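A minimal PyTorch-style sketch of the loop that steps 1–5 begin to describe. Since the remaining algorithm lines are not reproduced on this page, the ℓ∞ projection radius eps, the use of averaged intermediate adversarial losses as the objective for φ, and the exact roles of the step sizes α1 and α2 (assumed here to be the learning rates inside opt_phi and opt_theta) are illustrative assumptions; m is assumed to expose the (grad, state) -> (update, state) interface sketched above.

    import torch
    import torch.nn.functional as F

    def rnn_adv_train_epoch(f, m, loader, opt_theta, opt_phi, T=10, eps=8/255):
        # f: classifier f_theta, m: RNN maximizer m_phi (both torch.nn.Module).
        for x, y in loader:                               # step 5: sample a mini-batch
            delta = torch.zeros_like(x, requires_grad=True)
            state, adv_losses = None, []
            for _ in range(T):                            # inner maximization, T iterations
                loss = F.cross_entropy(f(x + delta), y)
                grad, = torch.autograd.grad(loss, delta, create_graph=True)
                step, state = m(grad, state)              # learned optimizer proposes the step
                delta = (delta + step).clamp(-eps, eps)   # simplified l-inf projection
                adv_losses.append(F.cross_entropy(f(x + delta), y))
            # Update phi by gradient ascent on the averaged (intermediate) adversarial losses.
            opt_phi.zero_grad()
            (-torch.stack(adv_losses).mean()).backward()
            opt_phi.step()
            # Update theta by gradient descent on the loss at the final perturbation.
            opt_theta.zero_grad()
            F.cross_entropy(f(x + delta.detach()), y).backward()
            opt_theta.step()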
Conclusion
  • For defense mechanisms that can be formulated as a minimax optimization problem, the authors propose to replace the inner PGD-based maximizer with an automatically learned RNN maximizer, and show that jointly training the RNN maximizer and the classifier can significantly improve the defense performance.
  • Addressing the inadequacy of L2L in dealing with long-horizon problems is a worthwhile direction for future work.
  • The authors could further substitute the learned optimizer for hand-designed algorithms in both the inner and outer problems, which would enable an entirely automatic adversarial training process.
Tables
  • Table 1: Effects of the inner solution quality on robust accuracy (%)
  • Table 2: Comparison among different L2L-based methods
  • Table 3: Robust accuracy under white-box attacks (MNIST, 4-layer CNN)
  • Table 4: Robust accuracy under white-box attacks (CIFAR-10, VGG-16)
  • Table 5: Robust accuracy under white-box attacks (CIFAR-10, WideResNet)
  • Table 6: Generalization to more steps of the learned optimizer
  • Table 7: Robust accuracy under black-box attack settings
  • Table 8: Time comparison with the original adversarial training. Here we report the ratio of our proposed method to its original counterpart, for example T_RNN-Adv / T_AdvTrain. In parentheses, we report the training time per epoch of our proposed methods RNN-Adv and RNN-TRADES
  • Table 9: Robust accuracy under white-box attacks (Rest. ImageNet, ResNet18)
Related work
  • 2.1 Adversarial Attack and Defense

    Model robustness has recently become a great concern for deploying deep learning models in real-world applications. Goodfellow et al. [11] succeeded in fooling models into making wrong predictions with the Fast Gradient Sign Method (FGSM). Subsequently, to produce stronger adversarial examples, iterative FGSM (I-FGSM) and Projected Gradient Descent (PGD) [11,21] accumulate attack strength by running FGSM iteratively, and the Carlini-Wagner (C&W) attack [4] designs a specific objective function to increase classification errors. Besides these conventional optimization-based methods, several algorithms [25,35] focus on generating malicious perturbations via neural networks. For instance, Xiao et al. [35] exploit a GAN to craft deceptive images by outputting the corresponding noise to be added to benign input data. The appearance of various attacks has pushed forward the development of effective defense algorithms for training neural networks that are resistant to adversarial examples. The seminal work on adversarial training significantly improved adversarial robustness [21]. It has inspired the emergence of various advanced defense algorithms: TRADES [39] is designed to minimize a theoretically driven upper bound, and GAT [19] takes generator-based outputs to train the robust classifier. All these methods can be formulated as a minimax problem [21], where the defender tries to mitigate the negative effects (outer minimization) brought by adversarial examples from the attacker (inner maximization). However, the performance of such an adversarial game is usually constrained by the quality of the solutions to the inner problem [13,14]. Intuitively, finding a better maximum for the inner problem can improve the solution of the minimax training, leading to more robust models.
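For concreteness, the PGD attack referenced above runs the FGSM step iteratively and projects back onto the ε-ball after every step. A minimal sketch follows; the step size alpha, budget eps, and iteration count are generic defaults, not values used in this paper.

    import torch
    import torch.nn.functional as F

    def pgd_attack(f, x, y, eps=8/255, alpha=2/255, steps=10):
        # Iterative FGSM with projection onto the l-inf ball of radius eps.
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            loss = F.cross_entropy(f(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_()
        return (x + delta).clamp(0, 1).detach()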
Key findings
  • Comprehensive experimental results show that the proposed method noticeably improves the robust accuracy of both adversarial training [21] and TRADES [39].
  • As shown in Table 1, defense with AdvTrain+BLS (adversarial training whose inner maximizer uses a backtracking line search; a sketch follows this list) leads to a more robust model than solving the inner problem with plain PGD (88.71% vs. 87.33%).
  • In practice, the authors found that considering intermediate iterations can further improve performance, since it helps the maximizer converge faster even after only one or a few iterations.
  • Specifically, the method achieves 95.80% robust accuracy against various attacks on the MNIST dataset.
  • The method significantly reduces the loss value on perturbed data, bringing it close to that on the original input.
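For reference, a backtracking line search (cf. Nocedal and Wright [24]) in this setting shrinks the inner step size until the adversarial loss increases sufficiently along the ascent direction. The sketch below is one plausible instantiation with generic hyperparameters; it is not the paper's exact AdvTrain+BLS procedure, and loss_fn is a hypothetical closure such as lambda d: F.cross_entropy(f(x + d), y).

    import torch

    def backtracking_step_size(loss_fn, delta, grad, alpha0=1.0, beta=0.5, c=1e-4, max_tries=20):
        # Armijo-style backtracking for the inner *maximization*: accept the step
        # size alpha once the loss gain along the ascent direction grad.sign()
        # exceeds c * alpha * slope; otherwise shrink alpha by the factor beta.
        direction = grad.sign()
        with torch.no_grad():
            loss0 = loss_fn(delta)
            slope = (grad * direction).sum()
            alpha = alpha0
            for _ in range(max_tries):
                if loss_fn(delta + alpha * direction) >= loss0 + c * alpha * slope:
                    break
                alpha *= beta
        return alpha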
Reference
  • [1] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B., De Freitas, N.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems. pp. 3981–3989 (2016)
  • [2] Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In: International Conference on Machine Learning. pp. 274–283 (2018)
  • [3] Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248 (2017)
  • [4] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). pp. 39–57. IEEE (2017)
  • [5] Cheng, M., Le, T., Chen, P.Y., Yi, J., Zhang, H., Hsieh, C.J.: Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457 (2018)
  • [6] Cheng, M., Singh, S., Chen, P.H., Chen, P.Y., Liu, S., Hsieh, C.J.: Sign-OPT: A query-efficient hard-label adversarial attack. In: ICLR (2020)
  • [7] Cotter, N.E., Conwell, P.R.: Fixed-weight networks can learn. In: 1990 IJCNN International Joint Conference on Neural Networks. pp. 553–559. IEEE (1990)
  • [8] Elsken, T., Metzen, J.H., Hutter, F.: Neural architecture search: A survey. arXiv preprint arXiv:1808.05377 (2018)
  • [9] Engstrom, L., Ilyas, A., Athalye, A.: Evaluating and understanding the robustness of adversarial logit pairing. arXiv preprint arXiv:1807.10272 (2018)
  • [10] Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning. pp. 1126–1135. JMLR.org (2017)
  • [11] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
  • [12] Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. arXiv preprint arXiv:1907.07174 (2019)
  • [13] Jang, Y., Zhao, T., Hong, S., Lee, H.: Adversarial defense via learning to generate diverse attacks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2740–2749 (2019)
  • [14] Jiang, H., Chen, Z., Shi, Y., Dai, B., Zhao, T.: Learning to defense by learning to attack. arXiv preprint arXiv:1811.01213 (2018)
  • [15] Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/kriz/cifar.html (2010)
  • [16] Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016)
  • [17] Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016)
  • [18] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  • [19] Lee, H., Han, S., Lee, J.: Generative adversarial trainer: Defense to adversarial perturbations with GAN. arXiv preprint arXiv:1705.03387 (2017)
  • [20] Lv, K., Jiang, S., Li, J.: Learning gradient descent: Better generalization and longer horizons. In: Proceedings of the 34th International Conference on Machine Learning. pp. 2247–2255. JMLR.org (2017)
  • [21] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
  • [22] Metz, L., Maheswaranathan, N., Nixon, J., Freeman, D., Sohl-Dickstein, J.: Understanding and correcting pathologies in the training of learned optimizers. In: International Conference on Machine Learning. pp. 4556–4565 (2019)
  • [23] Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: A simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2574–2582 (2016)
  • [24] Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media (2006)
  • [25] Reddy Mopuri, K., Ojha, U., Garg, U., Venkatesh Babu, R.: NAG: Network for adversary generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 742–751 (2018)
  • [26] Ruan, Y., Xiong, Y., Reddi, S., Kumar, S., Hsieh, C.J.: Learning to learn by zeroth-order oracle. arXiv preprint arXiv:1910.09464 (2019)
  • [27] Samangouei, P., Kabkab, M., Chellappa, R.: Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605 (2018)
  • [28] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • [29] Sinha, A., Singh, M., Kumari, N., Krishnamurthy, B., Machiraju, H., Balasubramanian, V.N.: Harnessing the vulnerability of latent layers in adversarially trained models. arXiv preprint arXiv:1905.05186 (2019)
  • [30] Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152 (2018)
  • [31] Wang, H., Yu, C.N.: A direct approach to robust deep learning using adversarial networks. arXiv preprint arXiv:1905.09591 (2019)
  • [32] Wang, J., Zhang, H.: Bilateral adversarial training: Towards fast training of more robust models against adversarial attacks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 6629–6638 (2019)
  • [33] Wichrowska, O., Maheswaranathan, N., Hoffman, M.W., Colmenarejo, S.G., Denil, M., de Freitas, N., Sohl-Dickstein, J.: Learned optimizers that scale and generalize. In: Proceedings of the 34th International Conference on Machine Learning. pp. 3751–3760. JMLR.org (2017)
  • [34] Wu, Y., Ren, M., Liao, R., Grosse, R.: Understanding short-horizon bias in stochastic meta-optimization. arXiv preprint arXiv:1803.02021 (2018)
  • [35] Xiao, C., Li, B., Zhu, J.Y., He, W., Liu, M., Song, D.: Generating adversarial examples with adversarial networks. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. pp. 3905–3911 (2018)
  • [36] Xie, C., Wu, Y., Maaten, L.v.d., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 501–509 (2019)
  • [37] Younger, A.S., Hochreiter, S., Conwell, P.R.: Meta-learning with backpropagation. In: IJCNN'01 International Joint Conference on Neural Networks, Proceedings (Cat. No. 01CH37222). vol.
  • [38] Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
  • [39] Zhang, H., Yu, Y., Jiao, J., Xing, E.P., Ghaoui, L.E., Jordan, M.I.: Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573 (2019)