Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples

ACL, pp. 6600-6610, 2020.

Keywords: adversarial perturbation, Penn Treebank, adversarial attack

Abstract:

Despite achieving prominent performance on many important tasks, it has been reported that neural networks are vulnerable to adversarial examples. Previous studies along this line mainly focused on semantic tasks such as sentiment analysis, question answering and reading comprehension. In this study, we show that adversarial examples al…
Introduction
  • Deep neural network-based machine learning (ML) models are powerful but vulnerable to adversarial examples.
  • Even though generating adversarial examples for texts has proven to be a more challenging task than for images, the authors show that it is feasible for a syntactic task such as dependency parsing; a minimal illustration of such a substitution is sketched below.
  • Original sentence: "The link between the futures and stock markets ripped apart."
  • Adversarial sentence (with "stock" replaced by "exchange"): "The link between the futures and exchange markets ripped apart."
  • [Figure: dependency parse trees of the original and the adversarial sentence, annotated with relations such as det, prep, nsubj, pobj, conj, cc and nn.]
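As a concrete illustration of the perturbation above, here is a minimal sketch (not taken from the paper's code) of the kind of substitution involved: a single noun is swapped for another noun, so the sentence keeps its length and the gold dependency tree annotated over the original sentence still applies to the perturbed one.

```python
# Minimal illustration of a syntax-preserving word substitution. The helper
# names below are ours, not the authors'.

def substitute(tokens, position, new_word):
    """Return a copy of `tokens` with the word at `position` replaced."""
    perturbed = list(tokens)
    perturbed[position] = new_word
    return perturbed

original = "The link between the futures and stock markets ripped apart .".split()
adversarial = substitute(original, original.index("stock"), "exchange")

print(" ".join(adversarial))
# Only one noun was swapped for another noun, so a gold dependency tree for
# `original` (e.g. stored as a head-index list of equal length) remains a
# valid reference tree for `adversarial`.
assert len(adversarial) == len(original)
```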
Highlights
  • Deep neural network-based machine learning (ML) models are powerful but vulnerable to adversarial examples
  • The introduction of the adversarial example and training ushered in a new era to understand and improve the machine learning models, and has received significant attention recently (Szegedy et al, 2013; Goodfellow et al, 2015; Moosavi-Dezfooli et al, 2016; Papernot et al, 2016b; Carlini and Wagner, 2017; Yuan et al, 2019; Eykholt et al, 2018; Xu et al, 2019)
  • Our contributions are summarized as follows: (1) we explore the feasibility of generating syntactic adversarial sentence examples that cause a dependency parser to make mistakes without altering the original syntactic structures; (2) we propose two approaches to construct syntactic adversarial examples by searching over perturbations to existing texts at the sentence and phrase levels, in both the black-box and white-box settings; (3) our experiments with a close to state-of-the-art parser on the English Penn Treebank show that up to 77% of input examples admit adversarial perturbations, and that the robustness and generalization of parsing models can be improved by adversarial training with the proposed attacks (see the black-box search sketch after this list).
  • The parser's performance drops by 15.17% in unlabeled attachment score, and 77% of sentences admit adversarial perturbations under the white-box attack with 15% word replacement.
  • We study the robustness of neural network-based dependency parsing models
  • We develop the first adversarial attack algorithms for this task to successfully find the blind spots of parsers with high success rates
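A hedged sketch of how the black-box sentence-level search referenced above could be organized: greedily try syntax-preserving word substitutions and keep the ones that most reduce the parser's unlabeled attachment score (UAS) against the reference tree, up to a maximum word-replacement budget. `parse_heads` and `candidate_words` are assumed interfaces for illustration; this is not the authors' released implementation.

```python
from typing import Callable, List, Sequence


def uas(pred_heads: Sequence[int], gold_heads: Sequence[int]) -> float:
    """Unlabeled attachment score: fraction of tokens with the correct head."""
    correct = sum(p == g for p, g in zip(pred_heads, gold_heads))
    return correct / len(gold_heads)


def black_box_attack(
    tokens: List[str],
    gold_heads: Sequence[int],
    parse_heads: Callable[[List[str]], Sequence[int]],  # target parser, query access only
    candidate_words: Callable[[str], List[str]],        # syntax-preserving substitutes
    max_ratio: float = 0.15,                            # modify at most 15% of the words
) -> List[str]:
    """Greedy search over word substitutions that lower the parser's UAS."""
    budget = max(1, int(len(tokens) * max_ratio))
    current = list(tokens)
    best_uas = uas(parse_heads(current), gold_heads)
    for _ in range(budget):
        best_change = None
        for i, _ in enumerate(current):
            for candidate in candidate_words(current[i]):
                trial = current[:i] + [candidate] + current[i + 1:]
                score = uas(parse_heads(trial), gold_heads)
                if score < best_uas:
                    best_uas, best_change = score, (i, candidate)
        if best_change is None:  # no remaining substitution hurts the parser
            break
        position, candidate = best_change
        current[position] = candidate
    return current
```

Because only the parser's predictions are queried, the same loop works against any parser, which is what makes the black-box setting broadly applicable.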
Methods
  • Adversarial examples are required to maintain the original functionality of the input.
  • To expose regions of the input space where dependency parsers perform poorly, the authors would like the modified example x′ to preserve the same syntactic structure as the original x, while slightly relaxing the constraint on their similarity in semantic properties.
  • A robust parser should perform consistently well on sentences that share the same syntactic properties while differing in their meaning.
  • Substituting the word “black” for “white”, or “dog” for “cat”, is an acceptable replacement because such changes are grammatically imperceptible to humans (a sketch of this kind of candidate filtering follows below).
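A minimal sketch of one way to realize such a candidate filter, under assumptions that are not taken from the paper: a substitute must share the original word's part-of-speech tag and be close to it in a pre-trained word-embedding space (e.g. GloVe, which appears in the paper's references); the thresholds, the `pos_of` lookup and the `embeddings` dictionary are illustrative.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def filter_candidates(word, pos, vocabulary, embeddings, pos_of, k=10, min_sim=0.6):
    """Return up to k substitutes that share `word`'s POS and are nearby in embedding space.

    `embeddings` maps word -> vector and `pos_of` maps word -> coarse POS tag;
    both are assumed to be precomputed resources.
    """
    if word not in embeddings:
        return []
    scored = []
    for cand in vocabulary:
        if cand == word or pos_of.get(cand) != pos or cand not in embeddings:
            continue
        sim = cosine(embeddings[word], embeddings[cand])
        if sim >= min_sim:
            scored.append((sim, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:k]]
```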
Results
  • Results of the sentence-level attacks: the authors report empirical studies of the adversarial attacks for the sentence-level methods.
  • The parser's performance drops by 15.17% in UAS, and 77% of sentences admit adversarial perturbations under the white-box attack with 15% word replacement (a sketch of how these metrics can be computed appears after this list).
  • The white-box attacks are clearly much more effective than the black-box ones across the three variants of the parsing model and different word replacement rates.
  • For each source-target pair, the authors allow up to 3 words of the source subtree to be modified.
  • For some sentences, adversarial examples can be generated by replacing just one or two words.
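A hedged sketch of how numbers such as the UAS drop and the attack success rate could be computed over a treebank. `parse_heads` and `attack` are assumed callables (the target parser and an attack routine), and counting a sentence as a success whenever the attack lowers its UAS is an illustrative criterion rather than necessarily the paper's exact definition.

```python
def evaluate_attack(sentences, gold_trees, parse_heads, attack):
    """Compare parser accuracy on clean and attacked versions of each sentence."""
    clean_correct = adv_correct = total_tokens = successes = 0
    for tokens, gold in zip(sentences, gold_trees):
        clean = parse_heads(tokens)
        adversarial = parse_heads(attack(tokens, gold))
        clean_hits = sum(p == g for p, g in zip(clean, gold))
        adv_hits = sum(p == g for p, g in zip(adversarial, gold))
        clean_correct += clean_hits
        adv_correct += adv_hits
        total_tokens += len(gold)
        successes += int(adv_hits < clean_hits)  # the attack made this parse worse
    return {
        "clean_UAS": clean_correct / total_tokens,
        "attacked_UAS": adv_correct / total_tokens,
        "UAS_drop": (clean_correct - adv_correct) / total_tokens,
        "success_rate": successes / len(sentences),
    }
```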
Conclusion
  • The authors study the robustness of neural network-based dependency parsing models.
  • To the best of their knowledge, adversarial examples for syntactic tasks, such as dependency parsing, have not been explored in the literature.
  • The authors develop the first adversarial attack algorithms for this task to successfully find the blind spots of parsers with high success rates.
  • By applying adversarial training with the proposed attacks, the authors are able to significantly improve the robustness of dependency parsers without sacrificing their performance on clean data (a minimal training-loop sketch follows).
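A minimal adversarial-training sketch under stated assumptions: `parser` exposes a differentiable `loss(tokens, gold_tree)` in a PyTorch-style API, `optimizer` implements `zero_grad`/`step`, and `attack` produces a perturbed copy of the sentence for which the gold tree remains valid. It illustrates the general recipe of mixing clean and adversarial losses; it is not the authors' training script.

```python
def adversarial_training_epoch(parser, optimizer, data, attack, adv_weight=1.0):
    """One pass over the data, training on clean sentences plus their adversarial versions."""
    for tokens, gold_tree in data:
        adv_tokens = attack(tokens, gold_tree)  # the perturbation keeps the gold tree valid
        loss = (parser.loss(tokens, gold_tree)
                + adv_weight * parser.loss(adv_tokens, gold_tree))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```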
Tables
  • Table 1: Results of sentence-level adversarial attacks on a state-of-the-art parser with the English Penn Treebank in both the black-box and white-box settings. “Word-based”, “Word + POS”, and “Character-based” denote three variants of the model (Dozat and Manning, 2017) with differences in their input forms. “Max%” denotes the maximum percentage of words that are allowed to be modified, “UAS” unlabeled attachment scores, “#Word” the average number of words actually modified, and “Succ%” the success rate in terms of the number of sentences.
  • Table 2: Performance of adversarial training. “Clean” stands for the testing results on the clean data, and “Attack [b]” and “Attack [w]” respectively denote the accuracy under test-time attacks in the black-box ([b]) and white-box ([w]) settings. “Original” and “Adv” denote the models trained without and with adversarial training, respectively.
  • Table 3: The attack success rate and the corresponding changes in UAS when modifying words with different parts of speech. “JJ” denotes adjective, “NN” noun, “RB” adverb, “VB” verb, and “IN” preposition.
  • Table 4: The success rate of the phrase-level attacks.
Related work
  • Generating adversarial examples – inputs intentionally crafted to fool a model – has become an important means of exploring model vulnerabilities. Furthermore, adding adversarial examples in the training stage, also known as adversarial training, has become one of the most promising ways to improve a model's robustness. Although there is limited literature available on NLP adversarial examples, some studies have been conducted on NLP tasks such as reading comprehension (Jia and Liang, 2017), text classification (Samanta and Mehta, 2017; Wong, 2017; Liang et al, 2018; Alzantot et al, 2018), machine translation (Zhao et al, 2018; Ebrahimi et al, 2018; Cheng et al, 2018), and dialogue systems (Cheng et al, 2019).

    Depending on the degree of access to the target model, adversarial examples can be constructed in two different settings: the white-box and the black-box setting (Xu et al, 2019; Wang et al, 2019). In the white-box setting, an adversary can access the model's architecture, parameters and input feature representations, whereas in the black-box setting it cannot. White-box attacks normally yield a higher success rate because knowledge of the target model can be used to guide the generation of adversarial examples. However, black-box attacks do not require access to the target model, which makes them more practical for many real-world attacks. Attacks can also be divided into targeted and non-targeted ones, depending on the purpose of the adversary. Our phrase-level attack can be viewed as a targeted attack towards a specific subtree, while the sentence-level attack can be taken as a non-targeted one. A small illustration of gradient-guided candidate scoring in the white-box setting is sketched below.
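As an illustration of why white-box access helps, the sketch below scores substitution candidates with a first-order, HotFlip-style approximation: the change in the parser's loss from replacing the word at a position is estimated from the gradient of the loss with respect to that position's input embedding. This is a generic white-box scoring scheme, not necessarily the exact criterion used in this paper.

```python
import numpy as np


def rank_candidates_white_box(grad_at_position, current_vec, candidate_vecs):
    """Rank substitutes by their estimated increase in parser loss.

    grad_at_position: gradient of the loss w.r.t. the input embedding at position i.
    current_vec:      embedding of the word currently at position i.
    candidate_vecs:   array of shape (num_candidates, dim) with substitute embeddings.
    Returns candidate indices sorted from most to least damaging (estimated).
    """
    estimated_gain = candidate_vecs @ grad_at_position - current_vec @ grad_at_position
    return np.argsort(-estimated_gain)
```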
Funding
  • This work was supported by National Key R&D Program of China (No 2018YFC0830902), Shanghai Municipal Science and Technology Major Project (No 2018SHZDZX01) and Zhangjiang Lab
Reference
  • Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Samuel Barham and Soheil Feizi. 2019. Interpretable adversarial training for text. Computing Research Repository, arXiv: 1905.12864.
  • Sabine Buchholz and Erwin Marsi. 2006. CoNLLX shared task on multilingual dependency parsing. In Proceedings of the International Conference on Computational Natural Language Learning.
  • Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy.
  • Minhao Cheng, Wei Wei, and Cho-Jui Hsieh. 2019. Evaluating and enhancing the robustness of dialogue systems: A case study on a negotiation agent. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. Computing Research Repository, arXiv: 1803.01128.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of the International Conference on Learning Representations.
  • Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
  • Steffen Eger, Gozde Gul Sahin, Andreas Ruckle, JiUng Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, and Iryna Gurevych. 2019. Text processing like humans do: Visually attacking and shielding NLP systems. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2018. Robust physical-world attacks on deep learning models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box generation of adversarial text sequences to evade deep learning classifiers. Computing Research Repository, arXiv: 1801.04354.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations.
  • Homa B. Hashemi and Rebecca Hwa. 2016. An evaluation of parser robustness for ungrammatical sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, and Cho-Jui Hsieh. 2019. On the robustness of self-attentive models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
  • Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. 2019. Is BERT really robust? a strong baseline for natural language attack on text classification and entailment. Computing Research Repository, arXiv: 1907.11932.
  • Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics, 4:313–327.
  • Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial examples for natural language classification problems. OpenReview Submission, id: r1QZ3zbAZ.
  • Qi Lei, Lingfei Wu, Pin-Yu Chen, Alexandros G. Dimakis, Inderjit S. Dhillon, and Michael Witbrock. 2019. Discrete adversarial attacks and submodular optimization with applications to text classification. In Proceedings of the Conference on Systems and Machine Learning.
  • Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2018. Deep text classification can be fooled. In Proceedings of the International Joint Conference on Artificial Intelligence.
  • Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the International Conference on Language Resources and Evaluation.
  • Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
  • Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Dominick Ng and James R. Curran. 2015. Identifying cascading errors using constraints in dependency parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing.
  • Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. 2016a. The limitations of deep learning in adversarial settings. In Proceedings of the IEEE European Symposium on Security and Privacy.
  • Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016b. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the IEEE Symposium on Security and Privacy.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically equivalent adversarial rules for debugging NLP models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
  • Suranjana Samanta and Sameep Mehta. 2017. Towards crafting text adversarial samples. Computing Research Repository, arXiv: 1707.02812.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. Computing Research Repository, arXiv: 1312.6199.
  • Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-ofspeech tagging with a cyclic dependency network. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics.
  • Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. 2019. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing.
  • Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, and Aoshuang Ye. 2019. A survey: Towards a robust deep neural network in text domain. Computing Research Repository, arXiv: 1902.07285.
  • Catherine Wong. 2017. DANCin SEQ2SEQ: Fooling text classifiers with adversarial text example generation. Computing Research Repository, arXiv: 1712.05419.
  • Han Xu, Yao Ma, Haochen Liu, Debayan Deb, Hui Liu, Jiliang Tang, and Anil K. Jain. 2019. Adversarial attacks and defenses in images, graphs and text: A review. Computing Research Repository, arXiv: 1909.08072.
  • Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael I. Jordan. 2018. Greedy attack and gumbel attack: Generating adversarial examples for discrete data. Computing Research Repository, arXiv: 1805.12316.
  • Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2019. Adversarial examples: Attacks and defenses for deep learning. IEEE transactions on neural networks and learning systems, 30(9):2805–2824.
  • Yuan Zang, Chenghao Yang, Fanchao Qi, Zhiyuan Liu, Meng Zhang, Qun Liu, and Maosong Sun. 2019. Textual adversarial attack as combinatorial optimization. Computing Research Repository, arXiv: 1910.12196.
  • Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. Generating natural adversarial examples. In Proceedings of the International Conference on Learning Representations.