Detecting Word Sense Disambiguation Biases in Machine Translation for Model Agnostic Adversarial Attacks
EMNLP 2020.
Keywords:
simple adversarial attack, training data, disambiguation error, disambiguation bias, adversarial attack
Abstract:
Word sense disambiguation is a well-known source of translation errors in NMT. We posit that some of the incorrect disambiguation choices are due to models’ over-reliance on dataset artifacts found in training data, specifically superficial word co-occurrences, rather than a deeper understanding of the source text. We introduce a method for the prediction of disambiguation errors based on statistical data properties and propose a simple adversarial attack strategy that exploits these biases.
Introduction
- Consider the sentence John met his wife in the hot spring of 1988. In this context, the polysemous term spring unambiguously refers to the season of a specific year.
- Prior studies have indicated that neural machine translation (NMT) models rely heavily on source sentence information when resolving lexical ambiguity (Tang et al., 2019).
- This suggests that the combined source contexts in which a specific sense of an ambiguous term occurs in the training data can bias a model toward that sense whenever similar context words are present, regardless of the intended meaning; a sketch of how such a bias can be quantified follows below.
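The FREQDIFF score referenced later in this digest operationalizes this intuition: a context word (an "attractor", cf. Table 1) is biased toward a sense of a homograph if it co-occurs with that sense much more often than with competing senses in the training data. Below is a minimal Python sketch of such a co-occurrence-difference estimate; the `sense_of` helper (e.g. a lookup of the homograph's aligned translation in sense clusters like those in Table 11) and the exact scoring are illustrative assumptions, not the paper's reference implementation.

```python
from collections import Counter

def cooccurrence_counts(train_pairs, homograph, sense_of):
    """Count how often each source token co-occurs with each sense of a
    homograph in the training data.

    train_pairs: iterable of (source, target) sentence pairs.
    sense_of:    callable mapping a sentence pair to the homograph's sense
                 (e.g. via its aligned target translation), or None when
                 the sense cannot be resolved.
    """
    counts = Counter()
    for src, tgt in train_pairs:
        tokens = src.lower().split()
        if homograph not in tokens:
            continue
        sense = sense_of(src, tgt)
        if sense is None:
            continue
        for tok in set(tokens):
            if tok != homograph:
                counts[(tok, sense)] += 1
    return counts

def freq_diff(counts, attractor, seed_sense, adversarial_sense):
    # Positive scores mean the attractor co-occurs more often with the
    # adversarial sense, i.e. it should pull translations toward it.
    return counts[(attractor, adversarial_sense)] - counts[(attractor, seed_sense)]
```

Under this reading, `freq_diff(counts, "hot", "season", "water source")` would measure how strongly hot pulls spring toward its water-source sense (cf. Tables 1 and 4).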
Highlights
- Consider the sentence John met his wife in the hot spring of 1988
- We propose that our motivating example is representative of a systematic pathology that neural machine translation (NMT) systems have yet to overcome when performing word sense disambiguation (WSD)
- We conducted an initial investigation into leveraging data artifacts for the prediction of WSD errors in machine translation and proposed a simple adversarial attack strategy based on the presented insights
- Our results show that WSD is not yet a solved problem in NMT, and while the general performance of popular model architectures is high, we can identify or create sentences where models are more likely to fail due to data biases
- The presented approach is expected to be transferable to other language pairs and translation directions, assuming that the employed translation models share this underlying weakness
- As a continuation to this work, we intend to evaluate whether multilingual translation models are more resilient to lexical disambiguation biases and, as a consequence, are less susceptible to adversarial attacks that exploit source-side homography
Results
- Significant correlations are discovered for all bias estimates based on attractors (p < 1e-5, two-sided).
- While WSD performance reaches up to 96% on randomly chosen sentences, it drops to 77–82% on the challenge sets even for the best-performing model (the Transformer).
- Samples are rejected if their perplexity exceeds that of their corresponding seed sentence by more than 20% (see the sketch after this list).
- Analogous to the findings reported in Section 2.2, all uncovered correlations are strong and statistically significant (p < 1e-5).
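The perplexity filter above can be reproduced with any left-to-right language model; the sketch below uses GPT-2 through the HuggingFace Transformers library (both cited in the references), though whether this is the exact model and scoring the authors used is an assumption.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(sentence: str) -> float:
    # Exponentiated mean token-level cross-entropy under the language model.
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def keep_sample(seed: str, adversarial: str, margin: float = 0.2) -> bool:
    # Reject the adversarial sample if its perplexity exceeds that of its
    # seed sentence by more than 20%.
    return perplexity(adversarial) <= (1.0 + margin) * perplexity(seed)
```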
Conclusion
- The authors conducted an initial investigation into leveraging data artifacts for the prediction of WSD errors in machine translation and proposed a simple adversarial attack strategy based on the presented insights.
- Extending model-agnostic attack strategies to incorporate other types of dataset biases and to target natural language processing tasks other than machine translation is likewise a promising avenue for future research.
- The targeted development of models that are resistant to dataset artifacts is a promising direction that is likely to aid generalization across linguistically diverse domains.
Tables
- Table 1: Examples of attractors for spring
- Table 2: EN-DE translation performance (BLEU)
- Table 3: Rank-biserial correlation between disambiguation bias measures and lexical disambiguation errors
- Table 4: Perturbation examples; seed sense: season, adversarial sense: water source. Insertion/replacement in red
- Table 5: Rank-biserial correlation between attractors’ disambiguation bias and attack success
- Table 6: Examples of successful attacks on the OS18 transformer. Homographs are blue, attractors are red
- Table 7: Base-rate adjusted thresholds for the interpretation of WSD error prediction correlations
- Table 8: Base-rate adjusted thresholds for the interpretation of attack success correlations
- Table 9: Corpus statistics for the OS18 domain
- Table 10: Corpus statistics for the WMT19 domain
- Table 11: Non-exhaustive examples of homograph-specific sense clusters
- Table 12: Training settings and model hyperparameters
- Table 13: Additional examples of successful attacks on the OS18 transformer. Homographs are blue, attractors are red
- Table 14: Examples of successful attacks on the OS18 LSTM. Homographs are blue, attractors are red
- Table 15: Examples of successful attacks on the OS18 ConvS2S. Homographs are blue, attractors are red
- Table 16: Examples of successful attacks on the WMT19 transformer. Homographs are blue, attractors are red
- Table 17: Examples of successful attacks on the WMT19 LSTM. Homographs are blue, attractors are red
- Table 18: Examples of successful attacks on the WMT19 ConvS2S. Homographs are blue, attractors are red
Funding
- Rico Sennrich has received funding from the Swiss National Science Foundation (project MUTAMUR; no. 176727)
Study subjects and analysis
Test pairs with the highest FREQDIFF score (challenge set size): 3000
Challenge set evaluation. To establish the predictive power of the uncovered correlations, a challenge set of 3000 test pairs with the highest FREQDIFF score is subsampled from the full WSD test pair pool in both domains. In addition, we create secondary sets of equal size by randomly selecting pairs from each pool.
Rare-sense challenge set pairs: 3000
NMT models are known to underperform on low-frequency senses of ambiguous terms (Rios et al., 2017), prompting us to investigate whether disambiguation biases capture the same information. For this purpose, another challenge set of 3000 pairs is constructed by prioritizing pairs assigned to the rarest among each homograph’s sense sets. We find that the new challenge set has a 72.63% overlap with the disambiguation bias challenge set in the OS18 domain and a 64.4% overlap in the WMT19 domain.
Adversarial samples with the highest attractor FREQDIFF scores: 10000
For this purpose, we examine the percentage of attack successes per perturbation strategy in Figure 2, finding perturbations proximate to the homograph to be most effective. Having thus identified a strategy for selecting attractors that are likely to yield successful attacks, we construct a challenge set of 10000 adversarial samples with the highest attractor FREQDIFF scores obtained via the IH or RH perturbations. To enforce sample diversity, we limit the number of samples to at most 1000 per homograph (a sketch of these perturbations follows below).
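The IH and RH perturbations operate directly at the homograph; the following minimal sketch assumes IH inserts the attractor immediately before the homograph and RH replaces the token preceding it (whitespace tokenization and these exact definitions are simplifying assumptions, not the paper's implementation).

```python
from typing import List, Optional

def insert_before_homograph(tokens: List[str], homograph: str, attractor: str) -> Optional[List[str]]:
    """IH-style perturbation: place the attractor directly before the homograph."""
    if homograph not in tokens:
        return None
    i = tokens.index(homograph)
    return tokens[:i] + [attractor] + tokens[i:]

def replace_before_homograph(tokens: List[str], homograph: str, attractor: str) -> Optional[List[str]]:
    """RH-style perturbation: swap the token preceding the homograph for the attractor."""
    if homograph not in tokens:
        return None
    i = tokens.index(homograph)
    if i == 0:
        return None  # nothing to replace
    return tokens[:i - 1] + [attractor] + tokens[i:]

# Motivating example, cf. Table 4 (seed sense: season, adversarial sense: water source):
seed = "John met his wife in the spring of 1988".split()
print(" ".join(insert_before_homograph(seed, "spring", "hot")))
# -> John met his wife in the hot spring of 1988
```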
Samples presented to bilingual judges: 1000
In the OS18 domain, only 1.04% of samples are less grammatical than their respective seed sentences, whereas this is the case for 2.04% of WMT19 samples, indicating minimal degradation. We additionally present two bilingual judges with 1000 samples picked at random from the adversarial challenge sets in both domains and 1000 regular sentences from the challenge sets constructed in Section 2.2. For each adversarial source sentence…
Reference
- Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
- Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, et al. 2019. Findings of the 2019 conference on machine translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 1–61.
- Yonatan Belinkov, Adam Poliak, Stuart Shieber, Benjamin Van Durme, and Alexander Sasha Rush. 2019. On adversarial removal of hypothesis-only bias in natural language inference. In Proceedings of the Joint Conference on Lexical and Computational Semantics.
- Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128.
- Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust neural machine translation with doubly adversarial inputs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4324–4333.
- Jacob Cohen. 2013. Statistical power analysis for the behavioral sciences. Academic press.
- Edward E Cureton. 1956. Rank-biserial correlation. Psychometrika, 21(3):287–290.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
- Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644–648, Atlanta, Georgia. Association for Computational Linguistics.
- Jonas Gehring, Michael Auli, David Grangier, and Yann Dauphin. 2017. A convolutional encoder model for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 123–135.
- Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655.
- Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A Smith. 2018. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112.
- Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear, 7(1).
- Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, pages 177–180.
- Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, and Yejin Choi. 2020. Adversarial filters of dataset biases. arXiv preprint arXiv:2002.04108.
- Yi Li and Nuno Vasconcelos. 2019. Repair: Removing representation bias by dataset resampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9572–9581.
- Pierre Lison, Jörg Tiedemann, Milen Kouylekov, et al. 2018. OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora. In LREC 2018, Eleventh International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).
- Frederick Liu, Han Lu, and Graham Neubig. 2018. Handling homographs in neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1336–1345.
- Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421.
- Henry B. Mann and Donald R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, pages 50–60.
- Rebecca Marvin and Philipp Koehn. 2018. Exploring word sense disambiguation abilities of neural machine translation systems (non-archival extended abstract). In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), pages 125–131.
- Tom McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448.
- Robert E McGrath and Gregory J Meyer. 2006. When effect sizes disagree: the case of r and d. Psychological methods, 11(4):386.
- Paul Michel, Xian Li, Graham Neubig, and Juan Miguel Pino. 2019. On evaluation of adversarial perturbations for sequence-to-sequence models. In Proceedings of NAACL-HLT, pages 3103–3114.
- John X Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, and Yanjun Qi. 2020. Reevaluating adversarial examples in natural language. arXiv preprint arXiv:2004.14174.
- Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225. Association for Computational Linguistics.
- Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.
- Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191.
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.
- Alessandro Raganato, Yves Scherrer, and Jörg Tiedemann. 2019. The MuCoW test suite at WMT 2019: Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 470–480.
- Annette Rios, Laura Mascarell, and Rico Sennrich. 2017. Improving word sense disambiguation in neural machine translation with sense embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19.
- John Ruscio. 2008. A probability-based measure of effect size: Robustness to base rates and other factors. Psychological methods, 13(1):19.
- Suranjana Samanta and Sameep Mehta. 2017. Towards crafting text adversarial samples. arXiv preprint arXiv:1707.02812.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725.
- Gabriel Stanovsky, Noah A Smith, and Luke Zettlemoyer. 2019. Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684.
- Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2019. Encoders help you disambiguate word senses in neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1429–1435.
- Raphael Vallat. 2018. Pingouin: statistics in Python. The Journal of Open Source Software, 3(31):1026.
- Eva Vanmassenhove, Christian Hardmeier, and Andy Way. 2018. Getting gender right in neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3003–3008.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. HuggingFace’s Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
- Huangzhao Zhang, Hao Zhou, Ning Miao, and Lei Li. 2019. Generating fluent adversarial examples for natural languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5564–5569.
- Wei Emma Zhang, Quan Z Sheng, Ahoud Alhazmi, and Chenliang Li. 2020. Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Transactions on Intelligent Systems and Technology (TIST), 11(3):1–41.
- Each dataset is subsequently tokenized and truecased using Moses (Koehn et al., 2007) scripts. For model training and evaluation, we additionally learn and apply BPE codes (Sennrich et al., 2016) to the data using the subword-NMT implementation, with 32k merge operations and the vocabulary threshold set to 50.
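A minimal sketch of the BPE step via the subword-nmt Python API: file names are placeholders, `vocab.txt` is assumed to be a word-count-per-line vocabulary file such as `subword-nmt get-vocab` produces, and the paper's exact invocation may differ.

```python
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE, read_vocabulary

# Learn 32k merge operations on the tokenized, truecased training corpus.
with codecs.open("corpus.tok.tc", encoding="utf-8") as train_file, \
     codecs.open("bpe.codes", "w", encoding="utf-8") as codes_file:
    learn_bpe(train_file, codes_file, num_symbols=32000)

# Restrict segmentation to subword units seen at least 50 times
# (the vocabulary threshold mentioned above).
with codecs.open("vocab.txt", encoding="utf-8") as vocab_file:
    vocab = read_vocabulary(vocab_file, threshold=50)

# Apply the learned codes to a corpus file, line by line.
with codecs.open("bpe.codes", encoding="utf-8") as codes_file:
    bpe = BPE(codes_file, vocab=vocab)
with codecs.open("corpus.tok.tc", encoding="utf-8") as fin, \
     codecs.open("corpus.bpe", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(bpe.process_line(line))
```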