Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

Denis Emelin

EMNLP 2020.

Keywords: simple adversarial attack, training data, disambiguation error, disambiguation bias, adversarial attack
Abstract:

Word sense disambiguation is a well-known source of translation errors in NMT. We posit that some of the incorrect disambiguation choices are due to models’ over-reliance on dataset artifacts found in training data, specifically superficial word co-occurrences, rather than a deeper understanding of the source text. We introduce a method f…

Introduction
  • Consider the sentence John met his wife in the hot spring of 1988. In this context, the polysemous term spring unambiguously refers to the season of a specific year.
  • Prior studies have indicated that neural machine translation (NMT) models rely heavily on source sentence information when resolving lexical ambiguity (Tang et al., 2019).
  • This suggests that the combined source contexts in which a specific sense of an ambiguous term occurs in the training data can bias models toward resolving the ambiguity based on superficial word co-occurrences rather than a deeper understanding of the source text.
Highlights
  • Consider the sentence John met his wife in the hot spring of 1988
  • We propose that our motivating example is representative of a systematic pathology that neural machine translation (NMT) systems have yet to overcome when performing word sense disambiguation (WSD)
  • We conducted an initial investigation into leveraging data artifacts for the prediction of WSD errors in machine translation and proposed a simple adversarial attack strategy based on the presented insights
  • Our results show that WSD is not yet a solved problem in NMT, and while the general performance of popular model architectures is high, we can identify or create sentences where models are more likely to fail due to data biases
  • The presented approach is expected to be transferable to other language pairs and translation directions, assuming that the employed translation models share this underlying weakness
  • As a continuation to this work, we intend to evaluate whether multilingual translation models are more resilient to lexical disambiguation biases and, as a consequence, are less susceptible to adversarial attacks that exploit source-side homography
Results
  • Significant correlations are discovered for all bias estimates based on attractors (p < 1e-5, two-sided).
  • While WSD performance is up to 96% on randomly chosen sentences, it drops to 77–82% on the challenge set for the best-performing model (the Transformer).
  • Samples are rejected if their perplexity exceeds that of their corresponding seed sentence by more than 20%.
  • Consistent with the findings reported in Section 2.2, all uncovered correlations are strong and statistically significant with p < 1e-5; a sketch of how such correlations can be computed follows below
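A minimal sketch of how a rank-biserial correlation between a continuous disambiguation-bias measure and a binary error outcome can be computed, assuming the standard derivation from the Mann-Whitney U statistic (Mann and Whitney, 1947; Cureton, 1956); the paper cites Pingouin (Vallat, 2018), which reports this effect size directly, so the SciPy-based function below is an illustration rather than the authors' code:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def rank_biserial(bias_scores, is_error):
    """Rank-biserial correlation between a bias measure and a binary
    WSD-error indicator: r_rb = 1 - 2U / (n1 * n2), where U is the
    Mann-Whitney statistic for error vs. non-error samples."""
    bias_scores = np.asarray(bias_scores, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    errors = bias_scores[is_error]      # samples translated incorrectly
    correct = bias_scores[~is_error]    # samples translated correctly
    u, p_value = mannwhitneyu(errors, correct, alternative="two-sided")
    r_rb = 1.0 - 2.0 * u / (len(errors) * len(correct))
    return r_rb, p_value
```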
Conclusion
  • The authors conducted an initial investigation into leveraging data artifacts for the prediction of WSD errors in machine translation and proposed a simple adversarial attack strategy based on the presented insights.
  • Extending model-agnostic attack strategies to incorporate other types of dataset biases and to target natural language processing tasks other than machine translation is likewise a promising avenue for future research.
  • The targeted development of models that are resistant to dataset artifacts is a promising direction that is likely to aid generalization across linguistically diverse domains
Tables
  • Table1: Examples of attractors for spring
  • Table2: EN-DE translation performance (BLEU)
  • Table3: Rank biserial correlation between disambiguation bias measures and lexical disambiguation errors
  • Table4: Perturbation examples; seed sense: season, adversarial sense: water source. Insertion/replacement in red
  • Table5: Rank biserial correlation between attractors’ disambiguation bias and attack success
  • Table6: Examples of successful attacks on the OS18 transformer. Homographs are blue, attractors are red
  • Table7: Base-rate adjusted thresholds for the interpretation of WSD error prediction correlations
  • Table8: Base-rate adjusted thresholds for the interpretation of attack success correlations
  • Table9: Corpus statistics for the OS18 domain
  • Table10: Corpus statistics for the WMT19 domain
  • Table11: Non-exhaustive examples of homograph-specific sense clusters
  • Table12: Training settings and model hyperparameters
  • Table13: Additional examples of successful attacks on the OS18 transformer. Homographs are blue, attractors are red
  • Table14: Examples of successful attacks on the OS18 LSTM. Homographs are blue, attractors are red
  • Table15: Examples of successful attacks on the OS18 ConvS2S. Homographs are blue, attractors are red
  • Table16: Examples of successful attacks on the WMT19 transformer. Homographs are blue, attractors are red
  • Table17: Examples of successful attacks on the WMT19 LSTM. Homographs are blue, attractors are red
  • Table18: Examples of successful attacks on the WMT19 ConvS2S. Homographs are blue, attractors are red
Funding
  • Rico Sennrich has received funding from the Swiss National Science Foundation (project MUTAMUR; no. 176727)
Study subjects and analysis
Test pairs with the highest FREQDIFF score subsampled: 3000
Challenge set evaluation. To establish the predictive power of the uncovered correlations, a challenge set of 3000 test pairs with the highest FREQDIFF score is subsampled from the full WSD test pair pool in both domains. In addition, we create secondary sets of equal size by randomly selecting pairs from each pool
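The FREQDIFF score is not defined in this summary; a plausible reading, assumed in the sketch below, is that it contrasts how often an attractor co-occurs with the seed sense of a homograph in the training data against its co-occurrence with competing senses. All function and variable names are illustrative:

```python
from collections import Counter, defaultdict

def cooccurrence_counts(corpus, homograph, sense_of):
    """Count attractor co-occurrences per sense cluster of a homograph.

    corpus    -- iterable of (src_tokens, tgt_tokens) training pairs
    homograph -- the ambiguous source-side term, e.g. "spring"
    sense_of  -- maps target tokens to the realized sense cluster,
                 or None if the homograph's translation is unresolved
    """
    counts = defaultdict(Counter)  # sense -> attractor -> count
    for src_tokens, tgt_tokens in corpus:
        if homograph not in src_tokens:
            continue
        sense = sense_of(tgt_tokens)
        if sense is None:
            continue
        for token in src_tokens:
            if token != homograph:
                counts[sense][token] += 1
    return counts

def freq_diff(counts, seed_sense, attractor):
    """Seed-sense co-occurrence count minus the strongest competitor's."""
    competitors = [c[attractor] for s, c in counts.items() if s != seed_sense]
    return counts[seed_sense][attractor] - max(competitors, default=0)
```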

Pairs in the rarest-sense challenge set: 3000
NMT models are known to underperform on low-frequency senses of ambiguous terms (Rios et al., 2017), prompting us to investigate whether disambiguation biases capture the same information. For this purpose, another challenge set of 3000 pairs is constructed by prioritizing pairs assigned to the rarest among each homograph’s sense sets. We find that the new challenge set has a 72.63% overlap with the disambiguation bias challenge set in the OS18 domain and a 64.4% overlap in the WMT19 domain

Adversarial samples with the highest attractor FREQDIFF scores: 10000
For this purpose, we examine the percentage of attack successes per perturbation strategy in Figure 2, finding perturbations proximate to the homograph to be most effective. Having thus identified a strategy for selecting attractors that are likely to yield successful attacks, we construct a challenge set of 10000 adversarial samples with the highest attractor FREQDIFF scores that had been obtained via the IH or RH perturbations. To enforce sample diversity, we limit the number of samples to at most 1000 per homograph
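A simplified sketch of the two perturbations, assuming IH and RH denote inserting an attractor directly before the homograph and replacing the token immediately preceding it; the paper's grammaticality constraints on candidate positions are omitted, and the attractor "thermal" is purely illustrative:

```python
def insert_before_homograph(tokens, homograph, attractor):
    """IH (assumed): insert the attractor directly before the homograph."""
    i = tokens.index(homograph)
    return tokens[:i] + [attractor] + tokens[i:]

def replace_before_homograph(tokens, homograph, attractor):
    """RH (assumed): replace the token immediately preceding the homograph."""
    i = tokens.index(homograph)
    if i == 0:
        return None  # no token in front of the homograph to replace
    return tokens[:i - 1] + [attractor] + tokens[i:]

seed = "John met his wife in the hot spring of 1988".split()
print(" ".join(insert_before_homograph(seed, "spring", "thermal")))
# -> John met his wife in the hot thermal spring of 1988
print(" ".join(replace_before_homograph(seed, "spring", "thermal")))
# -> John met his wife in the thermal spring of 1988
```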

Samples presented to bilingual judges per set: 1000
In the OS18 domain, only 1.04% of samples are less grammatical than their respective seed sentences, whereas this is the case for 2.04% of WMT19 samples, indicating minimal degradation. We additionally present two bilingual judges with 1000 samples picked at random from the adversarial challenge sets in both domains and 1000 regular sentences from the challenge sets constructed in section 2.2. For each adversarial source sentence…
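The fluency criterion noted under Results, rejecting adversarial samples whose perplexity exceeds that of their seed sentence by more than 20%, can be sketched with an off-the-shelf language model. Using GPT-2 via HuggingFace Transformers (Radford et al., 2019; Wolf et al., 2019) is an assumption consistent with the works the paper cites, not a confirmed implementation detail:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    # With labels equal to the inputs, the model returns the mean
    # per-token negative log-likelihood as its loss.
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def keep_sample(seed: str, adversarial: str, slack: float = 0.20) -> bool:
    """Reject samples whose perplexity exceeds the seed's by > 20%."""
    return perplexity(adversarial) <= (1.0 + slack) * perplexity(seed)
```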

References
  • Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, et al. 2019. Findings of the 2019 Conference on Machine Translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 1–61.
  • Yonatan Belinkov, Adam Poliak, Stuart Shieber, Benjamin Van Durme, and Alexander Sasha Rush. 2019. On adversarial removal of hypothesis-only bias in natural language inference. In Proceedings of the Joint Conference on Lexical and Computational Semantics.
  • Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128.
  • Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust neural machine translation with doubly adversarial inputs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4324–4333.
  • Jacob Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Academic Press.
  • Edward E. Cureton. 1956. Rank-biserial correlation. Psychometrika, 21(3):287–290.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM Model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644–648, Atlanta, Georgia. Association for Computational Linguistics.
  • Jonas Gehring, Michael Auli, David Grangier, and Yann Dauphin. 2017. A convolutional encoder model for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 123–135.
  • Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112.
  • Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions, pages 177–180.
  • Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, and Yejin Choi. 2020. Adversarial filters of dataset biases. arXiv preprint arXiv:2002.04108.
  • Yi Li and Nuno Vasconcelos. 2019. REPAIR: Removing representation bias by dataset resampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9572–9581.
  • Pierre Lison, Jörg Tiedemann, and Milen Kouylekov. 2018. OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora. In LREC 2018, Eleventh International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).
  • Frederick Liu, Han Lu, and Graham Neubig. 2018. Handling homographs in neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1336–1345.
  • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421.
  • Henry B. Mann and Donald R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, pages 50–60.
  • Rebecca Marvin and Philipp Koehn. 2018. Exploring word sense disambiguation abilities of neural machine translation systems (non-archival extended abstract). In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), pages 125–131.
  • Tom McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448.
  • Robert E. McGrath and Gregory J. Meyer. 2006. When effect sizes disagree: The case of r and d. Psychological Methods, 11(4):386.
  • Paul Michel, Xian Li, Graham Neubig, and Juan Miguel Pino. 2019. On evaluation of adversarial perturbations for sequence-to-sequence models. In Proceedings of NAACL-HLT, pages 3103–3114.
  • John X. Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, and Yanjun Qi. 2020. Reevaluating adversarial examples in natural language. arXiv preprint arXiv:2004.14174.
  • Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.
  • Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.
  • Alessandro Raganato, Yves Scherrer, and Jörg Tiedemann. 2019. The MuCoW test suite at WMT 2019: Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 470–480.
  • Annette Rios, Laura Mascarell, and Rico Sennrich. 2017. Improving word sense disambiguation in neural machine translation with sense embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19.
  • John Ruscio. 2008. A probability-based measure of effect size: Robustness to base rates and other factors. Psychological Methods, 13(1):19.
  • Suranjana Samanta and Sameep Mehta. 2017. Towards crafting text adversarial samples. arXiv preprint arXiv:1707.02812.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725.
  • Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer. 2019. Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684.
  • Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2019. Encoders help you disambiguate word senses in neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1429–1435.
  • Raphael Vallat. 2018. Pingouin: statistics in Python. The Journal of Open Source Software, 3(31):1026.
  • Eva Vanmassenhove, Christian Hardmeier, and Andy Way. 2018. Getting gender right in neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3003–3008.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Huangzhao Zhang, Hao Zhou, Ning Miao, and Lei Li. 2019. Generating fluent adversarial examples for natural languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5564–5569.
  • Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, and Chenliang Li. 2020. Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Transactions on Intelligent Systems and Technology (TIST), 11(3):1–41.
Notes
  • Each dataset is subsequently tokenized and truecased using Moses (Koehn et al., 2007) scripts. For model training and evaluation, we additionally learn and apply BPE codes (Sennrich et al., 2016) to the data using the subword-NMT implementation, with 32k merge operations and a vocabulary threshold of 50.
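A minimal sketch of that BPE step via the subword-nmt package's Python API; the file names are placeholders, and applying the 50-count vocabulary threshold (handled by get-vocab and the --vocabulary/--vocabulary-threshold options in the CLI) is omitted for brevity:

```python
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 32k merge operations on the tokenized, truecased training data.
with codecs.open("train.tok.tc.en", encoding="utf-8") as train, \
     codecs.open("codes.bpe", "w", encoding="utf-8") as codes:
    learn_bpe(train, codes, num_symbols=32000)

# Apply the learned codes to new text.
with codecs.open("codes.bpe", encoding="utf-8") as codes:
    bpe = BPE(codes)

print(bpe.process_line("the hot spring of 1988"))
```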