AdvAug: Robust Adversarial Augmentation for Neural Machine Translation
ACL (2020): 5961-5970
In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT). The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, of which the crucial one is a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding space centered around observed training sentence pairs.
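In equation form, the objective described above can be sketched as follows; ℓ denotes the per-sentence translation loss, and P_adv and P_obs are our shorthand for the two vicinity distributions, not necessarily the paper's notation:

```latex
% Schematic vicinal risk minimized during training (shorthand notation):
% P_adv -- vicinity distribution over interpolated embeddings of
%          adversarial sentences (the novel one)
% P_obs -- mixup-style vicinity distribution over observed training pairs
\mathcal{L}(\theta)
  = \mathbb{E}_{(x',y') \sim P_{\mathrm{adv}}}\left[\ell(x',y';\theta)\right]
  + \mathbb{E}_{(x',y') \sim P_{\mathrm{obs}}}\left[\ell(x',y';\theta)\right]
```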
- Recent work in neural machine translation (Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017) has led to dramatic improvements in both research and commercial systems (Wu et al., 2016).
- Two types of noise can be distinguished: (1) continuous noise, which is modeled as a real-valued vector applied to word embeddings (Miyato et al., 2016, 2017; Cheng et al., 2018; Sano et al., 2019), and (2) discrete noise, which adds, deletes, and/or replaces characters or words in the observed sentences (Belinkov and Bisk, 2018; Sperber et al., 2017; Ebrahimi et al., 2018; Michel et al., 2019; Cheng et al., 2019; Karpukhin et al., 2019).
- In both cases, the challenge is to ensure that the noisy examples are still semantically valid translation pairs.
- While constructing semantics-preserving continuous noise in a high-dimensional space proves to be non-trivial, state-of-the-art approaches to robust Neural Machine Translation are mostly built on adversarial examples of discrete noise.
- We find that the generated adversarial sentences are unnatural and, as we will show, suboptimal for learning robust Neural Machine Translation models.
- The decoder in the Neural Machine Translation model acts as a conditional language model that operates on a shifted copy of y, i.e., ⟨sos⟩, y_0, ..., y_{|y|−1}, where ⟨sos⟩ is the start-of-sentence symbol, together with the representations of x learned by the encoder (first sketch below).
- We introduce a new method to augment the representations of the adversarial examples in sequence-to-sequence training of the Neural Machine Translation model (second sketch below).
- Following Miyato et al. (2017), the authors use adversarial learning to add continuous gradient-based perturbations to source word embeddings and extend it to the Transformer model.
- Sano et al. (2019) carry Miyato et al. (2017)'s idea over to NMT by incorporating gradient-based perturbations into both source and target word embeddings and optimizing the model with adversarial training.
- Adversarial examples are used to both attack and defend the NMT model
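The shifted decoder input mentioned above is standard teacher forcing. A minimal PyTorch sketch, assuming a (batch, length) tensor of target token ids; the helper name and interface are ours, not the paper's:

```python
import torch

def shift_targets(y: torch.Tensor, sos_id: int) -> torch.Tensor:
    """Build the decoder input <sos>, y_0, ..., y_{|y|-1} from targets y.

    Hypothetical helper: y is a (batch, length) tensor of target token ids.
    """
    sos = torch.full((y.size(0), 1), sos_id, dtype=y.dtype, device=y.device)
    # Prepend <sos> and drop the final target token (teacher forcing).
    return torch.cat([sos, y[:, :-1]], dim=1)
```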
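The augmentation itself combines two ingredients from the bullets above: a gradient-based perturbation of word embeddings in the spirit of Miyato et al. (2017), and a mixup-style interpolation between the embeddings of two adversarial variants of the same sentence pair. A minimal sketch, with illustrative hyperparameters (epsilon, alpha) that are not taken from the paper:

```python
import torch

def adversarial_embeddings(emb: torch.Tensor, loss: torch.Tensor,
                           epsilon: float = 1.0) -> torch.Tensor:
    """Continuous perturbation in the spirit of Miyato et al. (2017): move
    the embeddings along the loss gradient, scaled to norm epsilon."""
    grad, = torch.autograd.grad(loss, emb, retain_graph=True)
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-9)
    return emb + delta

def interpolate(emb_a: torch.Tensor, emb_b: torch.Tensor,
                alpha: float = 8.0) -> torch.Tensor:
    """mixup-style interpolation in embedding space, lambda ~ Beta(alpha, alpha).

    emb_a and emb_b would be embeddings of two adversarial variants of the
    same training pair; the target side, omitted here, is mixed analogously.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * emb_a + (1.0 - lam) * emb_b
```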
- Chinese-English Translation.
- Table 1 shows results on the Chinese-English translation task, in comparison with the following six baseline methods.
- The authors implement all of these baseline methods for a direct comparison.
- The authors have presented an approach to augment the training data of NMT models by introducing a new vicinity distribution defined over the interpolated embeddings of adversarial examples.
- To further improve the translation quality, the authors incorporate an existing vicinity distribution, similar to mixup for observed examples in the training set.
- The authors design an augmentation algorithm over the virtual sentences sampled from both of the vicinity distributions in sequence-to-sequence NMT model training.
- Experimental results on Chinese-English, English-French, and English-German translation tasks demonstrate that the approach improves both translation performance and robustness.
- Table 1: Baseline comparison on NIST Chinese-English translation. '*' indicates that the model uses extra corpora, and '†' means its training loss is not elaborated here.
- Table 2: Results on IWSLT16 English-French and WMT14 English-German translation.
- Table 3: Translation examples from the Transformer and our model for an input and its adversarial input.
- Table 4: Effect of α on the Chinese-English validation set. '-' indicates that the model fails to converge.
- Table 5: Results on artificial noisy inputs. Each column lists results for a different noise fraction.
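For context on the noise fractions: a generic recipe for this kind of artificial noise replaces a fixed fraction of the words in each input with random vocabulary items. The function below is our illustration, not the paper's exact noising procedure:

```python
import random

def add_word_noise(tokens: list, vocab: list, fraction: float = 0.1) -> list:
    """Replace roughly `fraction` of the tokens with random vocabulary
    words (illustrative; the paper's noising procedure may differ)."""
    if not tokens:
        return tokens
    noisy = list(tokens)
    n = max(1, int(fraction * len(noisy)))
    for i in random.sample(range(len(noisy)), n):
        noisy[i] = random.choice(vocab)
    return noisy
```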
- Data Augmentation. Data augmentation is an effective method to improve machine translation performance. Existing methods in NMT may be divided into two categories, based upon extra corpora (Sennrich et al., 2016a; Cheng et al., 2016; Zhang and Zong, 2016; Edunov et al., 2018) or original parallel corpora (Fadaee et al., 2017; Wang et al., 2018; Cheng et al., 2019). Recently, mixup (Zhang et al., 2018) has become a popular data augmentation technique for semi-supervised learning (Berthelot et al., 2019) and for overcoming real-world noisy data (Jiang et al., 2019). Unlike prior work, we introduce a new method to augment the representations of the adversarial examples in sequence-to-sequence training of the NMT model. Even without extra monolingual corpora, our approach substantially outperforms the widely-used back-translation methods (Sennrich et al., 2016a; Edunov et al., 2018). Furthermore, we can obtain even better performance by including additional monolingual corpora.
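For reference, vanilla mixup (Zhang et al., 2018), which the paragraph above contrasts with the embedding-level augmentation used here, forms convex combinations of input pairs and their labels; alpha = 0.2 below is a common choice in the mixup literature, not a value from this paper:

```python
import torch

def mixup(x: torch.Tensor, y_onehot: torch.Tensor, alpha: float = 0.2):
    """Vanilla mixup (Zhang et al., 2018): mix random pairs of examples
    and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[idx], lam * y_onehot + (1 - lam) * y_onehot[idx]
```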
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
- Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In International Conference on Learning Representations.
- David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. 2019. Mixmatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249.
- Olivier Chapelle, Jason Weston, Leon Bottou, and Vladimir Vapnik. 2001. Vicinal risk minimization. In Advances in Neural Information Processing Systems, pages 416–422.
- Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust neural machine translation with doubly adversarial inputs. In Association for Computational Linguistics.
- Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. Towards robust neural machine translation. In Association for Computational Linguistics.
- Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Semi-supervised learning for neural machine translation. In Association for Computational Linguistics.
- Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Yonatan Belinkov, and Preslav Nakov. 2019. One size does not fit all: Comparing NMT representations of different granularities. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018. On adversarial examples for character-level neural machine translation. In International Conference on Computational Linguistics.
- Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. In Empirical Methods in Natural Language Processing.
- Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. Data augmentation for low-resource neural machine translation. In Association for Computational Linguistics.
- Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional sequence to sequence learning. In International Conference on Machine Learning.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems.
- Lu Jiang, Di Huang, and Weilong Yang. 2019. Synthetic vs real: Deep learning on controlled noise. arXiv preprint arXiv:1911.09781.
- Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. 2018. Mentornet: Learning datadriven curriculum for very deep neural networks on corrupted labels. In International Conference on Machine Learning.
- Vladimir Karpukhin, Omer Levy, Jacob Eisenstein, and Marjan Ghazvininejad. 2019. Training on synthetic noise improves robustness to natural noise in machine translation. arXiv preprint arXiv:1902.01509.
- Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, and Hassan Sajjad. 2019. Findings of the first shared task on machine translation robustness. arXiv preprint arXiv:1906.11943.
- Paul Michel, Xian Li, Graham Neubig, and Juan Pino. 2019. On evaluation of adversarial perturbations for sequence-to-sequence models. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Takeru Miyato, Andrew M Dai, and Ian Goodfellow. 2017. Adversarial training methods for semi-supervised text classification. In International Conference on Learning Representations.
- Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. 2016. Distributional smoothing with virtual adversarial training. In International Conference on Learning Representations.
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Association for Computational Linguistics.
- Motoki Sano, Jun Suzuki, and Shun Kiyono. 2019. Effective adversarial regularization for neural machine translation. In Association for Computational Linguistics.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Improving neural machine translation models with monolingual data. In Association for Computational Linguistics.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016b. Neural machine translation of rare words with subword units. In Association for Computational Linguistics.
- Matthias Sperber, Jan Niehues, and Alex Waibel. 2017. Toward robust neural machine translation for noisy input sequences. In International Workshop on Spoken Language Translation.
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.
- Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. 2018. Switchout: an efficient data augmentation algorithm for neural machine translation. In Empirical Methods in Natural Language Processing.
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
- Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
- Jiajun Zhang and Chengqing Zong. 2016. Exploiting source-side monolingual data in neural machine translation. In Empirical Methods in Natural Language Processing.