
Achieving Human Parity on Automatic Chinese to English News Translation.

arXiv: Computation and Language, (2018)


Abstract

Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address…

Introduction
  • Recent years have seen human performance levels reached or surpassed in tasks ranging from games such as Go [33] to classification of images in ImageNet [21] to conversational speech recognition on the Switchboard task [50].

    In the area of machine translation, the authors have seen dramatic improvements in quality with the advent of attentional encoder-decoder neural networks [35, 3, 39].
  • This paper summarizes how the authors achieved human parity in translating text in the news domain, from Chinese to English.
  • Translation of news text has been an area of active interest in the Machine Translation community for over a decade, due to the practical and commercial importance of this domain, the availability of abundant parallel data on the web, and a long history of government-funded projects and evaluation campaigns such as NIST OpenMT and GALE.
  • The annual evaluation campaign of the WMT (Conference on Machine Translation) [6] has focused on news translation for more than a decade.
Highlights
  • Recent years have seen human performance levels reached or surpassed in tasks ranging from games such as Go [33] to classification of images in ImageNet [21] to conversational speech recognition on the Switchboard task [50].

    In the area of machine translation, we have seen dramatic improvements in quality with the advent of attentional encoder-decoder neural networks [35, 3, 39].
  • It is worth noting that the experiments reported in Table 1 were constrained-data experiments limited to WMT17 official data only.
  • Based on these results we claim that we have achieved human parity according to Definition 2, as our research systems are indistinguishable from human translations.
  • We use an updated version of Appraise [14], the same tool used in the human evaluation campaign for the Conference on Machine Translation (WMT).
  • We described the techniques used in the latest Microsoft machine translation system to reach a new state of the art.
  • Our evaluation found that our system has reached parity with professional human translations on the WMT 2017 Chinese-to-English news translation task, and exceeds the quality of crowd-sourced references.
Methods
  • The authors first introduce the data and experimental setup used in the experiments, and evaluate each of the systems introduced in Section 3, both independently and after system combination and re-ranking.

    4.1 Data and Experimental Setup

    The authors use all of the available parallel data for the WMT17 Chinese-English translation task.
  • The authors are left with 18M bilingual sentence pairs.
  • The authors use the Chinese and English language models trained on the 18M sentences of bilingual data to filter the monolingual sentences from “News Crawl: articles from 2016” and “Common Crawl” provided by WMT17 using CED [28].
  • The authors retain about 7M English and Chinese monolingual sentences.
  • The monolingual data is used in both the dual learning and back-translation setups throughout the experiments.
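The CED filtering step above can be sketched as follows. This is a minimal illustration of cross-entropy difference selection in the spirit of [28], assuming toy add-one-smoothed unigram language models; the actual system trains full language models on the 18M bilingual sentences, and the function names here are invented for the sketch.

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Train an add-one-smoothed unigram LM from tokenized sentences."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def cross_entropy(lm, sent):
    """Average negative log-probability of a sentence under the LM."""
    return -sum(math.log(lm(tok)) for tok in sent) / max(len(sent), 1)

def ced_filter(candidates, in_domain, general, keep=0.5):
    """Keep the fraction of candidate sentences with the lowest
    cross-entropy difference H_in(s) - H_gen(s): sentences that look
    like the in-domain data and unlike generic crawled text."""
    lm_in, lm_gen = unigram_lm(in_domain), unigram_lm(general)
    scored = sorted(
        candidates,
        key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s),
    )
    return scored[:max(1, int(len(scored) * keep))]
```

Applied to the News Crawl and Common Crawl monolingual sentences, this kind of ranking is what reduces the pool to the retained ~7M sentences per language.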
Results
  • The Transformer model [40] is adopted as the baseline. Unless otherwise mentioned, all translation experiments use hyper-parameter settings based on the Tensor2Tensor Transformer-big settings (v1.3.0).
  • The evaluation results of the Dual Learning and Deliberation Network systems on the WMT 2017 Chinese-English test set are listed in the second section of Table 1.
  • The evaluation results of the agreement regularization and the unified joint training are listed in the third section of Table 1.
  • Table 4 presents the results from the large scale human evaluation campaign
  • Based on these results, the authors claim to have achieved human parity according to Definition 2, as the research systems are indistinguishable from human translations.
  • The authors collected n ≥ 610 assessments per system.
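The standardized z scores reported alongside the raw averages can be sketched as follows. Each annotator's raw scores r ∈ [0, 100] are standardized by that annotator's own mean and standard deviation before averaging per system, which cancels out individual scoring habits; this follows the general WMT direct-assessment convention, and the function below is an illustrative sketch, not the paper's evaluation code.

```python
from statistics import mean, pstdev

def z_standardize(scores_by_annotator):
    """Standardize each annotator's raw scores by that annotator's own
    mean and standard deviation, then average per system.
    Input: {annotator: [(system, raw_score), ...]}.
    Returns: {system: average z score}."""
    per_system = {}
    for ann, ratings in scores_by_annotator.items():
        raws = [r for _, r in ratings]
        mu, sigma = mean(raws), pstdev(raws)
        for system, r in ratings:
            z = (r - mu) / sigma if sigma > 0 else 0.0
            per_system.setdefault(system, []).append(z)
    return {system: mean(zs) for system, zs in per_system.items()}
```

A strict annotator and a lenient annotator who rank two systems the same way then contribute identical z scores, so the per-system averages reflect relative quality rather than rater generosity.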
Conclusion
  • Discussion and Future Work

    In this paper, the authors described the techniques used in the latest Microsoft machine translation system to reach a new state of the art.
  • The authors exploited the dual nature of the translation problem to better utilize parallel data as well as monolingual data in a more principled way.
  • The authors utilized joint training of source-to-target, and target-to-source systems to further improve on the duality of the translation task.
  • The authors addressed the exposure bias problem in two ways: by two-pass decoding using Deliberation networks, as well as by agreement regularization and joint training of left-to-right, right-to-left systems.
  • The authors found significant gains from combining multiple heterogeneous systems
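Combining heterogeneous systems can be sketched as reranking a pooled n-best list with a weighted linear combination of per-hypothesis feature scores (for example, a left-to-right model score, a right-to-left model score, and a language model score). The feature names and weights below are hypothetical illustrations, not the paper's actual feature set or tuned weights.

```python
def rerank(hypotheses, weights):
    """Pick the best hypothesis from a pooled n-best list using a
    weighted sum of feature scores. hypotheses: [(text, {feature: score})];
    the hypothesis with the highest combined score wins."""
    def combined(item):
        _, feats = item
        return sum(weights.get(name, 0.0) * val for name, val in feats.items())
    return max(hypotheses, key=combined)[0]
```

A hypothesis that only one scoring direction likes (e.g. fluent left-to-right but implausible right-to-left) loses to one that all component systems agree on, which is the intuition behind combining agreement-regularized and dual systems.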
Tables
  • Table1: Automatic (BLEU) evaluation results on the WMT 2017 Chinese-English test set
  • Table2: Evaluation Data selection results on the WMT 2017 Chinese-English test set
  • Table3: System combination results on the WMT 2017 Chinese-English test set
  • Table4: Human Evaluation Results for n ≥ 1,827 assessments per system show that our research systems Combo-4, Combo-5, and Combo-6 achieve human parity according to Definition 2, as they are not distinguishable from Reference-HT, which is a human translation. All our research systems significantly outperform Reference-PE, which is based on human post-editing of machine translation output, and the original Reference-WMT, which is again a human translation. # denotes the ranking cluster, Ave % the averaged raw score r ∈ [0, 100], and Ave z the standardized z score. n ≥ x denotes that we collected at least x assessments per system for the respective evaluation campaign. This is referred to as Meta-1 in Table 5g.
  • Table5: Complete results for our three iterations over Subset-1 (5a, 5b, 5c) and our evaluation campaigns for Subset-2 (5d), Subset-3 (5e), and Subset-4 (5f). We also show results for combined data for Meta-1 (5g), combining annotations from all iterations over Subset-1. # denotes the ranking cluster, Ave % the averaged raw score r ∈ [0, 100], and Ave z the standardized z score. n ≥ x denotes that we collected at least x assessments per system for the respective evaluation campaign. All campaigns involved a = 15 annotators. Systems in higher clusters significantly outperform all systems in lower clusters according to the Wilcoxon rank sum test at p-level p ≤ 0.05, following WMT17. Systems in the same cluster are ordered by z score but considered tied w.r.t. quality.
  • Table6: BLEU scores against single or multiple references. WMT is Reference-WMT, PE is Reference-PE, HT is Reference-HT. Scoring based on sacreBLEU v1.2.3, with signature BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.3 for refs=1. The signature changes to numrefs.2 and numrefs.3 for refs=2 and refs=3, respectively. Note how different the scores for Reference-WMT and Reference-PE are compared to Reference-HT, and how these compare to our findings reported in Table 5. This emphasizes the need for human evaluation.
  • Table7: Error distribution, as the fraction of sentences that contain specific error categories.
Funding
  • Describes Microsoft’s machine translation system and measures the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English.
  • Finds that our latest neural machine translation system has reached a new state of the art, and that the translation quality is at human parity when compared to professional human translations.
  • Finds that it significantly exceeds the quality of crowd-sourced non-professional translations.
  • Achieved human parity in translating news-domain text from Chinese to English.
  • Finds that the quality of reference translations, long assumed to be "gold" annotations by professional translators, is sometimes remarkably poor.
References
  • Artetxe, M., Labaka, G., Agirre, E., and Cho, K. Unsupervised neural machine translation. In International Conference on Learning Representations (2018).
  • Axelrod, A., He, X., and Gao, J. Domain adaptation via pseudo in-domain data selection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (Stroudsburg, PA, USA, 2011), EMNLP ’11, Association for Computational Linguistics, pp. 355–362.
  • Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
  • Belinkov, Y., and Bisk, Y. Synthetic and natural noise both break neural machine translation. CoRR abs/1711.02173 (2017).
  • Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS (2015), pp. 1171–1179.
  • Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huang, S., Huck, M., Koehn, P., Liu, Q., Logacheva, V., Monz, C., Negri, M., Post, M., Rubino, R., Specia, L., and Turchi, M. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers (Copenhagen, Denmark, September 2017), Association for Computational Linguistics, pp. 169–214.
  • Cettolo, M., Federico, M., Bentivogli, L., Niehues, J., Stüker, S., Sudoh, K., Yoshino, K., and Federmann, C. Overview of the IWSLT 2017 evaluation campaign. In Proceedings of the 14th International Workshop on Spoken Language Translation (IWSLT) (Tokyo, Japan, December 2017), IWSLT, pp. 2–12.
  • Cherry, C., and Foster, G. Batch tuning strategies for statistical machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Stroudsburg, PA, USA, 2012), NAACL HLT ’12, Association for Computational Linguistics, pp. 427–436.
  • Clark, J. H., Dyer, C., Lavie, A., and Smith, N. A. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (2011), Association for Computational Linguistics, pp. 176–181.
  • Denkowski, M., and Lavie, A. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the EMNLP 2011 Workshop on Statistical Machine Translation (2011).
  • Devlin, J. Sharp models on dull hardware: Fast and accurate neural machine translation decoding on the CPU. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (Copenhagen, Denmark, September 2017), Association for Computational Linguistics, pp. 2810–2815.
  • Dreyer, M., and Marcu, D. HyTER: Meaning-equivalent semantics for translation evaluation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2012), Association for Computational Linguistics, pp. 162–171.
  • Edgington, E. S. Validity of randomization tests for one-subject experiments. Journal of Educational Statistics 5, 3 (1980), 235–251.
  • Federmann, C. Appraise: An open-source toolkit for manual evaluation of machine translation output. The Prague Bulletin of Mathematical Linguistics 98 (September 2012), 25–35.
  • Feng, S., Liu, S., Yang, N., Li, M., Zhou, M., and Zhu, K. Q. Improving attention modeling with implicit distortion and fertility for machine translation. In COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, December 11-16, 2016, Osaka, Japan (2016), pp. 3082–3092.
  • Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122 (2017).
  • Graham, Y., Baldwin, T., Moffat, A., and Zobel, J. Can machine translation systems be evaluated by the crowd alone? Natural Language Engineering 23, 1 (2016), 3–30. doi:10.1017/S1351324915000339.
  • Gu, J., Hassan, H., Devlin, J., and Li, V. Universal neural machine translation for extremely low resource languages.
  • Hassan, H., Elaraby, M., and Tawfik, A. Y. Synthetic data for neural machine translation of spoken-dialects.
  • He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T., and Ma, W.-Y. Dual learning for machine translation. In Advances in Neural Information Processing Systems (2016), pp. 820–828.
  • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015).
  • Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation. arXiv preprint arXiv:1611.04558 (2016).
  • Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014).
  • Lample, G., Conneau, A., Denoyer, L., and Ranzato, M. Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations (2018).
  • Lin, J., Xia, Y., Qin, T., Chen, Z., and Liu, T.-Y. Conditional image-to-image translation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2018).
  • Luo, P., Wang, G., Lin, L., and Wang, X. Deep dual learning for semantic image segmentation. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017 (2017), pp. 2737–2745.
  • Mann, H. B., and Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics (1947), 50–60.
  • Moore, R. C., and Lewis, W. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers (Stroudsburg, PA, USA, 2010), ACLShort ’10, Association for Computational Linguistics, pp. 220–224.
  • Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (2002), pp. 311–318.
  • Sennrich, R., Haddow, B., and Birch, A. Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709 (2015).
  • Sennrich, R., Haddow, B., and Birch, A. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909 (2015).
  • Shen, W., and Liu, R. Learning residual images for face attribute manipulation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), IEEE, pp. 1225–1233.
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V. D., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature 529 (2016), 484–489.
  • Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas (2006), vol. 200.
  • Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. NIPS (2014).
  • Tang, D., Duan, N., Qin, T., and Zhou, M. Question answering and question generation as dual tasks. arXiv preprint arXiv:1706.02027 (2017).
  • Tu, Z., Lu, Z., Liu, Y., Liu, X., and Li, H. Modeling coverage for neural machine translation. arXiv preprint arXiv:1601.04811 (2016).
  • van der Wees, M., Bisazza, A., and Monz, C. Dynamic data selection for neural machine translation. CoRR abs/1708.00712 (2017).
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. arXiv preprint arXiv:1706.03762 (2017).
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In NIPS (2017).
  • Vilar, D., Xu, J., D’Haro, L. F., and Ney, H. Error analysis of statistical machine translation output. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy, May 22-28, 2006 (2006), pp. 697–702.
  • Wang, Y., Cheng, S., Jiang, L., Yang, J., Chen, W., Li, M., Shi, L., Wang, Y., and Yang, H. Sogou neural machine translation systems for WMT17. In Proceedings of the Second Conference on Machine Translation, WMT 2017, Copenhagen, Denmark, September 7-8, 2017 (2017), pp. 410–415.
  • Wang, Y., Li, X., Cheng, S., Jiang, L., Yang, J., Chen, W., Shi, L., Wang, Y., and Yang, H. Sogou neural machine translation systems for WMT17. In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers (Copenhagen, Denmark, September 2017), Association for Computational Linguistics, pp. 410–415.
  • Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., and Liu, T. Dual transfer learning for neural machine translation with marginal distribution regularization. In AAAI (2018).
  • Wilcoxon, F. Individual comparisons by ranking methods. Biometrics Bulletin 1, 6 (1945), 80–83.
  • Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. Google’s neural machine translation system: Bridging the gap between human and machine translation. ArXiv e-prints (Sept. 2016).
  • Xia, Y., Bian, J., Qin, T., Yu, N., and Liu, T.-Y. Dual inference for machine learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17 (2017), pp. 3112–3118.
  • Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., and Liu, T. Dual supervised learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (2017), pp. 3789–3798.
  • Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., and Liu, T.-Y. Deliberation networks: Sequence generation beyond one-pass decoding. In Advances in Neural Information Processing Systems (2017), pp. 1782–1792.
  • Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M. L., Stolcke, A., Yu, D., and Zweig, G. Toward human parity in conversational speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, 12 (2017), 2410–2423.
  • Yi, Z., Zhang, H., Tan, P., and Gong, M. DualGAN: Unsupervised dual learning for image-to-image translation. ICCV (2017).
  • Zhang, Z., Liu, S., Li, M., Zhou, M., and Chen, E. Joint training for neural machine translation models with monolingual data (2018).
  • Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017).
  • Zoph, B., Yuret, D., May, J., and Knight, K. Transfer learning for low-resource neural machine translation. arXiv preprint arXiv:1604.02201 (2016).