Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data

ACL, pp. 676-686, 2014.

Keywords:
statistical machine translation, structured label propagation, bilingual corpus, hierarchical reordering model, rich language

Abstract:

Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data.

Introduction
  • Statistical approaches to machine translation (SMT) use sentence-aligned, parallel corpora to learn translation rules along with their probabilities.
  • Even in resource-rich languages, learning reliable translations of multiword phrases is a challenge, and an adequate phrasal inventory is crucial for effective translation. (This work was done while the first author was interning at Microsoft Research.)
  • This problem is exacerbated in the many language pairs for which parallel resources are either limited or nonexistent.
  • Can the authors use monolingual data to augment the phrasal translations acquired from parallel data?
Highlights
  • Statistical approaches to machine translation (SMT) use sentence-aligned, parallel corpora to learn translation rules along with their probabilities
  • Our work introduces a new take on the problem, using graph-based semi-supervised learning to acquire translation rules and probabilities by leveraging both monolingual and parallel data resources
  • We evaluated the proposed approach on both Arabic-English and Urdu-English under a range of scenarios (§3), varying the amount and type of monolingual corpora used, and obtained improvements between 1 and 4 BLEU points, even when using very large language models
  • In §3.2 we first analyzed the impact of utilizing phrases instead of words and structured label propagation instead of label propagation; the latter experiment underscores the importance of generated candidates
  • We examine how our approach can learn from noisy parallel data compared to a traditional SMT system
  • We presented an approach that can expand a translation model extracted from a sentence-aligned, bilingual corpus using a large amount of unstructured, monolingual data in both source and target languages, which leads to improvements of 1.4 and 1.2 BLEU points over strong baselines on evaluation sets, and in some scenarios gains in excess of 4 BLEU points
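The propagation machinery referenced above can be illustrated with a toy example. The following is a minimal, generic Zhu et al. (2003)-style label propagation sketch over a small phrase-similarity graph; the graph weights and translation distributions are invented for illustration, and the paper's structured variant (SLP) additionally generates new target-side candidates, which this sketch omits.

```python
import numpy as np

# Toy similarity graph over four phrases: nodes 0 and 1 are "labeled"
# (they occur in the bilingual corpus and carry translation probability
# distributions); nodes 2 and 3 are unlabeled monolingual phrases.
# All weights and distributions here are invented for illustration.
W = np.array([
    [0.0, 0.4, 0.6, 0.0],
    [0.4, 0.0, 0.1, 0.5],
    [0.6, 0.1, 0.0, 0.3],
    [0.0, 0.5, 0.3, 0.0],
])
P = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix

seeds = np.array([[0.9, 0.1],          # node 0's distribution over
                  [0.2, 0.8]])         # two translation candidates
labeled = [0, 1]

F = np.zeros((4, 2))
F[labeled] = seeds

for _ in range(50):
    F = P @ F                          # average neighbors' distributions
    F[labeled] = seeds                 # clamp the labeled nodes

F = F / F.sum(axis=1, keepdims=True)   # renormalize to probabilities
```

After convergence, node 2 (strongly linked to node 0) inherits a distribution favoring the first candidate, while node 3 (strongly linked to node 1) favors the second.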
Results
  • The authors performed an extensive evaluation to examine various aspects of the approach along with overall system performance.
  • Two language pairs were used: Arabic-English and Urdu-English.
  • In §3.2 the authors first analyzed the impact of utilizing phrases instead of words and SLP instead of LP; the latter experiment underscores the importance of generated candidates.
  • The authors then look at how adding morphological knowledge to the generation process can further improve performance.
  • The Urdu-English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
  • Finally, the authors examine how the approach can learn from noisy parallel data compared to a traditional SMT system.
Conclusion
  • The authors presented an approach that can expand a translation model extracted from a sentence-aligned, bilingual corpus using a large amount of unstructured, monolingual data in both source and target languages, which leads to improvements of 1.4 and 1.2 BLEU points over strong baselines on evaluation sets, and in some scenarios gains in excess of 4 BLEU points.
  • The authors plan to estimate the graph structure through other learned, distributed representations.
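The paper does not specify how such a graph would be built from distributed representations, but one plausible realization, sketched here purely as an assumption, is a symmetrized k-nearest-neighbor graph over phrase embeddings with Gaussian kernel weights in the style of Zhu et al. (2003):

```python
import numpy as np

# Hypothetical phrase embeddings: in practice these would come from a
# learned distributed representation, not random vectors.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))  # 6 phrases, 8-dimensional vectors

def knn_graph(emb, k=2, sigma=1.0):
    """Symmetrized k-nearest-neighbor graph with Gaussian kernel weights."""
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)  # squared dists
    W_full = np.exp(-d2 / (2 * sigma ** 2))  # kernel weights in (0, 1]
    np.fill_diagonal(W_full, 0.0)            # no self-edges
    W = np.zeros_like(W_full)
    for i in range(len(W_full)):
        nbrs = np.argsort(W_full[i])[-k:]    # k nearest neighbors of node i
        W[i, nbrs] = W_full[i, nbrs]
    return np.maximum(W, W.T)  # keep an edge if either endpoint chose it

W = knn_graph(emb)
```

Such a weight matrix could then be plugged directly into the propagation step in place of a graph built from surface-level distributional similarity.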
Tables
  • Table1: Parameters, explanation of their function, and value chosen
  • Table2: Bilingual corpus statistics for the Arabic-English and Urdu-English datasets used
  • Table3: Monolingual corpus statistics for the Arabic-English and Urdu-English evaluations. The monolingual corpora can be sub-divided into comparable, noisy parallel, and noncomparable components. En I refers to the English side of the Arabic-English corpora, and En II to the English side of the Urdu-English corpora
  • Table4: Results for the Arabic-English evaluation. The LP vs. SLP comparison highlights the importance of target side enrichment via translation candidate generation, 1-gram vs. 2-gram comparisons highlight the importance of emphasizing phrases, utilizing half the monolingual data shows sensitivity to monolingual corpus size, and adding morphological information results in additional improvement
  • Table5: Results with the large language model scenario. The gains are even better than with the smaller language model
  • Table6: Results for the Urdu-English evaluation evaluated with BLEU. All experiments were conducted with the larger language model, and generation only considered the m-best candidates from the baseline system
Related work
  • The idea presented in this paper is similar in spirit to bilingual lexicon induction (BLI), where a seed lexicon in two different languages is expanded with the help of monolingual corpora, primarily by extracting distributional similarities from the data using word context. This line of work, initiated by Rapp (1995) and continued by others (Fung and Yee, 1998; Koehn and Knight, 2002, inter alia), is limited from a downstream perspective, as translations are induced for only a small number of words, and oftentimes only for common or frequently occurring ones. Recent improvements to BLI (Tamura et al., 2012; Irvine and Callison-Burch, 2013b) have had a graph-based flavor, presenting label propagation-based approaches that use a seed lexicon, but evaluation is once again done on top-1 or top-3 accuracy, and the focus is on unigrams.
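The distributional-similarity idea underlying BLI can be made concrete with a small sketch. This is a generic Rapp (1995)-style scheme, not any of the cited systems; the toy corpora and seed lexicon are invented for illustration.

```python
from collections import Counter
import math

# Hypothetical seed lexicon: maps source-language context words to
# target-language ones, making context vectors comparable across languages.
seed = {"haus": "house", "rot": "red", "klein": "small"}

def context_vector(word, corpus, window=2):
    """Count words co-occurring with `word` within +/- `window` tokens."""
    counts = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == word:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        counts[sent[j]] += 1
    return counts

def cosine(u, v):
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

src_corpus = [["das", "rot", "auto"], ["ein", "rot", "auto", "fährt"]]
tgt_corpus = [["the", "small", "red", "house"],
              ["the", "red", "car"],
              ["a", "red", "car", "drives"]]

# Project the source context vector of "auto" into the target vocabulary
# through the seed lexicon, then rank target candidates by similarity.
src_vec = context_vector("auto", src_corpus)
projected = Counter({seed[w]: c for w, c in src_vec.items() if w in seed})
scores = {t: cosine(projected, context_vector(t, tgt_corpus))
          for t in ["house", "car", "red"]}
best = max(scores, key=scores.get)  # → "car"
```

As the related-work discussion notes, such schemes score only source-side context similarity and are typically evaluated on top-k accuracy over unigrams, which is exactly the limitation the paper's structured propagation is designed to overcome.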

    Razmara et al. (2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, but the emphasis and end-to-end BLEU evaluations concentrate on OOVs, i.e., unigrams, and not on enriching the entire translation model. As with previous BLI work, these approaches only take into account source-side similarity of words, and only moderate gains (in the latter work, on a subset of the language pairs evaluated) are obtained. Additionally, because of our structured propagation algorithm, our approach is better at handling multiple translation candidates and does not need to restrict itself to the top translation.
References
  • Andrei Alexandrescu and Katrin Kirchhoff. 2009. Graph-based learning for statistical machine translation. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-HLT ’09, pages 119–127. Association for Computational Linguistics, June.
  • Ondrej Bojar, Christian Buck, Chris Callison-Burch, Christian Federmann, Barry Haddow, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia. 2013. Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 1–44, Sofia, Bulgaria, August. Association for Computational Linguistics.
  • Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 17–24, New York City, USA, June. Association for Computational Linguistics.
  • Victor Chahuneau, Eva Schlinger, Noah A. Smith, and Chris Dyer. 2013. Translating into morphologically rich languages with synthetic phrases. In Proceedings of EMNLP.
  • David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228, June.
  • Qing Dou and Kevin Knight. 2012. Large scale decipherment for out-of-domain machine translation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 266–275. Association for Computational Linguistics, July.
  • Pascale Fung and Lo Yuen Yee. 1998. An IR approach for translating new words from nonparallel, comparable texts. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, ACL ’98, pages 414–420, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Michel Galley and Christopher D. Manning. 2008. A simple and effective hierarchical phrase reordering model. In Proceedings of EMNLP ’08, pages 848–856, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of ACL-08: HLT, pages 771–779, Columbus, Ohio, June. Association for Computational Linguistics.
  • Ann Irvine and Chris Callison-Burch. 2013a. Combining bilingual and comparable corpora for low resource machine translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 262–270, Sofia, Bulgaria, August. Association for Computational Linguistics.
  • Ann Irvine and Chris Callison-Burch. 2013b. Supervised bilingual lexicon induction with multiple monolingual signals. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 518–523, Atlanta, Georgia, June. Association for Computational Linguistics.
  • Alexandre Klementiev, Ann Irvine, Chris Callison-Burch, and David Yarowsky. 2012. Toward statistical machine translation without parallel corpora. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 130–140, Avignon, France, April. Association for Computational Linguistics.
  • Philipp Koehn and Kevin Knight. 2002. Learning a translation lexicon from monolingual corpora. In Proceedings of the ACL Workshop on Unsupervised Lexical Acquisition, pages 9–16.
  • Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL ’03, pages 48–54, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Shujie Liu, Chi-Ho Li, Mu Li, and Ming Zhou. 2012. Learning translation consensus with structured label propagation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL ’12, pages 302–310, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Yuval Marton, Chris Callison-Burch, and Philip Resnik. 2009. Improved statistical machine translation using monolingually-derived paraphrases. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP ’09, pages 381–390, Singapore, August. Association for Computational Linguistics.
  • David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 152–159, New York City, USA, June. Association for Computational Linguistics.
  • Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 160–167, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.
  • Reinhard Rapp. 1995. Identifying word translations in non-parallel texts. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, ACL ’95.
  • Sujith Ravi and Kevin Knight. 2011. Deciphering foreign language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 12–21, Portland, Oregon, USA, June. Association for Computational Linguistics.
  • Majid Razmara, Maryam Siahbani, Gholamreza Haffari, and Anoop Sarkar. 2013. Graph propagation for paraphrasing out-of-vocabulary words in statistical machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Matthew Snover, Bonnie Dorr, and Richard Schwartz. 2008. Language and translation model adaptation using comparable corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 857–866, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Akihiro Tamura, Taro Watanabe, and Eiichiro Sumita. 2012. Bilingual lexicon extraction from comparable corpora using label propagation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL ’12, pages 24–36.
  • Kristina Toutanova, Hisami Suzuki, and Achim Ruopp. 2008. Applying morphology generation models to machine translation. In Proceedings of ACL-08: HLT, pages 514–522, Columbus, Ohio, June. Association for Computational Linguistics.
  • Jiajun Zhang and Chengqing Zong. 2013. Learning a phrase-based translation model from monolingual data with application to domain adaptation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1425–1434, Sofia, Bulgaria, August. Association for Computational Linguistics.
  • Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the Twentieth International Conference on Machine Learning, ICML ’03, pages 912–919.