Consistency-Aware Search for Word Alignment

Conference on Empirical Methods in Natural Language Processing, 2015.

Keywords:
alignment model, translation model, machine translation, translation quality, language sentence

Abstract:

As conventional word alignment search algorithms usually ignore the consistency constraint in translation rule extraction, improving alignment accuracy does not necessarily increase translation quality. We propose to use coverage, which reflects how well extracted phrases can recover the training data, to enable word alignment to model co…

Introduction
  • Word alignment, which aims to identify the correspondence between words in two languages, plays an important role in statistical machine translation (Brown et al, 1993).
  • Definition 1: Given a source-language sentence f = f_1^J = f_1 … f_J and a target-language sentence e = e_1^I = e_1 … e_I, a word alignment a is a subset of {(j, i) | 1 ≤ j ≤ J, 1 ≤ i ≤ I}.
  • Definition 2: Given a training example ⟨f, e, a⟩, a bilingual phrase B is a pair of source and target phrases, B = ⟨f_{j1}^{j2}, e_{i1}^{i2}⟩, such that 1 ≤ j1 ≤ j2 ≤ J ∧ 1 ≤ i1 ≤ i2 ≤ I.
  • Definition 3: A bilingual phrase B = ⟨f_{j1}^{j2}, e_{i1}^{i2}⟩ is said to be tight if and only if all of its boundary words f_{j1}, f_{j2}, e_{i1}, and e_{i2} are aligned (see the sketch after this list).
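
To make Definitions 2–3 and the consistency constraint concrete, here is a minimal Python sketch (ours, not the paper's code) of the standard consistency check used in phrase extraction together with the tightness test; an alignment is represented as a set of 1-based (j, i) links.

    # Minimal sketch (not from the paper): the consistency check used in
    # phrase rule extraction, and the tightness test of Definition 3.
    # An alignment is a set of 1-based links (j, i).

    def is_consistent(alignment, j1, j2, i1, i2):
        # (f_{j1..j2}, e_{i1..i2}) is consistent iff no link crosses the
        # boundary: a link touches the source span exactly when it touches
        # the target span. (Rule extraction additionally requires at least
        # one link inside the pair; omitted here for brevity.)
        return all((j1 <= j <= j2) == (i1 <= i <= i2)
                   for (j, i) in alignment)

    def is_tight(alignment, j1, j2, i1, i2):
        # Definition 3: tight iff all four boundary words f_{j1}, f_{j2},
        # e_{i1}, e_{i2} are aligned (and the pair is consistent).
        src = {j for (j, _) in alignment}
        tgt = {i for (_, i) in alignment}
        return (is_consistent(alignment, j1, j2, i1, i2)
                and {j1, j2} <= src and {i1, i2} <= tgt)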
Highlights
  • Word alignment, which aims to identify the correspondence between words in two languages, plays an important role in statistical machine translation (Brown et al, 1993)
  • Word-aligned bilingual corpora serve as a fundamental resource for translation rule extraction, for phrase-based models (Koehn et al, 2003; Och and Ney, 2004), and for syntax-based models (Chiang, 2005; Galley et al, 2006)
  • Separating word alignment from translation rule extraction suffers from a major problem: maximizing alignment accuracy does not necessarily improve translation quality, which motivates the coverage objective (sketched after this list)
  • We have presented a general framework for optimizing word alignment with respect to machine translation
  • Experiments show that our approach is effective in both alignment and translation tasks across various alignment models, translation models, and language pairs
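
The abstract describes coverage as how well extracted phrases recover the training data. One illustrative reading (our assumption, not the paper's exact formula) scores a sentence pair by the fraction of source words that fall inside at least one extractable tight phrase; the sketch reuses is_tight from above, and max_len = 7 is an assumed Moses-style phrase-length limit.

    # Illustrative coverage score for one training example: the fraction
    # of source words covered by some extractable tight phrase pair.
    # This is one plausible instantiation of the paper's coverage notion,
    # not its exact definition; reuses is_tight from the earlier sketch.

    def coverage(alignment, J, I, max_len=7):
        covered = set()
        for j1 in range(1, J + 1):
            for j2 in range(j1, min(j1 + max_len - 1, J) + 1):
                for i1 in range(1, I + 1):
                    for i2 in range(i1, min(i1 + max_len - 1, I) + 1):
                        if is_tight(alignment, j1, j2, i1, i2):
                            covered.update(range(j1, j2 + 1))
        return len(covered) / J if J else 0.0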
Methods
  • 5.1 Setup

    5.1.1 Languages and Datasets

    The authors evaluated the approach in terms of alignment and translation quality on five language pairs: Chinese-English (ZH-EN), Czech-English (CS-EN), German-English (DE-EN), Spanish-English (ES-EN), and French-English (FR-EN).
  • The authors used the SRILM toolkit (Stolcke, 2002) to train a 4-gram language model on the Xinhua portion of the English GIGAWORD corpus, which contains 398.6M words.
  • For translation evaluation, the authors used the NIST 2006 dataset as the development set and the NIST 2002, 2003, 2004, 2005, and 2008 datasets as the test sets.
  • The English language model trained on the Xinhua portion of the English GIGAWORD corpus was used for translation from European languages to English.
  • The authors used the “news-test2012” dataset that contains 3,003 sentences as the development set and the “news-test2013” dataset that contains 3,000 sentences as the test set.
Results
  • All differences are statistically significant at the p < 0.01 level.
  • The authors find that the approach significantly outperforms the baseline at p < 0.01 for four language pairs and at p < 0.05 for the remaining one (the test is not named in the summary; a common choice is sketched below)
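
A common choice for BLEU significance comparisons in this literature is paired bootstrap resampling (Koehn, 2004); below is a generic sketch, where metric, sys_a, sys_b, and refs are hypothetical parallel inputs rather than the paper's evaluation code.

    # Paired bootstrap resampling (Koehn, 2004) for comparing two systems
    # under a corpus-level metric such as BLEU. Generic sketch; `metric`
    # maps (hypotheses, references) to a score, and sys_a/sys_b/refs are
    # hypothetical sentence-parallel lists.

    import random

    def paired_bootstrap(sys_a, sys_b, refs, metric, trials=1000, seed=0):
        rng = random.Random(seed)
        n = len(refs)
        wins = 0
        for _ in range(trials):
            idx = [rng.randrange(n) for _ in range(n)]
            score_a = metric([sys_a[k] for k in idx], [refs[k] for k in idx])
            score_b = metric([sys_b[k] for k in idx], [refs[k] for k in idx])
            if score_a > score_b:
                wins += 1
        # Estimated p-value for the null "system A is not better than B".
        return 1.0 - wins / trials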
Conclusion
  • The authors have presented a general framework for optimizing word alignment with respect to machine translation.
  • The authors develop a consistency-aware search algorithm that efficiently calculates coverage on the fly during search (a simplified sketch follows this list).
  • The authors plan to apply the approach to syntax-based models (Galley et al, 2006; Liu et al, 2006; Shen et al, 2008) and include the constituency constraint in the optimization objective.
  • It would also be interesting to develop consistency-aware training algorithms for word alignment.
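
To make the search idea concrete, here is a deliberately simplified hill-climbing sketch that flips one link at a time and scores candidates by an interpolation of model score and coverage (reusing coverage from the sketch above). The paper's algorithm updates coverage incrementally during search rather than recomputing it, and model_score and the weight lam here are our assumptions for illustration.

    # Simplified consistency-aware search: flip single links and keep any
    # change that raises model_score(a) + lam * coverage(a, J, I). Unlike
    # the paper's algorithm, coverage is recomputed from scratch here.
    # `model_score` (alignment -> float) and `lam` are assumed inputs.

    def consistency_aware_search(J, I, model_score, lam=0.5, max_iters=100):
        alignment = set()

        def objective(a):
            return model_score(a) + lam * coverage(a, J, I)

        best = objective(alignment)
        for _ in range(max_iters):
            improved = False
            for j in range(1, J + 1):
                for i in range(1, I + 1):
                    candidate = alignment ^ {(j, i)}  # add or drop a link
                    score = objective(candidate)
                    if score > best:
                        alignment, best, improved = candidate, score, True
            if not improved:
                break
        return alignment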
Tables
  • Table1: Comparison of different settings of coverage on the Chinese-English dataset using Moses. “h” denotes “hard”, “s” denotes “soft”, “l” denotes “loose”, and “t” denotes “tight”. The BLEU scores were calculated on the development set. For quick validation, we used a small fraction of the training data to train the phrase-based model
  • Table2: Comparison of different alignment methods on the Chinese-English dataset. “GDF” denotes the grow-diag-final heuristic. “phrase count” denotes optimizing with respect to maximizing the number of extracted tight phrases. We used Moses to extract loose phrases from word-aligned training data for all methods. “# bp” denotes the number of extracted bilingual phrases, “# sp” denotes the number of source phrases, “# tp” denotes the number of target phrases, “# sw” denotes the source vocabulary size, “# tw” denotes the target vocabulary size. We report BLEU scores on the NIST 2005 test set
  • Table3: Translation evaluation on different alignment models. We apply our approach to both generative and discriminative alignment models. “generative” denotes applying the grow-diag-final heuristic to the alignments produced by IBM Model 4 in two directions. “discriminative” denotes the log-linear alignment model (Liu et al, 2010). Adding coverage leads to significant improvements. We use “**” to denote that the difference is statistically significant at the p < 0.01 level
  • Table4: Translation evaluation on different translation models. For translation, we used both phrase-based and hierarchical phrase-based models. For alignment, we used the generative model. “generative” denotes applying the grow-diag-final heuristic to the alignments produced by IBM Model 4 in two directions. Adding coverage leads to significant improvements. We use “**” to denote that the difference is statistically significant at the p < 0.01 level
  • Table5: Translation evaluation on five language pairs. “generative” denotes applying the grow-diag-final heuristic to the alignments produced by IBM Model 4 in two directions. We use “*” and “**” to denote that the difference is statistically significant at p < 0.05 and p < 0.01, respectively. Note that ZH-EN uses four references while the other language pairs use a single reference
Related work
  • Our work is inspired by three lines of research: (1) reachability in discriminative training of translation models, (2) structural constraints for alignment, and (3) learning with constraints.

    6.1 Reachability in Discriminative Training of Translation Models

    Discriminative training algorithms for statistical machine translation often need reachable training examples to find full derivations for updating model parameters (Liang et al, 2006a; Yu et al, 2013). Yu et al (2013) report that only 32.1% of the sentences in the Chinese-English training data, containing 12.7% of the words, are fully reachable due to noisy alignments and the distortion limit. They find that most reachable sentences are short and generally literal.

    Table 4 data (BLEU; columns: phrase-based generative, phrase-based +coverage, hierarchical generative, hierarchical +coverage; one row per evaluation set):

    29.60  30.63**  30.43  31.60**
    31.84  32.89**  33.36  34.67**
    31.68  32.77**  32.58  34.14**
    31.80  32.96**  32.72  34.24**
    30.40  31.33**  31.57  32.73**
    24.53  25.25**  24.21  24.89**

    Table 5 data (BLEU):

    alignment    ZH-EN    CS-EN   DE-EN    ES-EN    FR-EN
    generative   30.40    19.89   21.13    26.39    26.22
    +coverage    31.33**  20.04*  21.63**  26.79**  26.76**
Funding
  • Yang Liu and Maosong Sun are supported by the 863 Program (2015AA011808) and the National Natural Science Foundation of China (No 61331013 and No 61432013)
  • Huanbo Luan is supported by the National Natural Science Foundation of China (No 61303075)
  • This research is also supported by the Singapore National Research Foundation under its International Research Centre@Singapore Funding Initiative and administered by the IDM Programme
Reference
  • Necip Fazil Ayan and Bonnie J. Dorr. 2006. Going beyond AER: an extensive analysis of word alignments and their impact on MT. In Proceedings of COLING-ACL 2006, pages 9–16, Sydney, Australia, July.
  • Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
  • Chris Callison-Burch, David Talbot, and Miles Osborne. 2004. Statistical machine translation with word- and sentence-aligned parallel corpora. In Proceedings of ACL 2004.
  • Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2007. Guiding semi-supervision with constraint-driven learning. In Proceedings of ACL 2007.
  • David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of ACL 2005, pages 263–270, Ann Arbor, Michigan, June.
  • David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.
  • Trevor Cohn and Phil Blunsom. 2009. A Bayesian model of syntax-directed tree to string grammar induction. In Proceedings of EMNLP 2009.
  • John DeNero and Dan Klein. 2008. The complexity of phrase alignment problems. In Proceedings of ACL 2008.
  • John DeNero and Dan Klein. 2010. Discriminative modeling of extraction sets for machine translation. In Proceedings of ACL 2010.
  • Yonggang Deng and Bowen Zhou. 2009. Optimizing word alignment combination for phrase table training. In Proceedings of ACL 2009.
  • Alexander Fraser and Daniel Marcu. 2007. Measuring word alignment quality for statistical machine translation. Computational Linguistics, Squibs and Discussions, 33(3):293–303.
  • Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of COLING-ACL 2006, pages 961–968, Sydney, Australia, July.
  • Kuzman Ganchev, Joao Graca, Jennifer Gillenwater, and Ben Taskar. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research.
  • Cyril Goutte, Kenji Yamada, and Eric Gaussier. 2004. Aligning words using matrix factorisation. In Proceedings of ACL 2004.
  • Aria Haghighi, John Blitzer, John DeNero, and Dan Klein. 2009. Better word alignments with supervised ITG models. In Proceedings of ACL 2009.
  • Abraham Ittycheriah and Salim Roukos. 2005. A maximum entropy word aligner for Arabic-English machine translation. In Proceedings of EMNLP 2005.
  • Philipp Koehn and Hieu Hoang. 2007. Factored translation models. In Proceedings of EMNLP-CoNLL 2007, pages 868–876, Prague, Czech Republic, June.
  • Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, pages 127–133, Edmonton, Canada, May.
  • Percy Liang, Alexandre Bouchard-Cote, Dan Klein, and Ben Taskar. 2006a. An end-to-end discriminative approach to machine translation. In Proceedings of ACL 2006.
  • Percy Liang, Ben Taskar, and Dan Klein. 2006b. Alignment by agreement. In Proceedings of HLT-NAACL 2006, pages 104–111, New York City, USA, June.
  • Yang Liu and Maosong Sun. 2015. Contrastive unsupervised word alignment with non-local features. In Proceedings of AAAI 2015.
  • Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proceedings of COLING-ACL 2006.
  • Yang Liu, Qun Liu, and Shouxun Lin. 2010. Discriminative word alignment by linear modeling. Computational Linguistics, 36(3):303–339.
  • Daniel Marcu and William Wong. 2002. A phrase-based, joint probability model for statistical machine translation. In Proceedings of EMNLP 2002.
  • Franz J. Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
  • Franz J. Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417–449.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL 2002.
  • Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. A new string-to-dependency machine translation algorithm with a target dependency language model. In Proceedings of ACL 2008.
  • Andreas Stolcke. 2002. SRILM – an extensible language modeling toolkit. In Proceedings of ICSLP 2002.
  • Wei Wang, Jonathan May, Kevin Knight, and Daniel Marcu. 2010. Re-structuring, re-labeling, and re-aligning for syntax-based machine translation. Computational Linguistics.
  • Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23:377–404.
  • Heng Yu, Liang Huang, Haitao Mi, and Kai Zhao. 2013. Max-violation perceptron and forced decoding for scalable MT training. In Proceedings of EMNLP 2013.
  • Hao Zhang and Daniel Gildea. 2005. Stochastic lexicalized inversion transduction grammars for alignment. In Proceedings of ACL 2005.