Model Invertibility Regularization: Sequence Alignment With or Without Parallel Data.
HLT-NAACL (2015)
Abstract
We present Model Invertibility Regularization (MIR), a method that jointly trains two directional sequence alignment models, one in each direction, and takes into account the invertibility of the alignment task. By coupling the two models through their parameters (as opposed to through their inferences, as in Liang et al.'s Alignment by Agreement (ABA) and Ganchev et al.'s Posterior Regularization (PostCAT)), our method seamlessly extends to all IBM-style word alignment models as well as to alignment without parallel data. Our proposed algorithm is mathematically sound and inherits convergence guarantees from EM. We evaluate MIR on two tasks: (1) On word alignment, applying MIR to fertility-based models, we attain higher F-scores than ABA and PostCAT. (2) On Japanese-to-English back-transliteration without parallel data, applied to the decipherment model of Ravi and Knight, MIR learns sparser models that close the gap in whole-name error rate by 33% relative to a model trained on parallel data, and further beats a previous approach by Mylonakis et al.
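To make the contrast with inference-level agreement concrete, the toy sketch below trains two directional IBM-Model-1-style lexical tables with EM and couples them through their parameters by pooling each direction's expected counts with the transposed counts of the reverse direction. This is an illustrative assumption, not the paper's exact MIR objective; the corpus, the mixing rule, and the weight `lam` are invented for the example.

```python
# Toy sketch of parameter-level coupling of two directional alignment models.
# NOT the paper's MIR formulation; mixing rule, corpus, and lam are assumptions.
from collections import defaultdict

corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"]),
          (["ein", "buch"], ["a", "book"])]

def uniform_table(pairs, reverse=False):
    # Initialize t(target_word | source_word) uniformly over co-occurring pairs.
    t = defaultdict(float)
    for f, e in pairs:
        src, tgt = (e, f) if reverse else (f, e)
        for s in src:
            for w in tgt:
                t[(w, s)] = 1.0
    return t

def e_step(t, pairs, reverse=False):
    # Expected alignment counts under the current table (IBM Model 1 E-step).
    counts = defaultdict(float)
    for f, e in pairs:
        src, tgt = (e, f) if reverse else (f, e)
        for w in tgt:
            z = sum(t[(w, s)] for s in src)
            for s in src:
                counts[(w, s)] += t[(w, s)] / z
    return counts

def m_step_coupled(c_fwd, c_bwd, lam=0.5):
    # M-step for one direction: mix its counts with the transposed counts
    # of the other direction, then renormalize per source word.
    mixed, totals = defaultdict(float), defaultdict(float)
    for (w, s) in set(c_fwd) | {(s, w) for (w, s) in c_bwd}:
        v = (1 - lam) * c_fwd[(w, s)] + lam * c_bwd[(s, w)]
        mixed[(w, s)] = v
        totals[s] += v
    return defaultdict(float, {(w, s): v / totals[s] for (w, s), v in mixed.items()})

t_ef = uniform_table(corpus)                # p(english_word | foreign_word)
t_fe = uniform_table(corpus, reverse=True)  # p(foreign_word | english_word)
for _ in range(10):
    c_ef = e_step(t_ef, corpus)
    c_fe = e_step(t_fe, corpus, reverse=True)
    t_ef = m_step_coupled(c_ef, c_fe)
    t_fe = m_step_coupled(c_fe, c_ef)

print(sorted(t_ef.items(), key=lambda kv: -kv[1])[:5])
```

The key design point the sketch illustrates is that the two directions share information at the parameter (count) level inside the M-step, rather than forcing their posteriors to agree at inference time as in ABA or PostCAT.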
Keywords
Word error rate, Regularization (mathematics), Convergence (routing), Theoretical computer science, Coupling, Decipherment, Computer science, Sequence alignment