Automatic acquisition of bilingual rules for extraction of bilingual word pairs from parallel corpora
DeepLA '05: Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition (2005)
Abstract
In this paper, we propose a new learning method that addresses the sparse data problem in the automatic extraction of bilingual word pairs from parallel corpora in various languages. The method automatically acquires rules that are effective against the sparse data problem from parallel corpora alone, without requiring any bilingual resource (e.g., a bilingual dictionary or a machine translation system) beforehand. We call this method Inductive Chain Learning (ICL). ICL can limit the search scope when deciding on equivalents. Using ICL, recall in three systems based on similarity measures improved by 8.0, 6.1, and 6.0 percentage points, respectively. In addition, the recall of GIZA++ improved by 6.6 percentage points with ICL.
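To make the baseline setting concrete, here is a minimal sketch of how a similarity-measure system might score candidate word pairs from sentence-aligned text, and how recall is computed against a gold list. It is not an implementation of ICL. The abstract does not name the three similarity measures, so the Dice coefficient is an assumption chosen for illustration. The toy corpus and the names `dice_scores` and `recall` are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score every co-occurring (source, target) word pair with the Dice coefficient.

    pairs: list of (source_tokens, target_tokens) for aligned sentences.
    Dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)), counted per sentence pair.
    """
    src_freq, tgt_freq, co_freq = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_set, t_set = set(src), set(tgt)  # count each word once per sentence
        src_freq.update(s_set)
        tgt_freq.update(t_set)
        co_freq.update(product(s_set, t_set))
    return {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in co_freq.items()
    }


def recall(extracted, gold):
    """Fraction of gold word pairs that the system extracted."""
    return len(extracted & gold) / len(gold)


# Hypothetical toy corpus (English-French), for illustration only.
corpus = [
    (["the", "dog"], ["le", "chien"]),
    (["the", "cat"], ["le", "chat"]),
    (["a", "dog"], ["un", "chien"]),
]
scores = dice_scores(corpus)
```

In this toy corpus, ("cat", "chat") is observed only once, so its score rests on a single sentence pair. Low-frequency evidence of this kind is what the sparse data problem refers to in co-occurrence-based extraction.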
Keywords
parallel corpus, percentage point, sparse data problem, bilingual dictionary, bilingual resource, bilingual word pair, Inductive Chain Learning, new learning method, recall value, automatic extraction, bilingual rule, automatic acquisition