Automatic extraction of bilingual word pairs from parallel corpora with various languages using learning for adjacent information.

Systems and Computers in Japan (2006)

Abstract
This paper presents a learning method that uses adjacent information to extract bilingual word pairs efficiently from parallel corpora in various languages for which language resources are insufficient. In our method, information about the correspondence between source language words and target language words is acquired automatically from the word strings that adjoin bilingual word pairs. The acquired information is then used to resolve ambiguity in the correspondence between source language words and target language words in bilingual sentence pairs. First, the system automatically acquires templates, which indicate correspondence between source language words and target language words and are based on the word strings that adjoin the bilingual word pairs. The system then uses the acquired templates to extract bilingual word pairs efficiently from bilingual sentence pairs. In evaluation experiments, the system extracted bilingual word pairs from parallel corpora covering five kinds of languages, with a total extraction rate of 60.1%. This rate is 8.0 percentage points higher than that of a system based only on the Dice coefficient without our method. These results confirm the effectiveness of our method. © 2006 Wiley Periodicals, Inc. Syst Comp Jpn, 37(13): 40–53, 2006; Published online in Wiley InterScience. DOI 10.1002/scj.20534
Keywords
adjacent information, bilingual word pairs, learning, similarity measure, various languages
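
The abstract compares the method against a baseline based only on the Dice coefficient. As a rough illustration of that kind of baseline, the sketch below scores candidate word pairs by sentence-level co-occurrence. It is not the paper's implementation: the function name `dice_scores`, the whitespace tokenization, and the toy English and romanized Japanese corpus are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score every co-occurring (source word, target word) pair.

    Dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)), where all counts
    are numbers of sentence pairs. Hypothetical helper, not from the paper.
    """
    src_freq = Counter()   # sentence pairs whose source side contains the word
    tgt_freq = Counter()   # sentence pairs whose target side contains the word
    pair_freq = Counter()  # sentence pairs containing both words

    for src, tgt in sentence_pairs:
        # Assumption: whitespace tokenization; each word counted once per sentence.
        src_words = set(src.split())
        tgt_words = set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))

    return {
        (s, t): 2.0 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


# Toy parallel corpus (invented for illustration only).
corpus = [
    ("I like tea", "watashi wa ocha ga suki"),
    ("I like coffee", "watashi wa koohii ga suki"),
    ("she drinks tea", "kanojo wa ocha o nomu"),
]

scores = dice_scores(corpus)
print(scores[("tea", "ocha")])
print(scores[("like", "suki")], scores[("like", "watashi")])
```

In this toy corpus, ("like", "suki") and ("like", "watashi") receive the same score, because the words always appear in the same sentence pairs. Co-occurrence alone cannot separate such candidates, which is the kind of correspondence ambiguity the abstract says the adjacent-information templates are meant to resolve.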