Measuring Chinese-English Cross-Lingual Word Similarity With Hownet And Parallel Corpus
CICLing'11: Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II (2011)
Abstract
Cross-lingual word similarity (CLWS) is a basic component of cross-lingual information access systems. Designing a CLWS measure faces three challenges: (i) cross-lingual knowledge bases are rare; (ii) cross-lingual corpora are limited; and (iii) no benchmark cross-lingual dataset is available for CLWS evaluation. This paper presents several Chinese-English CLWS measures that adopt HowNet as the cross-lingual knowledge base and a sentence-level parallel corpus as development data. To evaluate these measures, a Chinese-English cross-lingual benchmark dataset is compiled based on the Miller-Charles dataset. Two conclusions are drawn from the experimental results. First, HowNet is a promising knowledge base for CLWS measures. Second, a parallel corpus is promising for fine-tuning word similarity measures using cross-lingual co-occurrence statistics.
Keywords
Cross-lingual word similarity, cross-lingual information access, HowNet, parallel corpus