Study on Wikipedia for translation mining for CLIR

ICMLC (2010)

Abstract
Query translation of out-of-vocabulary (OOV) terms is one of the key factors affecting the performance of cross-language information retrieval (CLIR). Based on the data structure and language features of Wikipedia, this paper divides the translation environment into a target-existence environment and a target-deficit environment. To overcome the difficulty of translation mining in the target-deficit environment, frequency change information and adjacency information are used to extract candidate units, and a mixed translation mining strategy is established that combines a frequency-distance model, a surface pattern matching model, and a summary-score model. Search-engine-based OOV translation mining is taken as the baseline, and performance is evaluated on TOP1 results. Experiments show that the mixed translation mining method based on Wikipedia achieves a precision of 0.6279, an improvement of 6.98% over the baseline.
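The abstract names a frequency-distance model for ranking candidate translations but does not give its formula. The sketch below is a hypothetical illustration of the general idea, not the paper's model: it assumes candidate units have already been extracted and that each text snippet is pre-tokenized, and it scores each candidate by how often it co-occurs with the source term, weighting each co-occurrence by 1/d, where d is the token distance to the nearest occurrence of the source term. The function names `frequency_distance_scores` and `rank_candidates`, the 1/d weighting, and the example data are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def frequency_distance_scores(
    snippets: Sequence[Sequence[str]],
    source_term: str,
    candidates: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical frequency-distance score for candidate translations.

    Each occurrence of a candidate in a snippet that also contains the
    source term adds 1 / d, where d is the token distance to the nearest
    occurrence of the source term. Candidates that are both frequent and
    close to the source term score highest. The 1/d weighting is an
    illustrative assumption, not the paper's formula.
    """
    wanted = set(candidates)
    scores: Dict[str, float] = defaultdict(float)
    for tokens in snippets:
        src_positions = [i for i, tok in enumerate(tokens) if tok == source_term]
        if not src_positions:
            continue  # no co-occurrence evidence in this snippet
        for i, tok in enumerate(tokens):
            if tok in wanted and tok != source_term:
                # d >= 1 because position i is not a source-term position
                d = min(abs(i - j) for j in src_positions)
                scores[tok] += 1.0 / d
    return dict(scores)


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Return candidates by descending score; the first item is the TOP1 answer."""
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical example: mining an English translation for a Chinese term.
snippets = [
    ["维基百科", "(", "Wikipedia", ")", "是", "一个", "网站"],
    ["维基百科", "Wikipedia", "encyclopedia"],
    ["encyclopedia", "网站", "Wikipedia"],  # skipped: source term absent
]
scores = frequency_distance_scores(snippets, "维基百科", ["Wikipedia", "encyclopedia"])
print(rank_candidates(scores))
```

Here "Wikipedia" scores 1.5 (distances 2 and 1) and "encyclopedia" scores 0.5 (distance 2), so "Wikipedia" is ranked first. In a mixed strategy like the one the abstract describes, a score of this kind would be combined with evidence from other models; how the paper performs that combination is not stated here.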
Keywords
computational linguistics,cross-language information retrieval,frequency adjacency information,clir,oov,language features,information retrieval,out of vocabulary,search engine,frequency-distance model,summary-score model,mixed translation mining,frequency change information,data mining,wikipedia data structure,search engines,surface pattern matching model,data structure,encyclopedias,internet,mathematical model,electronic publishing,pattern matching