Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Natural Language Engineering (2023)

Abstract
Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.
Keywords
Corpus linguistics, Evaluation, Information extraction, Information retrieval, Multilinguality
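
The abstract names formal concept analysis as the first step, so a small illustration of that technique may help. The sketch below is not the paper's implementation: the toy document-term context, the function names, and the binary "concept vector" encoding are all assumptions made for illustration. It enumerates formal concepts (pairs of a document set and the terms those documents all share) by brute force, which is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context (assumption): documents as objects, terms as attributes.
CONTEXT = {
    "doc_en_1": {"bank", "money"},
    "doc_en_2": {"bank", "river"},
    "doc_fr_1": {"bank", "money", "loan"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects whose attribute set contains every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Brute force over 2^n subsets: fine for a toy example, not for real corpora.
    """
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def ordered_concepts(context):
    """Concepts in a deterministic order: by extent size, then extent members."""
    return sorted(formal_concepts(context),
                  key=lambda c: (len(c[0]), sorted(c[0])))


def concept_vector(doc, concepts):
    """Assumed encoding: 1 if `doc` belongs to the concept's extent, else 0."""
    return [1 if doc in extent else 0 for extent, _ in concepts]


if __name__ == "__main__":
    concepts = ordered_concepts(CONTEXT)
    for extent, intent in concepts:
        print(sorted(extent), sorted(intent))
    print(concept_vector("doc_en_1", concepts))
```

Under this assumed encoding, documents that fall into many of the same concepts get similar vectors, which is the kind of signal a clustering step could use. How the paper actually builds and weights its concept vectors is not stated in the abstract.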