An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language

Applied Artificial Intelligence (2019)

Abstract
Corpus-based automatic thesaurus construction relies on linguistic tools such as part-of-speech (POS) taggers and parsers, which often perform poorly on morphologically rich languages (MRLs). In this paper, we therefore focused on the complex task of adapting corpus-based thesaurus construction methods to MRLs. We investigated two statistical approaches to thesaurus construction: (a) a first-order, co-occurrence-based approach and (b) a second-order, distributional approach. We explored alternative levels of morphological term representation, complemented by grouping of morphological variants. We then introduced and adopted a generic algorithmic scheme for thesaurus construction in MRLs that applies to both the first-order and the second-order approaches. The scheme investigated alternative representation levels and offered alternative configurations. We demonstrated the empirical benefits of our methodology on the construction of a diachronic Hebrew thesaurus. We used morphological analysis tools, defined and applied a new annotation scheme, and identified its optimal configuration, which outperforms the baseline for both the first-order and the second-order corpus-based thesaurus construction approaches.
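The abstract does not specify the scoring functions behind the two approaches. As a minimal sketch, assuming document-level co-occurrence, a Dice coefficient as the first-order score, and cosine similarity between co-occurrence vectors as the second-order score, the two families can be contrasted as below. The `LEMMA` dictionary is a hypothetical stand-in for the morphological analysis tools and the grouping of morphological variants; it, the helper names, and the toy corpus are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy lemma map standing in for a morphological analyzer:
# it groups surface variants ("books", "book") under one term.
LEMMA = {"books": "book", "reads": "read", "reading": "read", "novels": "novel"}


def normalize(doc):
    """Map each token to its grouped representation; keep unique terms, sorted."""
    return sorted({LEMMA.get(tok, tok) for tok in doc})


def cooccurrence(docs):
    """Document frequencies and document-level pair co-occurrence counts."""
    df, co = Counter(), Counter()
    for doc in docs:
        terms = normalize(doc)
        df.update(terms)
        co.update(combinations(terms, 2))  # pairs come out in sorted order
    return df, co


def dice(df, co, a, b):
    """First-order score: Dice coefficient over documents shared by a and b."""
    both = co[tuple(sorted((a, b)))]
    total = df[a] + df[b]
    return 2 * both / total if total else 0.0


def context_vectors(co):
    """Each term's vector of co-occurrence counts with every other term."""
    vec = defaultdict(Counter)
    for (a, b), n in co.items():
        vec[a][b] += n
        vec[b][a] += n
    return vec


def cosine(vec, a, b):
    """Second-order score: cosine similarity of a's and b's context vectors."""
    va, vb = vec.get(a, Counter()), vec.get(b, Counter())
    dot = sum(w * vb[k] for k, w in va.items())
    norm = math.sqrt(sum(w * w for w in va.values())) * math.sqrt(
        sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0


docs = [["books", "reading"], ["book", "reads", "novels"],
        ["novel", "read"], ["book", "novel"]]
df, co = cooccurrence(docs)
vec = context_vectors(co)
print(round(dice(df, co, "book", "read"), 3))   # 0.667
print(round(cosine(vec, "book", "read"), 3))    # 0.5
```

Because "books" and "book" are grouped before counting, both contribute to a single entry (`df["book"] == 3`); without the grouping step their statistics would be split across two sparser terms, which is the kind of sparsity problem in MRLs that representation-level choices address.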