Core-Concept-seeded based LDA towards Ontology Learning

semanticscholar(2020)

Abstract
Ontologies are powerful semantic models used for purposes such as improving system interoperability, information retrieval, and question answering. However, modeling a domain ontology remains a difficult task for humans, especially when the concepts and properties are numerous or evolving, and when the modeling is performed on large-scale text data. Machine learning provides valuable help by automating ontology learning from texts. In particular, for the concept formation task, clustering techniques can handle a huge number of terms to extract concepts, i.e. clusters of semantically similar terms. However, current works struggle to learn clusters that are relevant to a specific domain and to produce relevant labels for those clusters. To solve these issues, we propose to use core concepts from a domain ontology as prior knowledge, and to adapt term clustering with seed-knowledge-based LDA models so that they take these core concepts into account: each LDA topic is first assigned to a seeding core concept, and LDA is then guided to put terms linked to the same core concept in the same topic. We evaluate our proposal on two textual corpora and compare it to four other clustering-based approaches (two unsupervised methods and two semi-supervised methods). The results show that our approach significantly outperforms them on a clean corpus and on a noisy corpus without serious imbalance among core concept classes (in terms of number of terms). For a noisy corpus with a prominent imbalance problem, our SMBM-SW is a good alternative.
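
To make the seeding idea concrete, the following is a minimal sketch of a generic seeded LDA, not the models evaluated in the paper. It assumes one common way of injecting seed knowledge: each topic is tied to one core concept, the seed terms of that concept receive extra mass in the topic-word prior, and seed tokens start in their seeded topic before collapsed Gibbs sampling. The function name `seeded_lda` and the parameters `seeds`, `boost`, `alpha`, and `beta` are hypothetical and chosen for illustration only.

```python
import numpy as np

def seeded_lda(docs, vocab_size, seeds, n_iter=100, alpha=0.1, beta=0.01,
               boost=5.0, rng_seed=0):
    """Collapsed Gibbs LDA with a seeded topic-word prior (illustrative only).

    docs  : list of documents, each a list of integer term ids
    seeds : one list of seed term ids per topic (topic k <-> core concept k)
    Returns the topic-term distribution, shape (n_topics, vocab_size).
    """
    rng = np.random.default_rng(rng_seed)
    n_topics = len(seeds)

    # Asymmetric prior: seed terms get extra mass in their own topic only.
    eta = np.full((n_topics, vocab_size), beta)
    for k, seed_ids in enumerate(seeds):
        eta[k, np.asarray(seed_ids, dtype=int)] += boost
    eta_sum = eta.sum(axis=1)
    seed_topic = {w: k for k, ids in enumerate(seeds) for w in ids}

    doc_topic = np.zeros((len(docs), n_topics))
    topic_word = np.zeros((n_topics, vocab_size))
    topic_total = np.zeros(n_topics)

    # Initialisation: seed tokens start in their seeded topic, others at random.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = seed_topic[w] if w in seed_topic else int(rng.integers(n_topics))
            z_d.append(k)
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional over topics; eta pulls seed terms to their topic.
                p = ((doc_topic[d] + alpha)
                     * (topic_word[:, w] + eta[:, w])
                     / (topic_total + eta_sum))
                k = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1

    phi = topic_word + eta
    return phi / phi.sum(axis=1, keepdims=True)
```

In this sketch, terms that co-occur with a seed term tend to end up in that seed's topic, so each resulting cluster can be labeled by its core concept. How strongly the seeds dominate depends on `boost`, which here is an arbitrary illustrative value.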