Advancing the Terminological Classification of Semi-structured Documents

IEEE International Conference on Tools with Artificial Intelligence (2015)

Abstract
Documents are usually given in textual form, accompanied by a set of terminological classifications (metadata) based on the vocabularies of domain ontologies. This paper presents a novel method for advancing this classification by extracting more properties of the analyzed documents. We first extract additional roles from the textual part and combine them with roles extracted from the ontology statements to construct an extended document vector representation. We then introduce a pruning algorithm that, for a given document collection, merges concepts of the ontology to produce classes with a sufficient number of corresponding instances. Next, we classify the documents into ontology classes using the Stanford linear classifier. Finally, we propose an algorithm that assigns additional concept labels to documents using the output of the classifier. Our system is evaluated on a set of real data and ontological descriptions. Its performance, measured in terms of various accuracy and specificity measures, indicates that the proposed approach to document classification produces correct labels for the majority of items.
Keywords
Semantic Web, Ontologies, Classification, Concepts, Annotation
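
The abstract does not give the details of the pruning algorithm, so the following Python sketch is only one plausible reading of "merging concepts to produce classes with a sufficient number of corresponding instances". It assumes the ontology is a tree described by a child-to-parent map and that any concept with fewer than a threshold number of documents is folded into its parent, deepest concepts first. The function name `prune_ontology` and the parameters `parent`, `instances`, and `min_instances` are hypothetical and do not come from the paper.

```python
def prune_ontology(parent, instances, min_instances):
    """Merge under-populated concepts into their parent concept.

    Illustrative assumption, not the paper's algorithm.

    parent: dict mapping concept -> parent concept (the root maps to None)
    instances: dict mapping concept -> set of document ids labeled with it
    min_instances: minimum number of documents a class needs to survive
    Returns a dict mapping each surviving class -> set of document ids.
    """
    # Copy the input so the caller's sets are not mutated.
    classes = {c: set(docs) for c, docs in instances.items()}
    # Make sure every concept mentioned in the hierarchy has an entry.
    for child, par in parent.items():
        classes.setdefault(child, set())
        if par is not None:
            classes.setdefault(par, set())

    def depth(concept):
        # Number of edges between the concept and the root.
        d = 0
        while parent.get(concept) is not None:
            concept = parent[concept]
            d += 1
        return d

    # Visit the deepest concepts first so merges propagate upward.
    for concept in sorted(classes, key=depth, reverse=True):
        par = parent.get(concept)
        if par is not None and len(classes[concept]) < min_instances:
            classes[par].update(classes.pop(concept))

    # Drop concepts that ended up with no documents at all.
    # A root with too few documents is kept, since it has no parent to merge into.
    return {c: docs for c, docs in classes.items() if docs}
```

With a hierarchy in which `Cat` has a single document and a threshold of two, `Cat` would be merged into its parent `Animal`, while a sibling `Dog` with three documents would remain a class of its own. The surviving classes would then serve as the target labels for a classifier.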