Combining Dual Word Embeddings with Open Directory Project Based Text Classification

2018 IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), 2018

Abstract
Traditional Open Directory Project (ODP)-based text classification methods effectively capture the topics of texts by utilizing the hierarchical structure of an explicitly human-built knowledge base. However, they consider only term weighting approaches, ignoring the important semantic similarity between words. In this paper, we consider the semantics of words by incorporating an implicit text representation, such as word2vec word embeddings, into ODP-based text classification. In contrast to the common usage of word2vec, we utilize both the input and output vectors. This allows us to calculate a combined typical and topical similarity between the words of a category and a document, which is more effective for text classification. To this end, we first incorporate the dual word embeddings of word2vec into ODP-based text classification to obtain semantically richer category and document representations. Subsequently, we use the combination of the input and output vectors to improve the semantic similarity between category and document. Our evaluation results on a real-world dataset show the efficacy of the proposed approach, which exhibits significant improvements of 9% and 37% in F1-score and precision at k, respectively, over state-of-the-art techniques.
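The abstract does not give the scoring formula, so the following is only a minimal sketch of how a combined typical and topical similarity between a category and a document might be computed from dual embeddings. It makes several assumptions that are not taken from the paper: categories and documents are represented as centroids of their word vectors, input-to-input cosine stands in for typical similarity, input-to-output cosine stands in for topical similarity, and the two are mixed with a hypothetical weight `alpha`. The tables `W_IN` and `W_OUT`, the category names, and all function names are made-up toy values for illustration.

```python
import math

# Toy dual embeddings (hypothetical values): W_IN holds word2vec-style input
# vectors, W_OUT holds the output vectors that are usually discarded.
W_IN = {"guitar": [1.0, 0.0], "piano": [0.9, 0.1],
        "goal": [0.0, 1.0], "striker": [0.1, 0.9]}
W_OUT = {"guitar": [0.8, 0.2], "piano": [0.9, 0.1],
         "goal": [0.2, 0.8], "striker": [0.1, 0.9]}

# Toy ODP-like categories, each described by a bag of words (assumption).
CATEGORIES = {"Arts/Music": ["guitar", "piano"],
              "Sports/Soccer": ["goal", "striker"]}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(words, table):
    """Average the vectors of the words found in the embedding table."""
    vecs = [table[w] for w in words if w in table]
    if not vecs:
        raise ValueError("none of the words has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def combined_similarity(category_words, document_words, alpha=0.5):
    """Mix IN-IN (typical) and IN-OUT (topical) cosine; alpha is an assumption."""
    cat_in = centroid(category_words, W_IN)
    doc_in = centroid(document_words, W_IN)
    doc_out = centroid(document_words, W_OUT)
    typical = cosine(cat_in, doc_in)    # same-space similarity
    topical = cosine(cat_in, doc_out)   # cross-space similarity
    return alpha * typical + (1.0 - alpha) * topical


def classify(document_words, alpha=0.5):
    """Return the best-scoring category and the score of every category."""
    scores = {name: combined_similarity(words, document_words, alpha)
              for name, words in CATEGORIES.items()}
    return max(scores, key=scores.get), scores


label, scores = classify(["piano", "guitar"])
```

With real data, the two tables would be read from a trained word2vec model, and the mixing weight would need to be tuned on held-out data; neither step is described in the abstract.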
Keywords
Text Classification, Word Embeddings, Machine Learning