Particle Swarm Optimization Based Two-Stage Feature Selection In Text Mining

2018 IEEE Congress on Evolutionary Computation (CEC) (2018)

Abstract
Text mining is an important and popular data mining topic, where a fundamental objective is to enable users to extract informative data from text-based assets and perform related operations on the text, such as retrieval, classification, and summarization. For text classification, one of the most important steps is feature selection, because not all the features in a text dataset are useful for classification. Irrelevant and redundant features should be removed to increase accuracy and to decrease complexity and running time, but feature selection is often an expensive process, and most existing methods use a simple filter to remove features, which might lose some useful ones because of feature interactions. Furthermore, there is little research using particle swarm optimization (PSO) algorithms to select informative features for text classification. This paper presents a novel two-stage method for text feature selection: in the first stage, features are selected by four different filter ranking methods; in the second stage, PSO removes further irrelevant features to compose the final feature subset. The proposed algorithm is compared with four traditional feature selection methods on the commonly used Reuters-21578 dataset. The experimental results show that the proposed two-stage method can substantially reduce the dimensionality of the feature space and improve the classification accuracy.
Keywords
Feature selection, text mining, particle swarm optimization, two-stage method
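
The two-stage idea described in the abstract (a filter ranking that shortlists features, followed by PSO that prunes the shortlist) can be illustrated with a minimal sketch. The abstract does not name the four filter methods, the classifier, the fitness function, or the PSO parameters, so everything below is an assumption made for illustration: the mean-gap filter score, the nearest-centroid classifier, the size penalty in the fitness, the inertia and acceleration constants, and the helper names `filter_rank`, `centroid_accuracy`, `pso_select`, and `make_toy_data` are all hypothetical. The second stage uses a standard binary PSO with a sigmoid transfer function, which may differ from the variant used in the paper.

```python
import math
import random


def filter_rank(X, y, k):
    """Stage 1 (assumed filter): keep the k features with the largest class-mean gap."""
    n_feat = len(X[0])
    scores = []
    for j in range(n_feat):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]


def centroid_accuracy(X, y, feats):
    """Training accuracy of a nearest-centroid classifier restricted to feats."""
    if not feats:
        return 0.0
    cents = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    correct = 0
    for r, l in zip(X, y):
        dist = {lab: sum((r[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
                for lab in (0, 1)}
        correct += min(dist, key=dist.get) == l
    return correct / len(X)


def pso_select(X, y, candidates, n_particles=10, n_iter=30, seed=0):
    """Stage 2: binary PSO over the features that survived the filter."""
    rng = random.Random(seed)
    dim = len(candidates)

    def fitness(mask):
        feats = [c for c, m in zip(candidates, mask) if m]
        # Assumed fitness: accuracy minus a small penalty on subset size.
        return centroid_accuracy(X, y, feats) - 0.01 * len(feats) / dim

    pos = [[rng.random() < 0.5 for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity update; booleans act as 0/1 in the arithmetic.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                pos[i][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [c for c, m in zip(candidates, gbest) if m]


def make_toy_data(n=40, n_feat=12, seed=1):
    """Synthetic stand-in for a term-weight matrix; features 0 and 1 carry the signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        lab = i % 2
        row = [rng.random() for _ in range(n_feat)]
        row[0] += 2.0 * lab
        row[1] += 2.0 * lab
        X.append(row)
        y.append(lab)
    return X, y


X, y = make_toy_data()
stage1 = filter_rank(X, y, k=6)    # shortlist from the filter stage
stage2 = pso_select(X, y, stage1)  # final subset chosen by PSO
```

Here `stage1` is the filter shortlist and `stage2` is always a subset of it, which mirrors the paper's claim that the second stage removes further features rather than adding new ones. On real text data the rows of `X` would be term weights (for example TF-IDF values) and the fitness would normally be evaluated on held-out data rather than on the training set.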