Unsupervised feature selection with ensemble learning
Machine Learning (2013)
Abstract
In this paper, we show that the way internal estimates are used to measure variable importance in Random Forests is also applicable to feature selection in unsupervised learning. We propose a new method called Random Cluster Ensemble (RCE), which estimates the out-of-bag feature importance from an ensemble of partitions. Each partition is constructed using a different bootstrap sample and a random subset of the features. We provide empirical results on nineteen benchmark data sets indicating that RCE, boosted with a recursive feature elimination scheme (RFE) (Guyon and Elisseeff, Journal of Machine Learning Research, 3:1157–1182, 2003), can lead to significant improvement in terms of clustering accuracy, over several state-of-the-art supervised and unsupervised algorithms, with a very limited subset of features. The method shows promise for very large domains. All results, datasets and algorithms are available online ( http://perso.univ-lyon1.fr/haytham.elghazel/RCE.zip ).
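The abstract's core idea (an ensemble of clusterings, each built on a bootstrap sample and a random feature subset, with feature importance measured by permuting out-of-bag values) can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact algorithm: the `kmeans`, `assign`, and `rce_importance` helpers, the subset size heuristic, and the disagreement-rate importance measure are all assumptions made here for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means; returns the final centroids."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # guard against empty clusters
                C[j] = X[labels == j].mean(axis=0)
    return C

def assign(X, C):
    """Nearest-centroid assignment."""
    return np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)

def rce_importance(X, k, n_estimators=50, subset_size=None):
    """Illustrative RCE-style importance: for each ensemble member,
    cluster a bootstrap sample on a random feature subset, then score
    each used feature by how much permuting its out-of-bag values
    changes the out-of-bag cluster assignments."""
    n, d = X.shape
    subset_size = subset_size or max(2, int(np.sqrt(d)))
    importance, counts = np.zeros(d), np.zeros(d)
    for _ in range(n_estimators):
        boot = rng.choice(n, n, replace=True)               # bootstrap sample
        oob = np.setdiff1d(np.arange(n), boot)              # out-of-bag points
        feats = rng.choice(d, subset_size, replace=False)   # random feature subset
        C = kmeans(X[boot][:, feats], k)
        base = assign(X[oob][:, feats], C)
        for col, f in enumerate(feats):
            Xp = X[oob][:, feats].copy()
            Xp[:, col] = rng.permutation(Xp[:, col])        # permute one feature
            importance[f] += np.mean(assign(Xp, C) != base) # assignment disagreement
            counts[f] += 1
    return importance / np.maximum(counts, 1)

# Toy data: two informative features separating two clusters, three noise features.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
X = np.hstack([X, rng.normal(0, 1, (100, 3))])
imp = rce_importance(X, k=2)
```

On data like this, the two informative features should receive markedly higher importance than the noise features; in the paper's pipeline this ranking would then drive recursive feature elimination (RFE), repeatedly discarding the lowest-ranked features.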
Keywords
Unsupervised learning, Feature selection, Ensemble methods, Random forest