Hybrid Dimensionality Reduction Forest with Pruning for High-Dimensional Data Classification

IEEE Access (2020)

Abstract
The classification of high-dimensional data is a challenge in machine learning. Traditional classifier ensemble methods for high-dimensional data improve the diversity of classifiers through either dimensionality reduction or sample selection. However, these methods have two limitations: 1) dimensionality reduction methods easily cause information loss, which decreases accuracy; 2) sample selection methods are susceptible to noise and redundant features. To address these limitations, we propose a novel hybrid dimensionality reduction forest (HDRF) that increases the diversity of the ensemble in both the feature space and the sample space. First, a tree-based feature selection algorithm is employed to separate the effective features from the remaining ones. Then the Bagging method is applied to obtain diverse training subsets. To fully retain and mine the important information in the unselected samples, a sample-feature based transformation process (SFTP) is proposed to generate extended features. Since PCA can effectively reduce dimensionality and remove noise features, it is applied to compress the unselected features and the extended features into new features that are compact and compensatory. Further, a novel classifier ensemble pruning framework (HDRFPF) based on HDRF is designed to remove redundant and invalid classifiers. Experimental results on 23 high-dimensional data sets verify that our method outperforms mainstream classifier ensemble methods, with better results on 19 of the 23 data sets.
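The data-preparation steps named in the abstract (feature partition, Bagging, PCA compression of the unselected features) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the function names are hypothetical, the data are binary-labelled, and a simple class-mean-gap score stands in for the tree-based feature selector. The SFTP step is left out because the abstract does not specify it; in the described method its extended features would be compressed by PCA together with the unselected ones.

```python
import numpy as np


def partition_features(X, y, n_keep):
    """Split feature indices into (selected, unselected).

    Stand-in for the tree-based selector (an assumption): rank features by
    the gap between the two class means, scaled by the feature's std.
    """
    classes = np.unique(y)
    gap = np.abs(X[y == classes[0]].mean(axis=0) - X[y == classes[1]].mean(axis=0))
    score = gap / (X.std(axis=0) + 1e-12)
    order = np.argsort(score)[::-1]  # best-scoring features first
    return np.sort(order[:n_keep]), np.sort(order[n_keep:])


def bootstrap_split(n_samples, rng):
    """Bagging: draw n_samples indices with replacement, plus the out-of-bag rest."""
    in_bag = rng.integers(0, n_samples, size=n_samples)
    oob = np.setdiff1d(np.arange(n_samples), in_bag)
    return in_bag, oob


def pca_fit(X, n_components):
    """Fit PCA via SVD of the centered data; return (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def build_member_view(X, y, n_keep, n_components, rng):
    """Prepare the training matrix for one ensemble member."""
    selected, unselected = partition_features(X, y, n_keep)
    in_bag, oob = bootstrap_split(len(X), rng)
    Xb, yb = X[in_bag], y[in_bag]
    # Compress the unselected features instead of discarding them.
    mean, comps = pca_fit(Xb[:, unselected], n_components)
    compact = (Xb[:, unselected] - mean) @ comps.T
    # Selected features plus the compact PCA features form the member's input.
    return np.hstack([Xb[:, selected], compact]), yb, oob


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 20))
    y = np.repeat([0, 1], 30)
    X[y == 1, :3] += 3.0  # make the first three features informative
    Z, yb, oob = build_member_view(X, y, n_keep=3, n_components=2, rng=rng)
    print(Z.shape, len(oob))
```

Calling `build_member_view` repeatedly with the same generator yields a different bootstrap sample, and hence a different PCA projection, for each ensemble member, which is one way the feature-space and sample-space diversity described above could arise.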
Keywords
Dimensionality reduction, Forestry, Feature extraction, Diversity reception, Bagging, Training, Training data, Classification, ensemble learning, feature transformation, ensemble pruning, high-dimensional data