A new two-stage hybrid feature selection algorithm and its application in Chinese medicine

International Journal of Machine Learning and Cybernetics (2021)

Abstract
High-dimensional small-sample data are prone to the curse of dimensionality and to overfitting, and they often contain many irrelevant and redundant features. To address these feature selection problems, a new Two-stage Hybrid Feature Selection Algorithm (Ts-HFSA) is proposed. The first stage combines a filter method with a wrapper method to adaptively remove irrelevant features. The second stage applies a De-redundancy Algorithm of Fusing Approximate Markov Blanket with L1 Regular Term (DA2MBL1), which addresses two weaknesses of the approximate Markov blanket (AMB): information loss when redundant features are deleted, and potential redundancy that remains in the feature subset selected by AMB. Experimental results on multiple UCI public datasets and on datasets from the material foundation of Chinese medicine showed that Ts-HFSA removed irrelevant and redundant features more effectively, found smaller and higher-quality feature subsets, and improved stability, indicating advantages over AMB, FCBF, RF, GBDT, XGBoost, Lasso, and CI_AMB. Moreover, on the material-foundation data of Chinese medicine, which have higher feature dimensions and smaller sample sizes, Ts-HFSA performed better and improved model precision even after greatly reducing dimensionality. These results indicate that Ts-HFSA is an effective feature selection method for high-dimensional small-sample data and an excellent research method for studying the material foundation of Chinese medicine.
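To make the approximate Markov blanket idea concrete, here is a minimal sketch of the AMB redundancy test as it is commonly defined for FCBF, using symmetric uncertainty (SU) on discrete features. A feature F_j is treated as an approximate Markov blanket for F_i when SU(F_j, C) >= SU(F_i, C) and SU(F_j, F_i) >= SU(F_i, C), in which case F_i is dropped as redundant. This is not the authors' Ts-HFSA: it omits the adaptive filter/wrapper stage and the L1-term fusion of DA2MBL1, and the function names and the `relevance_threshold` parameter are hypothetical, chosen for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a normalized score in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0.0:
        return 0.0  # both variables constant: no information to share
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)


def amb_select(features, target, relevance_threshold=1e-9):
    """Greedy FCBF-style selection using the approximate Markov blanket test.

    features: dict mapping a feature name to a list of discrete values.
    relevance_threshold: hypothetical cutoff; the small default only absorbs
    floating-point noise so that features independent of the class are dropped.
    """
    # Relevance filter: SU between each feature and the class label.
    su_c = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    ranked = sorted(
        (name for name in features if su_c[name] > relevance_threshold),
        key=lambda name: su_c[name],
        reverse=True,
    )
    selected = []
    for fi in ranked:
        # Every fj already kept satisfies SU(fj, C) >= SU(fi, C) because of the
        # ranking; fj is an approximate Markov blanket for fi when
        # SU(fj, fi) >= SU(fi, C), which makes fi redundant.
        redundant = any(
            symmetric_uncertainty(features[fj], features[fi]) >= su_c[fi]
            for fj in selected
        )
        if not redundant:
            selected.append(fi)
    return selected


if __name__ == "__main__":
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    y = [0, 0, 1, 1, 0, 0, 1, 1]
    target = [a | b for a, b in zip(x, y)]  # class = x OR y
    data = {"x": x, "x_copy": list(x), "y": y, "noise": [0, 1] * 4}
    print(amb_select(data, target))
```

In this toy example, `x_copy` duplicates `x` and is removed as redundant, the `noise` column is independent of the class and is filtered out as irrelevant, and `x` and `y` are both kept because neither forms an approximate Markov blanket for the other. The abstract's point about information loss applies here: the hard deletion in the loop discards a feature outright, which is the behaviour DA2MBL1 is described as mitigating.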
Keywords
Feature selection, High-dimensional small sample, Approximate Markov blanket, Material foundation of Chinese medicine