Ensemble Feature Selection: Are Stability Metrics a Proxy or a Complement to Predictive Performance?

Zahra Mungloo-Dilmohamud, Yasmina Jaufeerally-Fakim, Carlos A. Peña-Reyes

2021 13th International Conference on Bioinformatics and Biomedical Technology (2021)

Abstract

Proper identification of biomarkers, which are used in drug development, is critical, as the race to find a COVID-19 vaccine has shown. Gene-expression-based marker discovery often requires feature selection. However, a plethora of feature selection methods exist, and they do not select the same feature subsets for the same dataset, so users are often left to decide which subset to use. To help resolve this conundrum, several approaches have been proposed to guide feature subset selection; among them, ensemble methods (i.e., combining subsets from multiple methods) have recently gained attention. An ensemble approach raises two issues that deserve attention: the stability of the feature subsets being combined and the classification performance of the combined feature subsets. This motivates exploring how stability and performance relate, which is the central topic of this paper. First, 5/6 different feature selection methods are used to create feature subsets for 3 different transcriptomics datasets. Then, the stability and performance of these feature subsets under a given merging strategy are computed using 5 stability metrics and 3 performance metrics for 3 different classifiers. Our results suggest that performance and stability criteria are complementary and conflicting, and that both must be considered when deciding on the final selected feature subsets. We use two reference metrics to illustrate such a selection.
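The abstract does not name its five stability metrics or its merging strategy, so the following is only a minimal sketch under stated assumptions. It uses the average pairwise Jaccard index as one common example of a subset-stability score, and a hypothetical "keep features chosen by at least k methods" vote as one possible merging rule. The method names, gene identifiers, and helper functions (`jaccard`, `mean_pairwise_jaccard`, `merge_by_vote`) are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(subsets: list[set]) -> float:
    """Average Jaccard index over all unordered pairs of subsets.

    Assumption: this is one simple stability score, not necessarily
    one of the five metrics used in the paper.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def merge_by_vote(subsets: list[set], min_votes: int) -> set:
    """Keep features selected by at least `min_votes` methods (hypothetical merging rule)."""
    counts = Counter(feature for s in subsets for feature in s)
    return {feature for feature, votes in counts.items() if votes >= min_votes}


# Hypothetical subsets returned by three feature selection methods on one dataset.
subsets = {
    "method_a": {"GENE1", "GENE2", "GENE3"},
    "method_b": {"GENE2", "GENE3", "GENE4"},
    "method_c": {"GENE3", "GENE5"},
}

stability = mean_pairwise_jaccard(list(subsets.values()))
merged = merge_by_vote(list(subsets.values()), min_votes=2)
print(round(stability, 3))  # 0.333
print(sorted(merged))       # ['GENE2', 'GENE3']
```

A high stability score only says that the methods agree; it says nothing about how well the merged subset classifies samples. That gap is why the abstract argues that stability and predictive performance must both be checked, for example by training each classifier on `merged` and scoring it separately.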
