A Novel Hybrid Feature Selection And Ensemble Learning Framework For Unbalanced Cancer Data Diagnosis With Transcriptome And Functional Proteomic

IEEE Access (2021)

Abstract
The high dimensionality, high redundancy and class imbalance of cancer multi-omics data are the main challenges for cancer diagnosis. Existing studies have neglected the role of functional proteomics in the occurrence and development of cancer. In this study, a novel hybrid feature selection and ensemble learning framework, referred to as the three-stage feature selection and twice-competitional ensemble learning method (TSFS-TCEM), is proposed for cancer diagnosis. First, we combine transcriptome and functional proteomics data to construct a multi-omics dataset for breast cancer; this is the first time these combined biological data have been applied to breast cancer diagnosis. Second, the proposed method introduces multiple models during both feature selection and diagnostic model construction. The three-stage feature selection integrates features from the different data types, and the twice-competitional ensemble learning framework addresses the class-imbalance problem that a single classifier suffers from. TSFS-TCEM achieves a diagnostic accuracy of 99.64%, outperforming all compared methods. In addition, its sensitivity, specificity and F-measure under 5-fold cross-validation all exceed 99.63%.
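The abstract evaluates the method with 5-fold cross-validation and reports sensitivity, specificity and F-measure alongside accuracy. The snippet below is a minimal sketch of how these standard metrics are usually computed, together with a stratified fold split that keeps the minority class represented in every fold. It is not the authors' evaluation code: the function names `stratified_k_folds` and `binary_metrics` are hypothetical, and it assumes the tumour class is encoded as the positive label `1`.

```python
import random
from typing import List, Sequence, Tuple


def stratified_k_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, keeping class proportions roughly equal per fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0  # running counter so fold sizes stay balanced across classes
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's samples round-robin so each fold gets its share of the class.
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Tuple[float, float, float, float]:
    """Return (accuracy, sensitivity, specificity, F-measure) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive (tumour) class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative (normal) class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    return accuracy, sensitivity, specificity, f_measure


if __name__ == "__main__":
    # Toy imbalanced labels: 8 positive (tumour) and 2 negative (normal) samples.
    y_true = [1] * 8 + [0] * 2
    y_pred = [1] * 7 + [0] + [0, 1]
    print(stratified_k_folds(y_true, k=5))
    print(binary_metrics(y_true, y_pred))  # (0.8, 0.875, 0.5, 0.875)
```

In the toy example, accuracy is 0.8 while specificity on the minority class is only 0.5. This is one reason per-class metrics such as sensitivity and specificity are commonly reported alongside accuracy for imbalanced data.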
Keywords
Cancer, Feature extraction, Proteomics, Support vector machines, Bagging, Breast cancer, Redundancy, Functional proteomic, transcriptome profiles, The Cancer Genome Atlas (TCGA), ensemble method, hybrid feature selection, cancer diagnosis