Heterogeneous Gene Expression Cross-Evaluation of Robust Biomarkers Using Machine Learning Techniques Applied to Lung Cancer

Current Bioinformatics (2022)

Abstract
Background: Gene expression analysis is one of the most promising pillars for understanding and uncovering the mechanisms underlying the development and spread of cancer. Next Generation Sequencing technologies, such as RNA-Seq, currently lead the market due to their precision and cost. Nevertheless, there is still an enormous amount of unanalyzed data obtained from older technologies, such as microarrays, from which relevant knowledge could still be extracted.

Methods: This research describes and implements a complete machine learning methodology to cross-evaluate the compatibility between the RNA-Seq and microarray transcriptomic technologies. To show a real application of the designed pipeline, a lung cancer case study is addressed, considering two subtypes: adenocarcinoma and squamous cell carcinoma. The transcriptomic datasets considered in the study were obtained from the public repositories NCBI/GEO, ArrayExpress and GDC-Portal. Several gene expression experiments were carried out on them with the aim of finding gene signatures for these lung cancer subtypes, linked to both transcriptomic technologies. Using the selected differentially expressed genes (DEGs), predictive models capable of classifying new samples belonging to these cancer subtypes were developed.

Results: The predictive models built using one technology are capable of classifying samples obtained with a different technology. The classification results are evaluated in terms of accuracy, F1-score and ROC curves along with AUC. Finally, the biological information of the gene sets obtained and their relationship with lung cancer are reviewed, finding strong biological evidence linking them to the disease.

Conclusion: The method is capable of finding strong gene signatures that are independent of the transcriptomic technology used in the analysis. In addition, the article highlights the potential of using heterogeneous transcriptomic data to increase the number of samples available for such studies, increasing the statistical significance of the results.
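
The cross-technology evaluation described in the abstract (fit a classifier on samples from one platform, then score it on samples from the other) can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not name the models, feature selection method, or normalization used, so everything here is an assumption made for illustration. The data are synthetic, the four "genes" and their signatures are invented, per-dataset z-scoring stands in for whatever cross-platform normalization the study applied, and a nearest-centroid rule stands in for the predictive models. Only accuracy and F1-score are computed; ROC/AUC is omitted to keep the example short.

```python
import random
from statistics import mean, pstdev

def simulate(rng, n_per_class, scale, offset):
    """Synthetic expression matrix (hypothetical): 4 genes, 2 subtypes.
    `scale` and `offset` mimic the different value ranges of two platforms."""
    signatures = {"adenocarcinoma": (1, 1, -1, -1), "squamous": (-1, -1, 1, 1)}
    X, y = [], []
    for label, signs in signatures.items():
        for _ in range(n_per_class):
            X.append([offset + scale * (2 * s + rng.gauss(0, 0.5)) for s in signs])
            y.append(label)
    return X, y

def zscore_columns(X):
    """Standardize each gene (column) within a single dataset.
    Assumes roughly balanced classes in that dataset."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant genes
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean profile per subtype."""
    centroids = {}
    for label in set(y):
        members = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return centroids

def predict(centroids, X):
    """Assign each sample to the subtype with the closest centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda l: dist2(centroids[l], x)) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

rng = random.Random(0)
# Train on one "platform", test on another with a different scale and offset.
X_train, y_train = simulate(rng, 30, scale=1.0, offset=8.0)
X_test, y_test = simulate(rng, 30, scale=3.0, offset=20.0)

centroids = fit_centroids(zscore_columns(X_train), y_train)
pred = predict(centroids, zscore_columns(X_test))

acc = accuracy(y_test, pred)
f1_adeno = f1_score(y_test, pred, "adenocarcinoma")
print(f"cross-platform accuracy={acc:.2f}, F1(adenocarcinoma)={f1_adeno:.2f}")
```

The point of the sketch is the evaluation design rather than the model: the classifier never sees samples from the test platform during fitting, so a high score suggests that the selected genes carry a signal that survives the change of technology.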
Keywords
Lung cancer, microarray, RNA-Seq, gene expression, machine learning, feature selection, CDSS