Generating bulk RNA-Seq gene expression data based on generative deep learning models and utilizing it for data augmentation

Yinglun Wang, Qiurui Chen, Hongwei Shao, Rongxin Zhang, Han Shen

Computers in Biology and Medicine (2024)

Abstract
Large-scale high-throughput transcriptome sequencing data holds significant value in biomedical research. However, practical challenges such as difficulty in sample acquisition often limit the number of samples available, which reduces the reliability of analysis results. Generative deep learning models, such as Generative Adversarial Networks (GANs) and Diffusion Models (DMs), have been shown to generate realistic data and may help address this problem. In this study, we used bulk RNA-Seq gene expression data to construct four generative models by combining two model types with two data preprocessing methods: Min-Max-GAN, Z-Score-GAN, Min-Max-DM, and Z-Score-DM. We demonstrated that data generated by the Min-Max-GAN model exhibited high similarity to real data, significantly outperforming the other models. Furthermore, we trained the models on the largest dataset available to date, achieving an MMD (Maximum Mean Discrepancy) of 0.030 on the training dataset and 0.033 on the independent dataset. We also used SHAP (SHapley Additive exPlanations) to explain our generative model, which strengthens its credibility. Finally, we applied the generated data to data augmentation and observed a significant improvement in the performance of classification models. In summary, this study establishes a GAN-based approach for generating bulk RNA-Seq gene expression data, which contributes to enhancing the performance and reliability of downstream tasks in high-throughput transcriptome analysis.
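To make the two preprocessing methods and the similarity metric named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the RBF kernel, the `gamma` value, and the toy Gaussian data are all assumptions chosen for illustration, and the abstract does not state which kernel or estimator was used for the reported MMD values.

```python
import math
import random


def min_max_scale(column):
    """Rescale one gene's values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score_scale(column):
    """Center to mean 0 and scale by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, gamma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Toy stand-ins for "real" and "generated" expression profiles (hypothetical).
rng = random.Random(0)
real = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(50)]
similar = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(50)]
shifted = [[rng.gauss(3, 1) for _ in range(5)] for _ in range(50)]

print("MMD^2, same distribution:   ", mmd_squared(real, similar, gamma=0.1))
print("MMD^2, shifted distribution:", mmd_squared(real, shifted, gamma=0.1))
```

In this sketch a smaller value means the two samples are harder to tell apart under the chosen kernel, so a sample drawn from the same distribution as the reference should score lower than a shifted one. The scaling functions operate on one gene (column) at a time; whether the paper scales per gene or globally is not stated in the abstract.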
Keywords
Machine learning, Deep learning, Generative learning, Transcriptome