Contrastive Learning for Predicting Cancer Prognosis Using Gene Expression Values
arXiv (2023)
Abstract
Recent advancements in image classification have demonstrated that
contrastive learning (CL) can aid further learning tasks by acquiring good
feature representations from a limited number of data samples. In this paper, we
applied CL to tumor transcriptomes and clinical data to learn feature
representations in a low-dimensional space. We then used these learned
features to train a classifier that categorizes tumors into a high- or low-risk
group of recurrence. Using data from The Cancer Genome Atlas (TCGA), we
demonstrated that CL can significantly improve classification accuracy.
Specifically, our CL-based classifiers achieved an area under the receiver
operating characteristic curve (AUC) greater than 0.8 for 14 types of cancer
and an AUC greater than 0.9 for 2 types of cancer. We also developed CL-based
Cox (CLCox) models for predicting cancer prognosis. Our CLCox models trained
with TCGA data significantly outperformed existing methods in predicting
the prognosis of the 19 types of cancer under consideration. The performance of
the CLCox models and CL-based classifiers trained with TCGA lung and prostate
cancer data was validated using data from two independent cohorts. We also
show that the CLCox model trained with the whole transcriptome significantly
outperforms the Cox model trained with the 21 genes of the Oncotype DX assay,
which is in clinical use for breast cancer patients. CL-based classifiers and CLCox models
for 19 types of cancer are publicly available and can be used to predict cancer
prognosis from the RNA-seq transcriptome of an individual tumor. Python code
for model training and testing is also publicly accessible and can be used
to train new CL-based models on tumor gene expression data.
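The abstract does not state the training objectives, so the following is only a hedged illustration of the two learning steps it names: a contrastive objective for learning low-dimensional features, and a Cox model fitted on those features. It assumes a SimCLR-style NT-Xent loss for the contrastive step and the standard Cox partial likelihood (Breslow handling of ties) for a CLCox-style model. The function names `nt_xent_loss` and `cox_neg_log_partial_likelihood`, the temperature of 0.5, and the idea of building two noisy "views" of each tumor's expression vector are all assumptions for illustration, not the authors' published code.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss (an ASSUMED choice of contrastive objective).

    z1[i] and z2[i] are embeddings of two "views" of the same tumor; how those
    views are generated is a hypothetical detail, not taken from the paper.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit vectors -> cosine similarity
    sim = z @ z.T / temperature                         # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive partner of each row
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


def cox_neg_log_partial_likelihood(risk: np.ndarray, time: np.ndarray, event: np.ndarray) -> float:
    """Negative Cox partial log-likelihood averaged over events (Breslow ties).

    risk  : linear predictor beta^T x per tumor (x = learned CL features, ASSUMED)
    time  : follow-up time
    event : 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    loss = 0.0
    for i in np.flatnonzero(event):
        r = risk[time >= time[i]]                       # risk set R(t_i)
        m = r.max()
        loss -= risk[i] - (m + np.log(np.exp(r - m).sum()))  # stable log-sum-exp
    return float(loss / max(int(event.sum()), 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))                        # toy embeddings, not real data
    view1 = x + 0.1 * rng.normal(size=x.shape)          # hypothetical noise augmentation
    view2 = x + 0.1 * rng.normal(size=x.shape)
    print("NT-Xent:", nt_xent_loss(view1, view2))
    t = np.array([1.0, 2.0, 3.0, 4.0])
    e = np.array([1, 1, 0, 1])
    print("Cox NLL:", cox_neg_log_partial_likelihood(np.array([2.0, 1.0, 0.5, 0.0]), t, e))
```

Minimizing `nt_xent_loss` pulls the two views of each tumor together while pushing apart views of different tumors. In a two-stage setup like the one this sketch assumes, the encoder would be trained first, and a risk classifier or the Cox model would then be fitted on its outputs. Lower values of `cox_neg_log_partial_likelihood` mean that tumors with earlier observed events were assigned higher risk scores.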