Integrated meta-analysis of colorectal cancer public proteomic datasets for biomarker discovery and validation

PLOS Computational Biology (2024)

Abstract
Cancer biomarkers have been investigated thoroughly over recent decades. Despite this, the heterogeneity of colorectal cancer (CRC) makes it challenging to identify and validate effective prognostic biomarkers for classifying patients by outcome and treatment response. Although a massive amount of proteomics data has been deposited in public repositories, this rich source of information is vastly underused. Here, we reused public proteomics datasets with two main objectives: (i) to generate hypotheses (detection of biomarkers) for downstream validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve public CRC proteomics datasets (mostly from the PRIDE database) were reanalysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. By integrating these data with survival annotation data, we validated in silico a six-gene signature for CRC classification at the protein level and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. In conclusion, this proof-of-concept study demonstrates the value of reusing public proteomics datasets as the basis for a useful resource for biomarker discovery and validation. The protein expression data have been made available in the public resource Expression Atlas.

The need for new prognostic biomarkers is one of the main topics of interest in colorectal cancer (CRC). A potential strategy to address this need, which to the best of our knowledge has not been attempted so far, is to combine different public proteomic studies generated from independent patient cohorts. Despite the abundance of available proteomics data, meta-analyses have so far been conducted only at the genomic and transcriptomic levels. In this study, we reanalysed 12 mass spectrometry-based public proteomics datasets. In total, the combined dataset included 440 samples from 299 different patients, encompassing both solid and liquid biopsies. From these data, we defined a proteomics landscape suitable for assessing protein expression in tumors and normal mucosa, its association with patient outcome, and its potential detection in liquid biopsies. Furthermore, as a proof of concept for the data reuse strategy, we demonstrated its capacity to validate an experimentally-based SEC6 gene signature at the protein level and to identify new blood-detectable biomarkers. All data generated in this study are publicly available through the Expression Atlas resource.