Inferring Tumor Progression in Large Datasets

PLOS Computational Biology (2020)

Abstract
Identifying mutations in the genes that give cancer a selective advantage is an important step toward research and clinical objectives. As such, there has been growing interest in methods that identify driver genes and their temporal order, both within a single patient (intra-tumor) and across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that scales better with the number of genes and samples than a previously introduced ILP-based method. Adopting a probabilistic approach also enables principled model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate that our method outperforms the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method places many seemingly passenger genes in the driver pathways, our algorithm stays focused on true driver genes and outputs more accurate models of cancer progression.

Author summary
Cancer is a disease caused by the accumulation of somatic mutations in the genome. This process is mainly driven by mutations in certain genes that give the cells harboring them a selective advantage. These relatively few driver genes are usually masked by an abundance of so-called passenger mutations. Identifying the driver genes and the temporal order in which their mutations occur is of great importance for both research and clinical purposes. In this paper, we introduce a probabilistic model for cancer progression and devise an efficient inference algorithm to train it. We show that our method scales favorably to large datasets and outperforms an ILP-based counterpart across a wide set of synthetic data simulations. In contrast to previous non-probabilistic progression models, our Bayesian approach also supports systematic model selection and confidence quantification. We also study two large datasets on colorectal cancer and glioblastoma and validate our inferred model against the ILP-based method.
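To make the idea of ordered driver pathways concrete, here is a minimal Python sketch of a toy linear pathway model. It is an illustrative assumption, not the authors' model or inference algorithm: the functions `simulate_cohort` and `log_likelihood`, the uniform prior over progression stages, the rule that exactly one gene is mutated in each reached pathway, the symmetric error rate `eps` on pathway genes, and the background passenger rate `bg` are all hypothetical choices. The sketch samples a binary patient-by-gene mutation matrix and computes the marginal likelihood of one patient's mutation profile, the kind of quantity a sampler or optimizer over pathway assignments could score.

```python
import math
import random
from typing import List, Sequence

# Hypothetical toy model (not the paper's): driver genes sit in ordered pathways,
# a patient at stage k has exactly one mutated gene in each of pathways 0..k-1,
# pathway genes are observed with symmetric flip noise `eps`, and genes outside
# every pathway are passengers mutated independently at background rate `bg`.


def simulate_cohort(pathways: Sequence[Sequence[int]], n_genes: int,
                    n_patients: int, eps: float = 0.05, bg: float = 0.1,
                    seed: int = 0) -> List[List[int]]:
    """Sample a binary patient-by-gene mutation matrix from the toy model."""
    rng = random.Random(seed)
    in_pathway = {g for pw in pathways for g in pw}
    cohort = []
    for _ in range(n_patients):
        stage = rng.randint(0, len(pathways))  # uniform over 0..K pathways reached
        row = [0] * n_genes
        for j in range(stage):
            row[rng.choice(pathways[j])] = 1  # one driver per reached pathway
        for g in range(n_genes):
            if g in in_pathway:
                if rng.random() < eps:
                    row[g] = 1 - row[g]  # observation noise on pathway genes
            elif rng.random() < bg:
                row[g] = 1  # background passenger mutation
        cohort.append(row)
    return cohort


def _obs(x: int, truth: int, eps: float) -> float:
    """P(observed bit | true bit) under symmetric flip noise."""
    return 1.0 - eps if x == truth else eps


def log_likelihood(row: Sequence[int], pathways: Sequence[Sequence[int]],
                   eps: float = 0.05, bg: float = 0.1) -> float:
    """Marginal log P(row): sum over a uniform stage and over driver choices."""
    in_pathway = {g for pw in pathways for g in pw}
    log_pass = sum(math.log(bg if row[g] else 1.0 - bg)
                   for g in range(len(row)) if g not in in_pathway)
    reached, unreached = [], []
    for pw in pathways:
        # Pathway not reached: every gene in it is truly unmutated.
        unreached.append(math.prod(_obs(row[h], 0, eps) for h in pw))
        # Pathway reached: exactly one gene g, chosen uniformly, is truly mutated.
        reached.append(sum(math.prod(_obs(row[h], int(h == g), eps) for h in pw)
                           for g in pw) / len(pw))
    n_stages = len(pathways) + 1
    log_terms = [sum(map(math.log, reached[:k]))
                 + sum(map(math.log, unreached[k:])) - math.log(n_stages)
                 for k in range(n_stages)]
    top = max(log_terms)  # log-sum-exp over stages for numerical stability
    return log_pass + top + math.log(sum(math.exp(t - top) for t in log_terms))


if __name__ == "__main__":
    pathways = [[0, 1], [2, 3]]  # hypothetical: genes 4 and 5 are passengers
    cohort = simulate_cohort(pathways, n_genes=6, n_patients=5)
    for row in cohort:
        print(row, round(log_likelihood(row, pathways), 3))
```

Under these assumptions, a patient who mutates the second pathway while leaving the first untouched receives a lower likelihood than one who follows the pathway order, which is the signal that lets an ordered model prefer one pathway ordering over another.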
Keywords
tumor progression, datasets