Using Machine Learning To Detect Heterogeneity In Single Cell Multi-Omics Datasets

CANCER RESEARCH (2020)

Abstract
Background: While scRNA-seq provides high-resolution views of single-cell heterogeneity by characterizing the functional state of cells, our understanding of the cellular properties and population architectures of heterogeneous tissues will be greatly advanced by multi-omics investigation of single cells. Recently, several computational methods have been developed to integrate data from multiple single-cell experiments that each measure an individual analyte. However, unambiguous inference that a cellular phenotype is caused by a genotype can only be achieved by measuring both in the same single cell. To address this gap, we have developed the Tapestri multi-omics workflow to analyze RNA and DNA information from the same cell. Methods: After pre-processing of the reads, cell calling is done by identifying the good barcodes using both DNA and RNA reads. DNA reads are processed to identify genetic variants in each cell. The variant-by-cell matrix is then filtered for data completeness to ensure that only high-quality data is used in downstream processing. Ploidy is estimated by normalizing DNA reads, and genetic variants and ploidy information are then used together to identify subclones. The RNA reads are log-normalized and scaled within each cell. Next, we set the mean expression of each transcript to 0 and scale its variance to 1. This prevents highly expressed transcripts from skewing downstream analysis. We then trained a random forest classifier to identify significantly differentially expressed transcripts across the subclones that were identified using genetic variants and ploidy information. Results: Using the top differentially expressed transcripts, we performed dimensionality reduction followed by clustering of cell types. The resulting visualization showed how well the genotypic and transcriptomic datasets integrated with one another.
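The RNA normalization steps described in the Methods can be illustrated with a minimal NumPy sketch. This is not the Tapestri implementation: the function names, the `scale_factor` value of 10,000, and the synthetic Poisson counts are all assumptions chosen for illustration. It shows per-cell log normalization followed by centering each transcript to mean 0 and scaling it to variance 1.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000.0):
    """Normalize each cell by its total reads, then apply log1p.

    counts: (n_cells, n_transcripts) array of raw read counts.
    scale_factor is an assumed value, not one stated in the abstract.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale_factor)


def zscore_transcripts(expr):
    """Center each transcript (column) to mean 0 and scale to variance 1."""
    mean = expr.mean(axis=0, keepdims=True)
    std = expr.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # constant transcripts remain 0 after centering
    return (expr - mean) / std


# Synthetic counts (hypothetical): 200 cells, 4 transcripts with very
# different expression levels.
rng = np.random.default_rng(0)
raw = rng.poisson(lam=[1.0, 5.0, 50.0, 500.0], size=(200, 4))
scaled = zscore_transcripts(log_normalize(raw))
```

After scaling, the transcript with a mean count near 500 and the one with a mean count near 1 contribute on the same scale, which is the stated purpose of the step.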
We tested this method on a model system with Raji and KG1 cell lines titrated at a 50:50 ratio and were clearly able to associate the transcriptional variation with the genotypic variation of the two cell lines. The method was also validated on a PBMC sample to assess its robustness. We were able to identify the different cell types present in the sample and to overlay that information with genetic variants to identify subclones within the identified cell types.

Citation Format: Saurabh Gulati, Shu Wang, Saurabh Parikh, Ben Liu, Kaustubh Gokhale, Manimozhi Manivannan, Sombeet Sahu, Dong Kim, Anup Parikh. Using machine learning to detect heterogeneity in single cell multi-omics datasets [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 865.
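The transcript-ranking and clustering steps from the abstract can be sketched with scikit-learn on synthetic data resembling a two-population mix. Everything here is an assumption for illustration: the data are simulated rather than taken from the study, the subclone labels stand in for labels derived from variants and ploidy, and the use of feature importances for ranking, PCA for dimensionality reduction, and k-means for clustering are choices the abstract does not specify.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_cells, n_transcripts, n_informative = 200, 50, 5

# Synthetic stand-in: two subclones at a 50:50 ratio, as if already
# called from DNA variants and ploidy (hypothetical labels).
subclone = np.repeat([0, 1], n_cells // 2)
expr = rng.normal(size=(n_cells, n_transcripts))
expr[subclone == 1, :n_informative] += 3.0  # first 5 transcripts differ

# Rank transcripts by how much they help predict the subclone label.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(expr, subclone)
top = np.argsort(forest.feature_importances_)[::-1][:n_informative]

# Reduce the top transcripts to 2 dimensions, then cluster without labels.
embedding = PCA(n_components=2, random_state=0).fit_transform(expr[:, top])
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)

# Cluster ids are arbitrary, so score agreement under either labeling.
agreement = max(np.mean(clusters == subclone), np.mean(clusters != subclone))
```

High agreement between the expression-based clusters and the genotype-based labels corresponds to the kind of concordance the abstract reports for the Raji and KG1 mix. Note that feature importances rank transcripts but do not by themselves provide a significance test.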