Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing

Statistics in Biosciences (2021)

Abstract
Single-cell RNA sequencing (scRNA-seq) quantifies RNA transcripts at the level of individual cells, providing cellular resolution of gene expression variation. scRNA-seq data are counts of RNA transcripts for all genes in a species’ genome; they are very high-dimensional and contain an excess of zero counts. To reduce the data dimension and extract robust, interpretable biological information, we develop a penalized Latent Dirichlet Allocation (pLDA) model for scRNA-seq data. The method is adapted from LDA, a generative probabilistic model that originated in natural language processing. pLDA models scRNA-seq data by treating genes as words, cells as documents, and latent biological functions as topics. It imposes a penalty reflecting a characteristic of scRNA-seq data, namely that only a small subset of genes is expected to be topic-specific, which increases the robustness of the estimation and the interpretability of the results. We apply pLDA to scRNA-seq datasets from both the Drop-seq and SMARTer v1 technologies and demonstrate improved performance in cell-type classification. The topics identified by pLDA can be interpreted in terms of biological functions.
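To make the genes-as-words, cells-as-documents analogy concrete, the following is a minimal sketch of plain (unpenalized) LDA fitted to a toy cell-by-gene count matrix. It does not implement the pLDA penalty, and the abstract does not state how pLDA is estimated; collapsed Gibbs sampling is used here only because it is compact. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy counts are all illustrative assumptions, not taken from the paper.

```python
import random


def lda_gibbs(counts, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for plain LDA (no sparsity penalty).

    counts[c][g] is the transcript count of gene g in cell c.
    Returns (theta, phi): cell-by-topic and topic-by-gene proportions.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])

    # Each transcript is one "word" token in the "document" of its cell.
    tokens = [(c, g)
              for c in range(n_cells)
              for g in range(n_genes)
              for _ in range(counts[c][g])]

    # Random initial topic assignment for every token.
    z = [rng.randrange(n_topics) for _ in tokens]
    cell_topic = [[0] * n_topics for _ in range(n_cells)]
    topic_gene = [[0] * n_genes for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for (c, g), k in zip(tokens, z):
        cell_topic[c][k] += 1
        topic_gene[k][g] += 1
        topic_total[k] += 1

    for _ in range(n_iter):
        for i, (c, g) in enumerate(tokens):
            # Remove the token's current assignment from the counts.
            k = z[i]
            cell_topic[c][k] -= 1
            topic_gene[k][g] -= 1
            topic_total[k] -= 1
            # Conditional probability of each topic for this token.
            weights = [
                (cell_topic[c][t] + alpha)
                * (topic_gene[t][g] + beta)
                / (topic_total[t] + n_genes * beta)
                for t in range(n_topics)
            ]
            k = rng.choices(range(n_topics), weights=weights)[0]
            z[i] = k
            cell_topic[c][k] += 1
            topic_gene[k][g] += 1
            topic_total[k] += 1

    # Smoothed point estimates from the final sample.
    theta = [[(cell_topic[c][t] + alpha)
              / (sum(cell_topic[c]) + n_topics * alpha)
              for t in range(n_topics)]
             for c in range(n_cells)]
    phi = [[(topic_gene[t][g] + beta)
            / (topic_total[t] + n_genes * beta)
            for g in range(n_genes)]
           for t in range(n_topics)]
    return theta, phi


# Toy data (made up): 6 cells x 6 genes with many zeros and two gene programs.
counts = [
    [5, 4, 6, 0, 0, 0],
    [4, 6, 5, 0, 1, 0],
    [6, 5, 4, 0, 0, 0],
    [0, 0, 0, 5, 6, 4],
    [0, 1, 0, 4, 5, 6],
    [0, 0, 0, 6, 4, 5],
]
theta, phi = lda_gibbs(counts, n_topics=2)
```

Here each row of `theta` gives a cell's mixture over topics and each row of `phi` gives a topic's distribution over genes. A penalized variant such as pLDA would additionally encourage most genes to carry no topic-specific signal in `phi`, which this sketch leaves out.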
Keywords
Single-cell RNA sequencing, Latent Dirichlet Allocation, Topic models, Genomics, Transcriptomics