scBERT as a Large-scale Pretrained Deep Language Model for Cell Type Annotation of Single-cell RNA-seq Data

bioRxiv (Cold Spring Harbor Laboratory)(2022)

Citations: 30 | Views: 36
Abstract
Annotating cell types based on single-cell RNA-seq data is a prerequisite for research on disease progression and the tumor microenvironment. Here we show that existing annotation methods typically suffer from a lack of curated marker gene lists, improper handling of batch effects, and difficulty in leveraging latent gene-gene interaction information, impairing their generalization and robustness. We developed a pre-trained deep neural network-based model, scBERT (single-cell Bidirectional Encoder Representations from Transformers), to overcome these challenges. Following BERT's pre-train-and-fine-tune paradigm, scBERT acquires a general understanding of gene-gene interactions through pre-training on large amounts of unlabeled scRNA-seq data, and is then transferred to the cell type annotation task on unseen, user-specific scRNA-seq data via supervised fine-tuning. Extensive and rigorous benchmark studies validated the superior performance of scBERT on cell type annotation, novel cell type discovery, robustness to batch effects, and model interpretability.

### Competing Interest Statement

The authors have declared no competing interest.
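The two-phase workflow described in the abstract can be sketched in miniature. This is an illustration only, not the actual scBERT architecture: the data, the masking rate, and the nearest-centroid stand-in for a fine-tuned classifier are all assumptions introduced here, and a real implementation would use a Performer-style transformer trained to reconstruct the masked expression bins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: rows = cells, columns = genes, values = binned expression levels.
n_cells, n_genes, n_bins, n_types = 200, 50, 5, 3
expr = rng.integers(0, n_bins, size=(n_cells, n_genes))
labels = rng.integers(0, n_types, size=n_cells)  # hypothetical cell-type labels

def mask_bins(x, rate=0.15, mask_token=n_bins):
    """Phase 1 (pre-training): hide a fraction of expression bins,
    analogous to BERT's masked-token objective on text."""
    masked = x.copy()
    hide = rng.random(x.shape) < rate
    masked[hide] = mask_token
    return masked, hide

masked, hidden = mask_bins(expr)
# A real encoder would be trained to reconstruct expr[hidden] from masked;
# here we only verify the masking mechanics.
assert np.array_equal(masked[~hidden], expr[~hidden])  # unmasked bins untouched

# Phase 2 (fine-tuning): the pre-trained encoder is reused and a small
# classification head maps cell representations to cell-type labels.
# Sketched here with a nearest-centroid stand-in for the trained model:
centroids = np.stack([expr[labels == t].mean(axis=0) for t in range(n_types)])
dists = ((expr[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = np.argmin(dists, axis=1)
accuracy = (pred == labels).mean()
```

On this random toy data the accuracy is near chance; the point is only the shape of the pipeline, in which the masked-reconstruction phase supplies the representation that the supervised phase fine-tunes.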
Keywords
Bioinformatics, Classification and taxonomy, Gene expression, Engineering, general