xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data
NeurIPS (2023)
Abstract
Advances in high-throughput sequencing technology have led to significant
progress in measuring gene expression at the single-cell level. The amount of
publicly available single-cell RNA-seq (scRNA-seq) data has already surpassed
50M records for humans, with each record measuring 20,000 genes. This highlights
the need for unsupervised representation learning to fully ingest these data,
yet classical transformer architectures are prohibitive to train on such data
in terms of both computation and memory. To address this challenge, we propose
a novel asymmetric encoder-decoder transformer for scRNA-seq data, called
xTrimoGene^α (or xTrimoGene for short), which leverages the sparse
characteristic of the data to scale up the pre-training. This scalable design
of xTrimoGene reduces FLOPs by one to two orders of magnitude compared to
classical transformers while maintaining high accuracy, enabling us to train
the largest transformer models over the largest scRNA-seq dataset today. Our
experiments also show that the performance of xTrimoGene improves as the model
size is scaled up, and that it achieves state-of-the-art (SOTA) performance on
various downstream tasks, such as cell type annotation, perturb-seq effect
prediction, and drug combination prediction. The xTrimoGene model is now
available as a service at the following link: https://api.biomap.com/xTrimoGene/apply.
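
To make the sparsity argument concrete, the sketch below counts query-key pairs in one self-attention layer when an encoder sees all 20,000 genes versus only the nonzero ones. It is a rough illustration, not the xTrimoGene implementation: the 10% nonzero fraction, the helper names `split_expressed` and `attention_pair_count`, and the synthetic record are all assumptions made for this example. It counts encoder attention pairs only and ignores the decoder, so it should not be read as an estimate of the model's actual FLOPs.

```python
import random

def split_expressed(record):
    """Return (gene_indices, values) for the nonzero entries of one cell record."""
    idx = [i for i, v in enumerate(record) if v != 0]
    return idx, [record[i] for i in idx]

def attention_pair_count(seq_len):
    """Query-key pairs in one full self-attention layer (quadratic in length)."""
    return seq_len * seq_len

N_GENES = 20_000         # from the abstract: each record measures 20,000 genes
ASSUMED_NONZERO = 0.10   # assumption for illustration only, not a figure from the paper

# Build one synthetic record in which exactly 10% of the genes are expressed.
random.seed(0)
expressed = set(random.sample(range(N_GENES), int(N_GENES * ASSUMED_NONZERO)))
record = [random.randint(1, 50) if i in expressed else 0 for i in range(N_GENES)]

idx, vals = split_expressed(record)
dense_cost = attention_pair_count(N_GENES)     # encoder attends over every gene
sparse_cost = attention_pair_count(len(idx))   # encoder attends over nonzero genes only
ratio = dense_cost / sparse_cost

print(f"encoder tokens: {len(idx)} of {N_GENES}")
print(f"attention pairs reduced by a factor of {ratio:.0f}")
```

Because attention cost grows quadratically with sequence length, keeping a fraction p of the tokens shrinks the pair count by 1/p². Under the assumed p = 0.10 that is a factor of 100, which is consistent in spirit with the one to two orders of magnitude reported in the abstract.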
Keywords
scalable representation learner, single-cell, RNA-seq