Correctness of Cell Labels in Public Single Cell Transcriptomics Datasets.

BIBM(2021)

Abstract
The number of single-cell transcriptomic (SCT) studies is rapidly increasing. More than 15,000 single-cell gene expression data sets are available in public repositories, and more than 2,400 of them involve peripheral blood mononuclear cells (PBMC). The main PBMC cell types are B cells, dendritic cells, monocytes, natural killer cells, and T cells. Labels of individual PBMC are usually provided in the metadata accompanying a data set, or are implicit in how the data set is partitioned for sorted cells. We analyzed the correctness of the labels assigned to individual PBMC in the primary reports. Label correctness was assessed using an Artificial Neural Network (ANN) classifier and the Confident Learning (CL) approach. We estimated that, on average, about 2% of the cells in our data sets are mislabeled. Label accuracy varied broadly between data sets, particularly among those generated by experimental cell sorting followed by SCT.
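The abstract names Confident Learning but does not describe the procedure. The following is a minimal sketch of the general CL idea, not the authors' implementation: it assumes that a classifier (an ANN in the paper, any model here) has already produced out-of-sample predicted class probabilities for every cell. The function name `find_likely_mislabels`, the toy probabilities, and the two-class setup are all hypothetical and chosen only for illustration. A cell is flagged when its most confidently predicted class, judged against a per-class threshold, disagrees with the label given in the primary report.

```python
import numpy as np

def find_likely_mislabels(pred_probs, labels):
    """Flag cells whose confident predicted class differs from the given label.

    pred_probs: (n_cells, n_classes) out-of-sample predicted probabilities.
    labels:     (n_cells,) integer labels from the primary report.
    Assumes every class has at least one labeled cell.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    labels = np.asarray(labels)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: mean predicted probability of class j
    # over the cells that were labeled j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(n_classes)]
    )

    # A class is a confident candidate for a cell if its probability
    # reaches that class's threshold.
    above = pred_probs >= thresholds
    has_confident = above.any(axis=1)

    # Among confident candidates, pick the most probable class.
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = masked.argmax(axis=1)

    # Flag only cells that have a confident class different from their label.
    return has_confident & (confident_class != labels)

# Hypothetical toy data: six cells, two classes (0 and 1).
toy_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0 but confidently predicted as 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
toy_labels = [0, 0, 1 - 1, 1, 1, 1]

flags = find_likely_mislabels(toy_probs, toy_labels)
mislabel_rate = flags.mean()  # fraction of cells flagged as likely mislabeled
```

In this toy example only the third cell is flagged. The per-class thresholds are what distinguish this from simply comparing the argmax prediction with the label: a class that the model predicts with systematically low confidence gets a correspondingly lower bar.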
Keywords
ANN, confident learning, gene expression, PBMC, mislabels analysis, supervised machine learning