HiDDEN: A machine learning label refinement method for detection of disease-relevant populations in case-control single-cell transcriptomics

biorxiv(2023)

引用 0|浏览8
暂无评分
摘要
In case-control single-cell RNA-seq studies, sample-level labels are transferred onto individual cells, labeling all case cells as affected, but only a small fraction of them may actually be perturbed. Here, using simulations, we demonstrate that the standard approach to single cell analysis fails to isolate the subset of affected case cells and their markers when either the affected subset is small, or when the strength of the perturbation is mild. To address this fundamental limitation, we introduce HiDDEN, a computational method that refines the case-control labels to accurately reflect the perturbation status of each cell. We show that HiDDEN has superior ability to recover biological signals missed by the standard analysis workflow in simulated ground truth datasets of cell type mixtures. When applied to a dataset of human multiple myeloma precursor conditions, HiDDEN recapitulates the expert manual annotation and discovers malignancy in previously considered healthy early stage samples. When applied to a mouse model of demyelination, HiDDEN identifies an endothelial subpopulation playing a role in early stage blood-brain barrier dysfunction. We anticipate that HiDDEN should find a wide usage in contexts which require the detection of subtle changes in cell types across conditions. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
label refinement method,machine learning,disease-relevant,case-control,single-cell
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要