A Community Effort To Identify And Correct Mislabeled Samples In Proteogenomic Studies

Patterns (2021)

Abstract
Sample mislabeling or misannotation has been a long-standing problem in scientific research, and it is particularly prevalent in large-scale, multi-omic studies because of the complexity of multi-omic workflows. There is an urgent need for quality controls that automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has produced an open-source software tool, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.
Keywords
DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems