An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development

bioRxiv (2020)

Abstract
Accurate standards and extensive development datasets are the foundation of technical progress. To facilitate benchmarking and development, we sequence 9 samples, covering the Genome in a Bottle truth sets on multiple instruments (NovaSeq, HiSeqX, HiSeq4000, PacBio Sequel II System) and sample preparations (PCR-Free, PCR-Positive) for both whole genome and multiple exome kits. We benchmark pipelines, quantifying strengths and limitations for sequencing and analysis methods. We identify variability within and between instruments, preparation methods, and analytical pipelines, across various sequencing depths. We discuss the relevance of this variability to downstream analyses, and strategies to reduce variability.

Competing Interest Statement: All authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. This study was funded by Google LLC.
Keywords
extensive sequence dataset, benchmarking, gold-standard