Standardizing and applying a mating-based whole-genome simulation approach reveals caution in using chromosome-level PCA and kinship estimates

bioRxiv (2023)

Abstract
This paper presents an efficient method for simulating pseudo-genotype data using a standardized SLiM protocol, which offers a flexible alternative to traditional approaches that rely on large genetic datasets. Such datasets can be time-consuming to obtain, especially when institutional review board (IRB) review is involved, which makes simulation an attractive alternative. Although HapGen v2 is the most popular genotype simulator, we found that SLiM allows more customizable simulation to meet multiple needs. To validate the new method, we compared its performance across parallel simulations that varied multiple parameters. Our results showed that SLiM can simulate samples up to 333 times the input size while keeping the rate of simulated samples that are second-degree or closer relatives (REV) low, making it a promising alternative to HapGen. We also applied our whole-genome simulation approach to sensitivity analyses of chromosome-level principal component analysis (PCA) and kinship estimation. These analyses showed how sensitive PCA and kinship estimates are when computed per chromosome, and highlighted the unequal distribution of population structure across chromosomes and ancestries. Our study therefore provides experimental support for avoiding chromosome-level quality control statistics. Overall, our standardized SLiM protocol offers a flexible new way to produce pseudo-genotype data. By demonstrating that SLiM supports more customizable simulations, and by highlighting the importance of accounting for how population structure is distributed across chromosomes and ancestries, this work has implications for research in genetics and genomics.

### Author Summary

In this publication, we introduce a novel approach to genotype simulation using a mating-based strategy in SLiM. Our approach computationally mimics meiosis and, as of December 2022, is the only one available that can maintain cross-chromosome associations during whole-genome simulation. It is also applicable to regional or chromosomal genotype simulation. Compared with HapGen, the current gold-standard chromosome-level simulator, our approach performs better when generating large sample sizes (>13,000). We provide an application example that uses whole-genome simulation to underscore the importance of whole-genome quality control (QC) statistics, such as principal component analysis (PCA) and kinship estimates, over their chromosome-level counterparts. The results of this application indicate instability and bias in chromosome-level QC statistics. Overall, our approach is a valuable tool for evaluating and validating genetic analyses, and researchers should avoid chromosome-level QC statistics.

### Competing Interest Statement

The authors have declared no competing interest.
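
The mating-based idea, and why per-chromosome kinship estimates are less stable than genome-wide ones, can be illustrated with a toy model. The Python sketch below is not the authors' SLiM protocol and does not call SLiM; it is a hypothetical, standard-library-only illustration under these assumptions: founder alleles are drawn independently (no linkage disequilibrium or ancestry structure), each gamete carries at most one uniformly placed crossover per chromosome, and kinship is estimated with a GRM-style estimator (half the standardized genotype covariance), which may differ from the estimator the authors used. All function names (`random_founder`, `make_gamete`, `mate`, `dosages`, `grm_kinship`) are hypothetical. The key point is that `mate` draws every chromosome of an offspring from the same two parents, so relatedness stays consistent across the whole genome.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Toy data model (an assumption for illustration, not SLiM's internal format):
# a haplotype is a list of 0/1 alleles, a chromosome is a pair of haplotypes,
# and a genome is a list of chromosomes.
Haplotype = List[int]
Chromosome = Tuple[Haplotype, Haplotype]
Genome = List[Chromosome]


def random_founder(chrom_lengths: Sequence[int], freq: float, rng: random.Random) -> Genome:
    """Hypothetical founder: every allele drawn independently at frequency `freq`."""
    return [
        tuple([int(rng.random() < freq) for _site in range(n)] for _hap in range(2))
        for n in chrom_lengths
    ]


def make_gamete(chrom: Chromosome, rng: random.Random) -> Haplotype:
    """Simplified meiosis for one chromosome: at most one crossover."""
    first, second = chrom
    if rng.random() < 0.5:
        first, second = second, first  # either parental haplotype may lead
    cut = rng.randint(0, len(first))  # 0 or len(first) transmits one haplotype intact
    return first[:cut] + second[cut:]


def mate(mother: Genome, father: Genome, rng: random.Random) -> Genome:
    """Whole-genome offspring: all chromosomes come from the same two parents."""
    return [(make_gamete(m, rng), make_gamete(f, rng)) for m, f in zip(mother, father)]


def dosages(genome: Genome, chroms: Optional[Sequence[int]] = None) -> List[int]:
    """Alternate-allele counts (0/1/2), concatenated over the chosen chromosomes."""
    idx = range(len(genome)) if chroms is None else chroms
    return [a + b for c in idx for a, b in zip(*genome[c])]


def grm_kinship(sample: List[List[int]], i: int, j: int) -> float:
    """GRM-style kinship (half the standardized genotype covariance) of samples i, j."""
    n, total, used = len(sample), 0.0, 0
    for s in range(len(sample[0])):
        p = sum(row[s] for row in sample) / (2 * n)  # sample allele frequency
        if p in (0.0, 1.0):
            continue  # monomorphic in this sample, so uninformative
        total += (sample[i][s] - 2 * p) * (sample[j][s] - 2 * p) / (2 * p * (1 - p))
        used += 1
    return total / (2 * used) if used else float("nan")


if __name__ == "__main__":
    rng = random.Random(1)
    lengths = [400, 300, 200]  # three toy "chromosomes"
    founders = [random_founder(lengths, 0.3, rng) for _ in range(20)]
    child = mate(founders[0], founders[1], rng)  # founders[0] is the mother
    people = founders + [child]
    child_idx = len(people) - 1
    # Same parent-offspring pair: genome-wide versus per-chromosome estimates.
    whole = [dosages(g) for g in people]
    print("genome-wide:", round(grm_kinship(whole, 0, child_idx), 3))
    for c in range(len(lengths)):
        per_chr = [dosages(g, [c]) for g in people]
        print(f"chromosome {c + 1}:", round(grm_kinship(per_chr, 0, child_idx), 3))
```

The expected kinship of a parent-offspring pair is 0.25. Each per-chromosome estimate uses only that chromosome's sites, so across repeated runs it tends to scatter more widely around this value than the genome-wide estimate, a small-scale analogue of the instability the authors report for chromosome-level statistics. Real population structure, which this toy deliberately omits, would add chromosome-specific bias on top of that sampling noise.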

Keywords: kinship, mating-based, whole-genome, chromosome-level