Inference of the distribution of fitness effects of mutations is affected by SNP filtering methods, sample size and population structure

Authorea (2023)

Abstract
The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutations arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing-data filtering, sample size, number of SNPs and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three filtering methods – downsampling, imputation and subsampling – with sample sizes of 4 to 100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (<8 individuals), and becomes unpredictable with too few SNPs (<5000); and (3) population structure may skew the inferred DFE toward more strongly deleterious mutations. We suggest that future studies should consider downsampling for small datasets, and use samples of more than 4 (ideally more than 8) individuals, with more than 5000 SNPs, in order to improve the robustness of DFE inference and enable comparative analyses.
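The abstract names downsampling as a missing-data treatment but does not define it. A minimal sketch follows, assuming that downsampling means hypergeometric projection of each site's allele counts onto a common, smaller number of chromosomes. This is a widespread convention for building a site frequency spectrum (SFS) from data with missing genotypes, and not necessarily the authors' pipeline. The function names `project_site` and `downsample_sfs` and the `(derived, called)` input format are hypothetical.

```python
from math import comb

def project_site(derived, called, target):
    """Probability of observing k = 0..target derived alleles when `target`
    chromosomes are drawn without replacement from `called` chromosomes,
    of which `derived` carry the derived allele (hypergeometric)."""
    if called < target:
        raise ValueError("site has fewer called chromosomes than target")
    total = comb(called, target)
    return [comb(derived, k) * comb(called - derived, target - k) / total
            for k in range(target + 1)]

def downsample_sfs(sites, target):
    """Expected SFS with target + 1 bins from (derived, called) pairs.
    Assumption: sites with fewer than `target` called chromosomes are dropped."""
    sfs = [0.0] * (target + 1)
    for derived, called in sites:
        if called < target:
            continue  # too much missing data at this site
        for k, p in enumerate(project_site(derived, called, target)):
            sfs[k] += p
    return sfs
```

For example, `downsample_sfs([(1, 4), (2, 2), (1, 1)], 2)` projects the first two sites down to two chromosomes and skips the third, which has only one called chromosome.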
Keywords
mutations, SNP filtering methods, fitness effects, population structure