Effect of error and missing data on population structure inference using microsatellite data

bioRxiv (2016)

Abstract
Missing data and genotyping errors are common in microsatellite data sets. We used simulated data to quantify the effect of these data aberrations on the accuracy of population structure inference. Data sets with complex, randomly generated population histories were simulated under the coalescent. Models describing the characteristic patterns of missing data and genotyping error in real microsatellite data sets were used to modify the simulated data sets. The accuracy of ordination, tree-based, and model-based methods of inference was evaluated before and after these modifications. The ability to recover correct population clusters decreased as missing data increased. The rate of decrease was similar among analytical procedures, so no single analytical approach was preferable. For every 1% of a data matrix that contained missing genotypes, 2-4% fewer correct clusters were found. For every 1% of a matrix that contained erroneous genotypes, 1-2% fewer correct clusters were found using ordination and tree-based methods. Model-based procedures that minimize the deviation from Hardy-Weinberg equilibrium in order to assign individuals to clusters performed better as genotyping error increased. We attribute this surprising result to the inbreeding-like nature of microsatellite genotyping error, wherein heterozygous genotypes are mischaracterized as homozygous. We show that genotyping error elevates estimates of the level of genetic admixture. Overall, missing data impair population structure inference more than typical genotyping errors do.
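The "inbreeding-like" signature of genotyping error mentioned above can be illustrated with a small simulation. This is a minimal sketch, assuming a single locus sampled under Hardy-Weinberg equilibrium and a simple allelic-dropout error model in which each heterozygote is, with a fixed probability, rescored as homozygous for one of its two alleles. The function names, allele frequencies, and error model are hypothetical illustrations, not the simulation procedure used in the study. Dropout lowers observed heterozygosity (Ho) while leaving allele frequencies, and hence expected heterozygosity (He), unchanged in expectation, so the inbreeding coefficient F = 1 - Ho/He rises.

```python
import random


def simulate_locus(n_individuals, allele_freqs, rng):
    """Draw diploid genotypes under HWE: both alleles sampled independently."""
    alleles = list(range(len(allele_freqs)))
    return [tuple(rng.choices(alleles, weights=allele_freqs, k=2))
            for _ in range(n_individuals)]


def apply_allelic_dropout(genotypes, dropout_rate, rng):
    """Hypothetical error model: with probability dropout_rate, a heterozygote
    loses one allele at random and is scored as homozygous for the other."""
    scored = []
    for a, b in genotypes:
        if a != b and rng.random() < dropout_rate:
            kept = rng.choice((a, b))
            scored.append((kept, kept))
        else:
            scored.append((a, b))
    return scored


def inbreeding_coefficient(genotypes):
    """Return (Ho, He, F) with F = 1 - Ho/He (no small-sample correction)."""
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts = {}
    for a, b in genotypes:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    if he == 0:
        raise ValueError("monomorphic locus: He is zero, F is undefined")
    return ho, he, 1.0 - ho / he


if __name__ == "__main__":
    rng = random.Random(42)
    true_genotypes = simulate_locus(2000, [0.3, 0.25, 0.2, 0.15, 0.1], rng)
    for rate in (0.0, 0.1, 0.3):
        ho, he, f = inbreeding_coefficient(
            apply_allelic_dropout(true_genotypes, rate, rng))
        print(f"dropout={rate:.1f}  Ho={ho:.3f}  He={he:.3f}  F={f:.3f}")
```

With no dropout, F should sit near zero; at a dropout rate of 0.3, F should move toward roughly 0.3, because each heterozygote survives with probability 0.7 while the allele counts are unchanged in expectation. This excess homozygosity is the kind of departure from Hardy-Weinberg proportions that HWE-based clustering models are sensitive to.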
Keywords
Bayesian inference, coalescent simulation, Hardy-Weinberg equilibrium, multilocus genotype, neighbor-joining, principal coordinate analysis