Statistical Modelling of Population-Level Exonic Variant Frequency Data with an Emphasis on Rare Variants

Yining Shi, Shelley B. Bull

University of Toronto Journal of Public Health (2021)

Abstract
Introduction & Objective: Rare variants, those with allele frequency below 1%, are postulated to be associated with disease susceptibility. Because allele frequencies vary globally, using population control data that do not match the study population can produce bias. Our primary research question is which factors explain variation in allele frequency across populations. Our secondary question is to evaluate the potential bias in using population data as control data when studying variants. We use data from gnomAD (Genome Aggregation Database) to answer these questions.

Methods: We apply each of three model formulations (linear, logistic, and Poisson) to explain how the frequency or count of variants depends on population subgroup/ancestry, functional annotation, sex, and disease status. We also evaluate interactions between population subgroup and functional annotation.

Results: For very rare variants (allele frequency < 0.1%), likelihood ratio testing (LRT) provides evidence that allele frequencies vary with functional annotation and population in all three model formulations. By LRT, interactions between population and functional annotation are significant in the logistic and Poisson models. The goodness-of-fit statistics show a better fit of the linear model for very rare variants than for low-frequency variants.

Conclusion: We observe that population and functional annotation affect variant frequencies, and conclude that detection of differences across populations and annotations depends on the model scale, especially for different degrees of rareness. Statisticians therefore need to consider carefully the potential for bias when using gnomAD as control data. Moreover, gnomAD is a valuable resource for studies dealing with rare variants.
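As an illustration of the likelihood ratio testing described in the abstract, the following is a minimal sketch of the simplest Poisson case: testing whether the allele rate differs across populations. It is not the authors' model. It assumes a single categorical factor (population), uses the allele number as the exposure, and relies on closed-form maximum likelihood estimates instead of a fitted regression. The function names and the counts are hypothetical and are not gnomAD data.

```python
import math


def poisson_loglik(counts, means):
    """Poisson log-likelihood of observed counts given expected counts."""
    total = 0.0
    for c, mu in zip(counts, means):
        if c > 0:
            total += c * math.log(mu)
        total += -mu - math.lgamma(c + 1)
    return total


def lrt_population_effect(allele_counts, allele_numbers):
    """LRT of 'one common allele rate' vs 'one rate per population'.

    Returns (statistic, degrees_of_freedom).
    """
    total_count = sum(allele_counts)
    total_number = sum(allele_numbers)
    # Null model: a single rate shared by all populations.
    common_rate = total_count / total_number
    null_means = [n * common_rate for n in allele_numbers]
    # Alternative model: each population has its own rate, so the
    # fitted mean equals the observed count.
    alt_means = [float(c) for c in allele_counts]
    stat = 2.0 * (poisson_loglik(allele_counts, alt_means)
                  - poisson_loglik(allele_counts, null_means))
    return stat, len(allele_counts) - 1


# Made-up numbers for three hypothetical populations (not gnomAD data).
counts = [12, 3, 30]
numbers = [20000, 15000, 25000]
statistic, df = lrt_population_effect(counts, numbers)
print(f"LRT statistic = {statistic:.2f} on {df} df")
```

Under the null hypothesis, the statistic is compared against a chi-square distribution with the returned degrees of freedom. A model with further covariates, such as functional annotation or its interaction with population, would require an iteratively fitted generalized linear model in place of these closed-form estimates.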