A machine learning model for disease risk prediction by integrating genetic and non-genetic factors

bioRxiv (2022)

Abstract
Polygenic risk scores (PRS) have been widely used to identify high-risk individuals in the general population, which helps with disease prevention and early treatment. Many methods have been developed to calculate PRS as a weighted aggregate of the phenotype-associated risk alleles from genome-wide association studies. However, considering only genetic effects may not be sufficient for risk prediction, because disease risk is related not only to genetic factors but also to non-genetic factors such as diet and physical exercise. Integrating these genetic and non-genetic factors into a unified machine learning framework for disease risk prediction remains a challenge. In this paper, we propose PRSIMD (PRS Integrating Multi-source Data), a machine learning model that applies posterior regularization to integrate genetic and non-genetic factors and improve disease risk prediction. We also applied Mendelian randomization analysis to identify the causal non-genetic risk factors for the selected diseases. We applied PRSIMD to predict type 2 diabetes and coronary artery disease in UK Biobank and observed that PRSIMD performed significantly better than existing methods for calculating PRS, including p-value threshold (P+T), PRSice2, SBLUP, DMSLMM, and LDpred2. In addition, we observed that PRSIMD achieved better predictive power than the composite risk score.

Competing Interest Statement: The authors have declared no competing interest.
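The weighted aggregation of risk alleles mentioned in the abstract can be illustrated with a minimal sketch. This shows only the generic weighted-sum form of a PRS, not PRSIMD or any of the compared methods. The function name `polygenic_risk_score`, the effect sizes, and the genotypes are all hypothetical values made up for illustration.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages: risk-allele counts per variant (0, 1, or 2).
    weights: per-variant effect sizes, e.g. GWAS log odds ratios.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative only).
weights = [0.5, -0.25, 1.0]

# Hypothetical risk-allele counts for two individuals (illustrative only).
individuals = {"A": [2, 0, 1], "B": [0, 2, 0]}

scores = {
    name: polygenic_risk_score(genotype, weights)
    for name, genotype in individuals.items()
}
print(scores)
```

In practice the methods listed in the abstract differ mainly in how the per-variant weights are selected or shrunk before this sum is taken.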
Keywords
multi-source data, polygenic risk score, posterior regularization