Enhancing polygenic prediction with an agnostic multi-PGS method that leverages hundreds of polygenic scores

European Neuropsychopharmacology (2023)

Abstract
The prediction accuracy of a polygenic score (PGS) is largely determined by the size of the training sample. Although training samples are still limited for psychiatric disorders, these disorders are genetically correlated with multiple behavioral and physical phenotypes. These mostly quantitative phenotypes are much more accessible, and genome-wide association studies (GWAS) with millions of samples are now available for them. Stand-alone PGS can now be generated from publicly accessible GWAS summary statistics with PGS methods that do not require a validation sample, such as LDpred2-auto. Some available methods, including MTAG and wMT-SBLUP, use genetically correlated phenotypes to increase prediction accuracy and have been applied to psychiatric disorders. These methods require pre-selecting the included phenotypes based on prior information about their genetic correlation with the desired outcome. Here we show the results of a new method, multi-PGS, that does not require pre-specifying genetically correlated phenotypes but relies on an agnostic PGS library based on “all” publicly available GWAS summary statistics. We explore diverse applications of multi-PGS for psychiatric disorders using the iPSYCH data. In practice, a library of 937 PGS was generated from publicly available GWAS summary statistics resources (GWAS Catalog, GWAS ATLAS, PGC) using LDpred2-auto. The PGS library, together with the covariates sex, birth year and 20 principal components (PCs), was then used as predictors in multivariate models. We used both penalized regression (lasso) and gradient boosted trees (XGBoost), and assessed the out-of-sample prediction accuracy of the resulting risk prediction models. First, we applied our multi-PGS strategy to predict ADHD, affective disorder, anorexia nervosa, autism, bipolar disorder and schizophrenia in iPSYCH.
All multi-PGS models increased both R² and logOR, with R² increasing 4-fold on average and up to 9-fold for ADHD and autism. Prediction accuracy was also higher than with wMT-SBLUP. Interestingly, multiple PGS for the same phenotype were selected in the final model. For example, three different depression-related PGS (self-reported, medically diagnosed and broad depression) were included in the affective disorder multi-PGS. This indicates that non-overlapping signals from multiple GWAS of similar phenotypes can be combined to increase prediction accuracy. Next, we explored the capacity of multi-PGS to predict outcomes for which no external GWAS summary statistics are available, as is the case for some sub-diagnoses and understudied psychiatric disorders. This question is motivated by scenarios in which the studied outcome could benefit from PGS analyses but no GWAS for that outcome exists yet. Surprisingly, our results showed no decrease in prediction accuracy when the library did not include a PGS for the target disorder. Moreover, we applied multi-PGS to case-case prediction of highly comorbid disorders. For instance, a multi-PGS of ADHD vs. ASD explained 12% of the variance among the disjoint cases. Psychiatric disorders are very heterogeneous phenotypes, both genetically and etiologically. We exploit this feature to increase genetic prediction accuracy using multi-PGS constructed in an agnostic manner. Finally, we discuss the tension between prediction and explanation in the context of multi-PGS models.
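The core idea of combining a PGS library in a penalized regression can be illustrated with a minimal sketch. It is not the authors' implementation: the data are simulated, the outcome is a continuous trait rather than a case-control diagnosis, the covariates (sex, birth year, PCs) are omitted, the lasso is a hand-written coordinate descent, and the library size, effect sizes and penalty `lam` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: 3000 individuals and a library of 50 PGS.
n, p = 3000, 50
pgs = rng.standard_normal((n, p))

# Assumption: column 0 is the PGS of the target trait; columns 1 and 2 are
# PGS of correlated traits that carry additional, non-overlapping signal.
true_effects = np.zeros(p)
true_effects[:3] = 0.3
y = pgs @ true_effects + rng.standard_normal(n)

# Split into training and held-out test individuals.
train, test = np.arange(2000), np.arange(2000, n)
mu, sd = pgs[train].mean(axis=0), pgs[train].std(axis=0)
X_train = (pgs[train] - mu) / sd   # standardize with training statistics only
X_test = (pgs[test] - mu) / sd
y_train = y[train] - y[train].mean()


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n_obs, n_feat = X.shape
    beta = np.zeros(n_feat)
    resid = y.copy()
    col_sq = (X ** 2).sum(axis=0) / n_obs
    for _ in range(n_iter):
        for j in range(n_feat):
            # Correlation of feature j with the partial residual.
            rho = X[:, j] @ resid / n_obs + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def r2(pred, obs):
    """Squared Pearson correlation between prediction and outcome."""
    return np.corrcoef(pred, obs)[0, 1] ** 2


beta = lasso_cd(X_train, y_train, lam=0.05)

# Out-of-sample accuracy: single target-trait PGS vs. the lasso multi-PGS.
r2_single = r2(X_test[:, 0], y[test])
r2_multi = r2(X_test @ beta, y[test])

print("PGS kept by the lasso:", np.count_nonzero(beta), "of", p)
print("out-of-sample R2, single PGS:", round(r2_single, 3))
print("out-of-sample R2, multi-PGS: ", round(r2_multi, 3))
```

In this toy setting the lasso keeps the informative scores and drops most of the uninformative ones without any pre-selection, which mirrors the agnostic use of the PGS library described above. For a binary diagnosis, an L1-penalized logistic regression would be the closer analogue.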
Keywords
polygenic prediction, multi-PGS