Genomic prediction in plants: opportunities for machine learning-based approaches

Muhammad Farooq, Aalt D.J. van Dijk, Harm Nijveen, Shahid Mansoor, Dick de Ridder

Research Square (2022)

Abstract

Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods is still lacking. The predictive performance of GP models may depend on many factors, including sample size, number of markers, population structure and genetic architecture. Here, we investigate which problem and dataset characteristics are associated with good performance of ML methods for GP. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits at different levels of genetic complexity, determined by the number of quantitative trait loci (QTLs), heritability (h2 and h2e), population structure and linkage disequilibrium (LD) between causal nucleotides and other SNPs. Results show that ML methods are a better choice for nonlinear phenotypes and remain comparable to Bayesian methods for linear phenotypes when the quantitative trait nucleotides (QTNs) have large effects. Furthermore, we find that ML methods are susceptible to confounding due to population structure and are less sensitive to low LD than linear parametric methods. Overall, these results provide insights into the role of ML in GP as well as guidelines for practitioners.
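The kind of simulated trait described above (genetic complexity set by the number of QTLs and the heritability, and LD between causal nucleotides and markers) can be illustrated with a minimal sketch. This is not the authors' simulation pipeline. The genotype model (independent biallelic loci under Hardy-Weinberg equilibrium with one shared minor-allele frequency, hence no population structure and no LD), the pairwise multiplicative form used for the "nonlinear" trait, and all function names (`simulate_genotypes`, `simulate_phenotype`, `ld_r2`) are assumptions made for illustration. Noise is scaled so that the ratio Var(G)/Var(P) approximately equals the requested `h2`.

```python
import random
import statistics


def simulate_genotypes(n_ind, n_snps, maf=0.3, seed=0):
    """Biallelic SNP genotypes coded 0/1/2 (count of the minor allele).

    Assumption: Hardy-Weinberg equilibrium, one shared minor-allele frequency
    and independent loci, i.e. no population structure and no LD between SNPs.
    """
    rng = random.Random(seed)
    return [[int(rng.random() < maf) + int(rng.random() < maf)
             for _ in range(n_snps)]
            for _ in range(n_ind)]


def simulate_phenotype(genotypes, n_qtl, h2, mode="additive", seed=1):
    """Simulate a trait from n_qtl randomly chosen causal SNPs (0 < h2 <= 1).

    mode="additive": genetic value is a weighted sum of QTL allele counts
    (a linear phenotype). mode="epistatic": genetic value is a weighted sum of
    products of consecutive QTL pairs (a simple, hypothetical nonlinear trait;
    needs n_qtl >= 2, and an odd leftover QTL is ignored).
    Gaussian noise is scaled so that Var(G) / Var(P) is approximately h2.
    """
    rng = random.Random(seed)
    qtl = rng.sample(range(len(genotypes[0])), n_qtl)
    if mode == "additive":
        terms = [(j,) for j in qtl]
    elif mode == "epistatic":
        terms = [(qtl[i], qtl[i + 1]) for i in range(0, n_qtl - 1, 2)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    effects = [rng.gauss(0.0, 1.0) for _ in terms]

    def genetic_value(row):
        total = 0.0
        for beta, loci in zip(effects, terms):
            product = 1
            for j in loci:
                product *= row[j]
            total += beta * product
        return total

    g = [genetic_value(row) for row in genotypes]
    # Residual variance chosen so that Var(G) / (Var(G) + Var(E)) = h2.
    var_e = statistics.pvariance(g) * (1.0 - h2) / h2
    y = [gi + rng.gauss(0.0, var_e ** 0.5) for gi in g]
    return y, g, qtl


def ld_r2(x, y):
    """Squared Pearson correlation between two genotype vectors, a common
    genotype-based measure of LD between a causal SNP and a marker."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


X = simulate_genotypes(n_ind=1000, n_snps=200)
y, g, qtl = simulate_phenotype(X, n_qtl=10, h2=0.5)
# Realized heritability; should be close to the target h2 of 0.5.
print(round(statistics.pvariance(g) / statistics.pvariance(y), 2))
```

Because the loci here are simulated independently, `ld_r2` between a QTL column and any other marker column stays near zero; studying low versus high LD, or population structure, would require a genotype model with correlated loci or subpopulations, which this sketch deliberately omits.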
Keywords: genomic prediction, plants, learning-based