DeepCGP: A Deep Learning Method to Compress Genome-Wide Polymorphisms for Predicting Phenotype of Rice

IEEE/ACM Transactions on Computational Biology and Bioinformatics(2023)

Abstract
Genomic selection (GS) is expected to accelerate plant and animal breeding. Over the last decade, the volume of genome-wide polymorphism data has grown, raising concerns about storage costs and computational time. Several individual studies have attempted either to compress genomic data or to predict phenotypes. However, existing compression models do not retain adequate data quality after compression, and existing prediction models are time-consuming and rely on the original, uncompressed data to predict phenotypes. A combined application of compression and genomic prediction using deep learning could therefore resolve these limitations. We propose a Deep Learning Compression-based Genomic Prediction (DeepCGP) model that compresses genome-wide polymorphism data and predicts phenotypes of a target trait from the compressed information. DeepCGP consists of two parts: (i) an autoencoder based on deep neural networks that compresses genome-wide polymorphism data, and (ii) regression models based on random forests (RF), genomic best linear unbiased prediction (GBLUP), and Bayesian variable selection (BayesB) that predict phenotypes from the compressed information. Two rice datasets with genome-wide marker genotypes and target-trait phenotypes were used. After 98% compression, DeepCGP retained up to 99% of the maximum prediction accuracy for a trait. Among the three regression methods, BayesB required the most computational time and achieved the highest accuracy; however, BayesB could be used only with the compressed data. Overall, DeepCGP outperformed state-of-the-art methods in terms of both compression and prediction. Our code and data are available at https://github.com/tanzilamohita/DeepCGP.
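The two-stage design described in the abstract (compress the marker matrix, then regress the phenotype on the compressed code) can be sketched in NumPy as follows. This is a hypothetical illustration, not the DeepCGP code from the linked repository. In place of the paper's deep nonlinear autoencoder it uses the closed-form optimum of a single-layer linear autoencoder with squared-error loss (truncated SVD, which spans the same subspace as PCA), and in place of RF, GBLUP, or BayesB it uses plain ridge regression. The data are synthetic, and the helper names (`fit_linear_autoencoder`, `encode`, `decode`, `fit_ridge`, `predict_ridge`), the code size, the ridge penalty, and the use of Pearson correlation as the accuracy measure are all assumptions.

```python
import numpy as np


def fit_linear_autoencoder(X_train, code_dim):
    """Closed-form optimum of a linear autoencoder (assumed stand-in for a deep AE).

    For centered data, the best rank-`code_dim` linear reconstruction projects
    onto the top right singular vectors (Eckart-Young), i.e. the PCA subspace.
    Returns the column means and the (n_markers, code_dim) encoder matrix.
    """
    mean = X_train.mean(axis=0)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:code_dim].T  # encoder; W.T serves as the tied decoder
    return mean, W


def encode(X, mean, W):
    """Compress markers to a low-dimensional code."""
    return (X - mean) @ W


def decode(H, mean, W):
    """Reconstruct markers from the code (orthogonal projection back)."""
    return H @ W.T + mean


def fit_ridge(H, y, lam=1.0):
    """Ridge regression on compressed features; intercept handled by centering."""
    h_mean, y_mean = H.mean(axis=0), y.mean()
    Hc = H - h_mean
    k = H.shape[1]
    beta = np.linalg.solve(Hc.T @ Hc + lam * np.eye(k), Hc.T @ (y - y_mean))
    return h_mean, y_mean, beta


def predict_ridge(H, model):
    h_mean, y_mean, beta = model
    return (H - h_mean) @ beta + y_mean


# Synthetic data (hypothetical, not the paper's rice datasets):
# 400 markers driven by 8 latent factors; phenotype is linear in those factors.
rng = np.random.default_rng(0)
n, p, k = 300, 400, 8
Z = rng.normal(size=(n, k))
A = rng.normal(size=(k, p))
X = Z @ A + 0.1 * rng.normal(size=(n, p))
y = Z @ rng.normal(size=k) + 0.5 * rng.normal(size=n)

train, test = slice(0, 200), slice(200, n)
mean, W = fit_linear_autoencoder(X[train], code_dim=k)
H_train, H_test = encode(X[train], mean, W), encode(X[test], mean, W)

compression = 1 - k / p  # fraction of features removed: 1 - 8/400 = 0.98
recon = decode(H_test, mean, W)
rel_err = np.linalg.norm(X[test] - recon) / np.linalg.norm(X[test] - mean)

model = fit_ridge(H_train, y[train], lam=1.0)
y_hat = predict_ridge(H_test, model)
accuracy = np.corrcoef(y[test], y_hat)[0, 1]  # Pearson r on held-out lines

print(f"compression={compression:.0%} rel_err={rel_err:.3f} r={accuracy:.2f}")
```

Keeping 8 of 400 features matches the abstract's "98% compression" only arithmetically. Replacing `fit_linear_autoencoder` and `encode` with a trained nonlinear encoder leaves the downstream regression step unchanged, which is the point of separating compression from prediction.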
Keywords
Deep learning, autoencoder, genomic selection, data compression, genomic prediction