GSCtool: A Novel Descriptor that Characterizes the Genome for Applying Machine Learning in Genomics

Advanced Intelligent Systems (2023)

Abstract
Machine learning (ML) is one of the core driving forces for the next breeding stage, Breeding 4.0. A genotype matrix based on single-nucleotide polymorphisms (SNPs) is often used in ML for genome-to-phenotype prediction. The genotype matrix has an inherent defect: the feature spaces it generates across different individuals or groups are inconsistent, which hinders the application of ML. To overcome this challenge, a genome descriptor, the Genic SNPs Composition Tool (GSCtool), is developed. It counts the number of SNPs in each gene of the genome, so the dimension of the feature vectors equals the number of annotated genes in a species. Compared with the genotype matrix, GSCtool significantly decreases model training time and achieves higher accuracy in phenotype prediction. GSCtool also performs well in variety identification, which is useful in crop variety protection. In general, GSCtool will help facilitate the application and study of genomic ML. The source code and test data of GSCtool are freely available at https://github.com/SZJhacker/GSCtool and https://gitee.com/shenzijie/GSCtool.

The Genic single-nucleotide polymorphisms Composition Tool (GSCtool) is a novel descriptor that characterizes the genome. For genome-to-phenotype prediction, GSCtool outperforms genotype matrices, reducing model training time and enhancing accuracy. It also fills a gap in computational methods for variety classification. GSCtool facilitates the application and study of genomic machine learning.

© 2023 WILEY-VCH GmbH
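The per-gene counting idea described in the abstract can be illustrated with a minimal sketch. This is not the GSCtool implementation; it only assumes that each individual is represented by a list of SNP positions and that gene annotations are given as 1-based, inclusive intervals. The function name `genic_snp_counts`, the tuple layouts, and the toy gene and sample data are all hypothetical and chosen for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def genic_snp_counts(genes, snps):
    """Count the SNPs that fall inside each annotated gene.

    genes: list of (gene_id, chrom, start, end), 1-based inclusive (assumed layout)
    snps:  iterable of (chrom, pos) for one individual (assumed layout)
    Returns one count per gene, in the order the genes are listed.
    """
    # Group SNP positions by chromosome and sort them for binary search.
    positions = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()

    counts = []
    for _gene_id, chrom, start, end in genes:
        chrom_positions = positions.get(chrom, [])
        # Number of sorted positions p with start <= p <= end.
        n = bisect_right(chrom_positions, end) - bisect_left(chrom_positions, start)
        counts.append(n)
    return counts


# Hypothetical toy annotation with three genes.
genes = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 300, 400),
    ("geneC", "chr2", 50, 150),
]

# Two hypothetical individuals with different SNP sets.
sample_1 = [("chr1", 120), ("chr1", 180), ("chr1", 250), ("chr2", 60)]
sample_2 = [("chr1", 350), ("chr2", 500)]

vec_1 = genic_snp_counts(genes, sample_1)
vec_2 = genic_snp_counts(genes, sample_2)
print(vec_1, vec_2)
```

Both individuals yield a vector whose length equals the number of genes, even though they carry different numbers of SNPs at different sites. Intergenic SNPs are ignored in this sketch, and a SNP lying in two overlapping genes would be counted once for each; whether GSCtool treats these cases the same way is not stated in the abstract.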
Keywords
genome,novel descriptor,machine learning