A k-mer Based Approach for Virus Classification Identifies Coronavirus Infections and Viral Associations in Human and Plant Microbiomes

Research Square (2020)

Abstract
Background: Viruses are underrepresented in the study and identification of microbiome constituents, yet they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses poses a challenge for classification, not only in constructing a viral taxonomy but also in identifying similarities between a virus's genotype and its phenotype. However, this diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples.

Methods: To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree from 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken (a parallelized version of Kraken), a metagenomic/transcriptomic classifier. The resulting database of the metagenomic viruses is compatible with Kraken2 and is available at https://www.osti.gov/biblio/1615774.

Results: To illustrate the breadth of utility of our approach for classifying viruses with ParaKraken, especially in samples without virus-induced pathophysiology, we analyzed data from a plant metagenome study of the differences between two Populus genotypes in three different compartments and from a human metatranscriptome study of the differences between Autism Spectrum Disorder patients and controls in postmortem brain tissue. In the Populus study, we identified genotype-specific and compartment-specific viral signatures, while in the Autism study we identified a significant increase in abundance of eight viral sequences in postmortem brains. We also show the potential accuracy for classifying viruses by using both the JGI and NCBI viral databases to identify the uniqueness of viral sequences. Finally, we use the NCBI databases to identify pathogenic viruses in samples with known COVID-19 and cassava brown streak virus infections, validating the potential usefulness of classifying viruses.

Conclusion: Viruses represent an essential component of the microbiome. The ability to classify viruses is the necessary first step toward better understanding their role in the microbiome. Our viral classification method allows for a more complete identification of viral sequences than previous methods. It will improve identification of associations between viruses and their hosts, as well as between viruses and other microbiome members, and can be used with any tool that uses a taxonomy for classification (such as Kraken).
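
The k-mer profile comparison mentioned in the Methods can be illustrated with a small sketch. The abstract does not give the exact score definition, the value of k, or the clustering procedure, so everything below is an assumption: the score is taken to be the common proportional similarity index (the sum over k-mers of the smaller of the two relative frequencies), and the function names `kmer_profile`, `proportional_similarity`, and `pairwise_similarity` are hypothetical, not part of ParaKraken or Kraken2.

```python
from collections import Counter
from itertools import combinations

VALID_BASES = set("ACGT")


def kmer_profile(seq, k=4):
    """Return relative k-mer frequencies for a nucleotide sequence.

    k=4 is an arbitrary illustrative default, not the value used in the paper.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES  # skip k-mers with ambiguous bases
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p, q):
    """Assumed score: sum of per-k-mer minimum frequencies.

    Ranges from 0 (no shared k-mers) to 1 (identical profiles).
    """
    return sum(min(freq, q[kmer]) for kmer, freq in p.items() if kmer in q)


def pairwise_similarity(seqs, k=4):
    """Score every pair of named sequences; the input to a clustering step."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    return {
        (a, b): proportional_similarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    toy = {
        "virus_a": "ATGCGTACGTTAGCATGCGT",
        "virus_b": "ATGCGTACGATAGCATGCGA",
        "virus_c": "TTTTTTGGGGGGCCCCCCAA",
    }
    for pair, score in pairwise_similarity(toy, k=3).items():
        print(pair, round(score, 3))
```

A matrix of such pairwise scores is the kind of input a hierarchical clustering step could turn into a tree. How the authors build their tree with dynamic programming is not described in the abstract and is not reproduced here.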