AGNOSTOS-DB: a resource to unlock the uncharted regions of the coding sequence space

biorxiv(2021)

引用 3|浏览12
暂无评分
摘要
Genomes and metagenomes contain a considerable percentage of genes of unknown function, which are often excluded from downstream analyses limiting our understanding of the studied biological systems. To address this challenge, we developed AGNOSTOS, a combined database-computational workflow resource that unifies the known and unknown coding sequence space of genomes and metagenomes. Here, we present AGNOSTOS-DB, an extensive database of high-quality gene clusters enriched with functional, ecological and phylogenetic information. Moreover, AGNOSTOS allows integrating new data into existing AGNOSTOS-DBs, maximizing the information retrievable for the genes of unknown function. As a proof of concept, we provide a seed database that integrates the predicted genes from marine and human metagenomes, as well as from Bacteria, Archaea, Eukarya and giant viruses environmental and cultivar genomes. The seed database comprises 6,572,081 gene clusters connecting 342 million genes and represents a comprehensive and scalable resource for the inclusion and exploration of the unknown fraction of genomes and metagenomes. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
coding,sequence,space,resource
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要