INfrastructure for a PHAge REference Database: Identification of large-scale biases in the current collection of phage genomes

bioRxiv (2021)

Abstract
Background: With advances in sequencing technology and decreasing costs, the number of sequenced bacteriophage genomes has increased markedly in the last decade. Materials and Methods: We developed an automated retrieval and analysis system for bacteriophage genomes, INPHARED (https://github.com/RyanCook94/inphared), that provides data in a consistent format. Results: As of January 2021, 14,244 complete phage genomes had been sequenced. The data set is dominated by phages that infect a small number of bacterial genera, with 75% of phages isolated on only 30 bacterial genera. There is a further bias: the database contains significantly more lytic than temperate phage genomes, and ~54% of temperate phage genomes originate from just three host genera. Within phage genomes, putative antibiotic resistance genes were found at higher frequencies in temperate phages than in lytic phages. Conclusion: We provide a mechanism to reproducibly extract complete phage genomes and highlight some of the biases within these data, which underpin our current understanding of phage genomes.
Keywords
phage genomes,phage reference database,large-scale