Expansion of novel biosynthetic gene clusters from diverse environments using SanntiS

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Natural products (bio)synthesised by microbes are an important component of the pharmacopeia with a vast array of biomedical applications, in addition to their key role in many ecological interactions. One approach for the discovery of these metabolites is the identification of biosynthetic gene clusters (BGCs), genomic units which encode the molecular machinery required for producing the natural product. Genome mining has revolutionised the discovery of BGCs, yet metagenomic assemblies represent a largely untapped source of natural products. The imbalanced distribution of BGC classes in existing databases restricts the generalisation of detection patterns and limits the ability of mining methods to recognise a broader spectrum of BGCs. This problem is further intensified in metagenomic datasets, where BGC genes may be split across multiple contigs. This work presents SanntiS, a new machine learning-based approach for identifying BGCs. SanntiS achieved high precision and recall in both genomic and metagenomic datasets, effectively capturing a broad range of BGCs. Application of SanntiS to metagenomic assemblies found in MGnify led to a resource containing 1.1 million BGC predictions with associated contextual data from diverse biomes. Additionally, experimental validation of a previously undescribed BGC, detected solely by SanntiS, further demonstrates the potential of this approach in uncovering novel bioactive compounds. The study illustrates the significance of metagenomic datasets in comprehensively understanding the diversity and distribution of BGCs in microbial communities.

Competing Interest Statement: The authors have declared no competing interest.
Keywords
novel biosynthetic gene clusters