BATTER: Accurate Prediction of Rho-dependent and Rho-independent Transcription Terminators in Metagenomes

bioRxiv (2023)

Abstract
Transcription terminators mark the 3' ends of both coding and noncoding transcripts in bacteria and play crucial roles in gene regulation, such as controlling the stoichiometry of gene expression and conditionally switching off gene expression by inducing premature termination. Recently developed experimental 3' end mapping techniques have greatly improved the current understanding of bacterial transcription termination, but these methods cannot detect transcripts that are unexpressed under the limited experimental conditions and cannot utilize the vast amount of information embedded in the rapidly growing metagenome data. Computational approaches can relieve these problems, but the development of such in-silico methods lags behind the experimental techniques. Previous computational tools are limited to predicting rho-independent terminators (RITs) and are primarily optimized for a few model species. Predicting rho-dependent terminators (RDTs), which lack obvious consensus sequence patterns, and terminators in diverse non-model bacterial species still presents significant challenges. To address these challenges, we introduce BATTER (BActeria Transcript Three prime End Recognizer), a computational tool for predicting both RITs and RDTs in diverse bacterial species that allows metagenome-scale scanning. We developed a data augmentation pipeline by leveraging available high-throughput 3' end mapping data in 17 bacterial species and a large collection of 42,905 species-level representative bacterial genomes. Taking advantage of context-sensitive natural language processing techniques, we trained a BERT-CRF model that uses both local features and context information to tag terminators in genomic sequences. Systematic evaluations demonstrated our model's superiority: at a false positive rate of 0.1/kilobase, BATTER achieves a sensitivity of 0.924 for predicting E. coli RDTs and a sensitivity of 0.756 for predicting terminators on a term-seq dataset of the oral microbiome, outperforming the best existing tool by 0.153. Based on BATTER's predictions, we systematically analyzed the clade-specific properties of bacterial terminators. The practical utility of BATTER was exemplified through two case studies: identifying functional transcripts from metatranscriptome data and discovering candidate noncoding RNAs related to antimicrobial resistance. As far as we know, BATTER is the first tool to predict both RITs and RDTs simultaneously in diverse bacterial species. BATTER is available at https://github.com/lulab/BATTER.

Competing Interest Statement: The authors have declared no competing interest.
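The operating point quoted in the abstract (sensitivity at a fixed false positive rate per kilobase) can be illustrated with a minimal sketch. The abstract does not say how predictions are matched to reference terminators or how the scanned length is counted, so everything below is an assumption made for illustration: the function name `sensitivity_at_fp_rate`, its parameters, and the rule that each prediction has already been labeled as a true or false positive. It is not BATTER's evaluation code.

```python
from typing import List, Tuple


def sensitivity_at_fp_rate(
    predictions: List[Tuple[float, bool]],
    n_true_terminators: int,
    scanned_kb: float,
    max_fp_per_kb: float = 0.1,
) -> float:
    """Highest sensitivity reachable without exceeding a false positive budget.

    predictions: (score, is_true_positive) for each predicted terminator.
    Assumption: every true terminator is matched by at most one prediction.
    """
    if n_true_terminators <= 0:
        raise ValueError("n_true_terminators must be positive")

    fp_budget = max_fp_per_kb * scanned_kb  # allowed number of false positives
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)

    tp = fp = best_tp = 0
    i = 0
    while i < len(ranked):
        # Accept all predictions sharing the same score together,
        # so the result corresponds to a real score threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp > fp_budget:
            break  # lowering the threshold further exceeds the budget
        best_tp = tp
        i = j
    return best_tp / n_true_terminators


# Toy data (hypothetical): 4 reference terminators in 10 kb of sequence.
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.5, False), (0.4, True)]
print(sensitivity_at_fp_rate(toy, n_true_terminators=4, scanned_kb=10.0))  # 0.75
```

In the toy data, 10 kb at 0.1 false positives per kilobase allows one false positive, so the threshold can be lowered until the second false positive appears, recovering three of the four reference terminators.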
Keywords
rho-dependent, rho-independent