G4-Attention: Deep Learning Model with Attention for predicting DNA G-Quadruplexes
arXiv (2024)
Abstract
G-quadruplexes (G4s) are four-stranded, non-canonical nucleic acid secondary
structures formed by the stacking of guanine tetramers (G-tetrads). Because of
their distinctive structural characteristics, they are involved in a wide range
of biological roles. After the completion of the human genome sequencing
project, many bioinformatic algorithms were introduced to predict active G4
regions in vitro based on canonical G4 sequence elements, G-richness, and
G-skewness, as well as non-canonical sequence features.
Recently, sequencing techniques such as G4-seq and G4-ChIP-seq were developed
to map G4s in vitro and in vivo, respectively, at a resolution of a few hundred
bases.
Subsequently, several machine learning approaches were developed to predict G4
regions using the existing databases. However, their prediction models were
simplistic, and their prediction accuracy was notably poor. Here, we propose a
novel convolutional neural network with Bi-LSTM and attention layers, named
G4-attention, to predict G4-forming sequences with improved accuracy.
G4-attention achieves high accuracy and state-of-the-art results in the G4
prediction task. Our model also predicts G4 regions accurately in highly
class-imbalanced datasets. In addition, the model trained on the human genome
dataset can be applied to DNA sequences from any non-human genome to predict
their G4 formation propensities.
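The abstract names canonical G4 sequence elements, G-richness, and G-skewness as the basis of earlier prediction algorithms. The sketch below illustrates these features under stated assumptions: the canonical motif is taken to be the common Quadparser-style pattern of four runs of at least three Gs separated by loops of 1-7 nt, G-richness is taken as the fraction of G, and G-skewness as (G - C)/(G + C). The helper names are hypothetical, and these definitions are not necessarily the ones used by G4-Attention or by the tools it compares against.

```python
import re

# Assumed canonical G4 motif (Quadparser-style): four runs of >= 3 G joined by
# loops of 1-7 nt. Other tools use different loop and run limits.
CANONICAL_G4 = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")


def find_canonical_g4(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping canonical motifs on this strand."""
    return [m.span() for m in CANONICAL_G4.finditer(seq.upper())]


def g_richness(seq: str) -> float:
    """Fraction of bases that are G (0.0 for an empty sequence)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def g_skewness(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the sequence contains neither G nor C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


if __name__ == "__main__":
    telomeric = "GGGTTAGGGTTAGGGTTAGGG"  # fragment of the human telomeric repeat (TTAGGG)n
    print(find_canonical_g4(telomeric))     # [(0, 21)]
    print(round(g_richness(telomeric), 3))  # 0.571
    print(g_skewness(telomeric))            # 1.0
```

Matches are non-overlapping and only the given strand is scanned; a G4 on the opposite strand appears here as C-runs, so a fuller scan would also search the reverse complement.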
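The architecture described in the abstract (convolution, Bi-LSTM, and attention layers) can be illustrated with a toy forward pass in plain Python, mainly to show how an attention layer pools per-position features into a single sequence representation. This is a sketch under explicit assumptions, not the G4-Attention implementation: the input is one-hot encoded, the convolution uses a single hand-set kernel, the attention score is simplified to tanh(w · h_t), and the Bi-LSTM layer is omitted (it would sit between the convolution and the attention). All kernel values, weights, and function names are hypothetical, and no training is shown.

```python
import math

BASES = "ACGT"  # assumed one-hot column order; any other character -> all zeros


def one_hot(seq: str) -> list[list[float]]:
    """Encode a DNA string as an L x 4 one-hot matrix (one row per position)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x: list[list[float]], kernels: list[list[list[float]]]) -> list[list[float]]:
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers) followed by ReLU.

    x is L x C and each kernel is k x C; the result is (L - k + 1) x len(kernels).
    """
    k = len(kernels[0])
    channels = len(x[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for ker in kernels:
            s = sum(ker[i][c] * x[t + i][c] for i in range(k) for c in range(channels))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(h: list[list[float]], w: list[float]) -> tuple[list[float], list[float]]:
    """Attention pooling with a simplified score e_t = tanh(w . h_t).

    Returns (context, alphas): alphas = softmax(e), context = sum_t alphas[t] * h[t].
    """
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, row))) for row in h]
    alphas = softmax(scores)
    context = [sum(a * row[d] for a, row in zip(alphas, h)) for d in range(len(h[0]))]
    return context, alphas


if __name__ == "__main__":
    x = one_hot("GGGTTAGGG")
    g_run_kernel = [[0.0, 0.0, 1.0, 0.0]] * 3  # hypothetical hand-set "GGG" detector
    feats = conv1d_relu(x, [g_run_kernel])
    print(feats)  # [[3.0], [2.0], [1.0], [0.0], [1.0], [2.0], [3.0]]
    # A Bi-LSTM would normally transform `feats` here; it is omitted in this toy.
    context, alphas = attention_pool(feats, w=[1.0])
    print([round(a, 3) for a in alphas])  # the two GGG windows get the largest weights
```

The attention weights make the pooled vector a weighted average over positions, so windows with strong G-run responses dominate the representation passed to a classifier; in a trained model the kernels and the scoring weights would be learned rather than set by hand.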
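The abstract reports that the model stays accurate on highly class-imbalanced datasets but, in this summary, does not name the metrics used. As a general illustration, not the paper's evaluation protocol, plain accuracy can look strong on imbalanced data even for a useless classifier, which is why precision, recall, and F1 are commonly reported alongside it. The function name and the 10:90 split below are made up for illustration and are not data from the paper.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = G4, 0 = non-G4).

    y_true must be non-empty; ratios with a zero denominator are reported as 0.0.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up imbalanced set: 10 G4-positive and 90 negative windows.
    y_true = [1] * 10 + [0] * 90
    always_negative = [0] * 100  # a trivial classifier that never predicts G4
    print(binary_metrics(y_true, always_negative))
    # {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The trivial classifier scores 90% accuracy yet finds no G4s at all, which is the failure mode that imbalance-aware metrics are meant to expose.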