MR-DNA: Flexible 5mC-Methylation-Site Recognition in DNA Sequences using Token Classification

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
DNA 5-methylcytosine (5mC) modification has been widely studied in mammals and plays an important role in epigenetics. Several computational approaches have been developed to aid the identification of methylation sites. In this study, we introduce a novel deep-learning framework, MR-DNA, that aims to predict specific methylation sites located in gene promoter regions. The idea is to adapt the named-entity recognition approach to methylation-site prediction. MR-DNA is trained on a stacked model architecture that consists of a pre-trained MuLan-Methyl-DistilBERT language model and conditional random field algorithms. The resulting fine-tuned model achieves an accuracy of 95.4% on an independent test dataset. A key advantage of this formulation of the methylation-site identification task is that the input DNA sequence can be of any length, unlike previous methods that predict methylation state on short, fixed-length DNA sequences. For training and testing purposes, we provide a database of DNA sequences containing verified 5mC-methylation sites, obtained from eight human cell lines in ENCODE. Data and code are available at https://github.com/husonlab/MR-DNA.

CCS Concepts: Computing methodologies → Information extraction.
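To illustrate the token-classification framing described above, the following is a minimal sketch of how a DNA sequence of arbitrary length might be paired with per-token labels, in the same way named-entity recognition tags each word of a sentence. It rests on assumptions not stated in the abstract: tokens are taken to be single bases, the tag names `5mC` and `O` are illustrative, and the function `label_bases` is hypothetical. The tokenization and label scheme actually used by MR-DNA may differ.

```python
from typing import Iterable, List, Tuple


def label_bases(sequence: str, methylated_positions: Iterable[int]) -> List[Tuple[str, str]]:
    """Pair every base with a tag, as in named-entity recognition.

    Assumption (not from the paper): one token per base, tag '5mC' for a
    methylated cytosine and 'O' (outside) for everything else.
    Positions are 0-based indices into the sequence.
    """
    seq = sequence.upper()
    sites = set(methylated_positions)

    # A 5mC site must lie inside the sequence and fall on a cytosine.
    for pos in sorted(sites):
        if not 0 <= pos < len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        if seq[pos] != "C":
            raise ValueError(f"position {pos} is '{seq[pos]}', not a cytosine")

    # The output has one (base, tag) pair per input base, whatever the length.
    return [(base, "5mC" if i in sites else "O") for i, base in enumerate(seq)]


# Hypothetical example: cytosines at indices 1 and 5 are marked as methylated.
example = label_bases("ACGTCCGA", [1, 5])
for base, tag in example:
    print(base, tag)
```

Because a label is emitted for every token, nothing in this formulation fixes the input length, which is the property the abstract highlights over fixed-length window classifiers.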
Keywords
mr-dna sequences,mc-methylation-site