Explainable Transcription Factor Prediction with Protein Language Models.

Liyuan Gao, Kyler Shu, Jun Zhang, Victor S. Sheng

2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2023)

Abstract
Language models have exhibited remarkable performance across diverse tasks, including tasks in biological research such as protein language modeling. Transcription factors (TFs) are pivotal in gene regulation, influencing gene expression by binding to specific DNA sequences. While various TF prediction techniques exist, they often require extensive training datasets or suffer from limited accuracy. In this study, we propose ESM-TFpredict, a model that uses a pre-trained protein language model to encode amino acid sequences and then applies 1-D convolutional neural networks for TF prediction. To elucidate the model’s decision-making, we employ an integrated gradients method to highlight the important features driving TF identification. A comparative experimental analysis against two existing models, DeepTFactor and TFpredict, shows that ESM-TFpredict scores above 95% on all four evaluation metrics, surpassing both competitors. By using a sliding-window approach to compress the protein representation, ESM-TFpredict trains in 315.78 seconds, which is only 51% of the training time required by DeepTFactor and a mere 12% of the training time required by TFpredict. We further analyze the contributions of known TF-related regions (average attribution score 0.9152) versus non-TF-related regions (average attribution score 0.0848), demonstrating that the TF-related regions dominate TF prediction.
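The abstract names two techniques that are easy to illustrate on toy data: sliding-window compression of a per-residue representation, and integrated gradients for attribution. The sketch below is not the authors' implementation. It assumes mean pooling inside each window (the abstract does not say how a window is aggregated), swaps the 1-D CNN for a small logistic model so the gradient can be written by hand, uses an all-zero baseline, and measures a region's contribution as its share of absolute attribution mass. All function names, embedding values, weights, and region indices are hypothetical.

```python
import math

def sliding_window_mean(rows, window, stride):
    """Compress a per-residue matrix by averaging each window of rows (assumed pooling)."""
    pooled = []
    for start in range(0, max(len(rows) - window, 0) + 1, stride):
        chunk = rows[start:start + window]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled

def model(x, weights, bias):
    """Toy stand-in classifier: logistic regression, NOT the paper's 1-D CNN."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def model_gradient(x, weights, bias):
    """Analytic gradient of the logistic output with respect to the input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]

def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann approximation of integrated gradients along a straight path."""
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        grad = model_gradient(point, weights, bias)
        grad_sum = [s + g for s, g in zip(grad_sum, grad)]
    return [(v - b) * s / steps for v, b, s in zip(x, baseline, grad_sum)]

def attribution_share(attributions, region_indices):
    """Fraction of total absolute attribution inside a region (assumed normalization)."""
    total = sum(abs(a) for a in attributions)
    inside = sum(abs(attributions[i]) for i in region_indices)
    return inside / total

# Made-up per-residue embeddings: 6 residues x 2 dimensions.
embeddings = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0],
              [0.0, 0.0], [0.5, 0.5], [0.5, 1.5]]
compressed = sliding_window_mean(embeddings, window=2, stride=2)  # 3 rows x 2 dims
features = [v for row in compressed for v in row]

weights = [0.8, 0.6, 0.1, -0.2, 0.3, -0.1]   # hypothetical trained weights
bias = -1.0
baseline = [0.0] * len(features)              # assumed all-zero baseline

attributions = integrated_gradients(features, baseline, weights, bias)
region = [0, 1]                               # hypothetical "TF-related" features
share_in = attribution_share(attributions, region)
share_out = 1.0 - share_in
print(f"region share: {share_in:.4f}, rest: {share_out:.4f}")
```

Integrated gradients satisfies a completeness property: the attributions sum (up to the integration error) to the difference between the model output at the input and at the baseline, which gives a quick sanity check on the approximation. Under the share-based normalization assumed here, the two region scores sum to 1, as the pair reported in the abstract does (0.9152 + 0.0848), although the abstract does not state how the paper normalizes its scores.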
Keywords
transcription factor, protein language model, integrated gradients, prediction