Protein Embeddings Predict Binding Residues in Disordered Regions

bioRxiv (2024)

Abstract
Identifying the binding residues of proteins helps to understand their biological roles, as protein function is often defined through the binding of ligands such as other proteins, small molecules, ions, or nucleotides. Current methods for predicting binding residues often err for intrinsically disordered proteins or regions (IDPs/IDPRs). Here, we present a novel machine learning (ML) model trained to predict binding regions specifically in IDPRs. The proposed model, IDBindT5, leverages embeddings from the protein language model (pLM) ProtT5 to reach a balanced accuracy of 57.2 ± 3.6% (95% confidence interval). This is numerically slightly higher than the performance of the state-of-the-art (SOTA) methods ANCHOR2 (52.4 ± 2.7%) and DeepDISOBind (56.9 ± 5.6%), which rely on expert-crafted features and/or evolutionary information from multiple sequence alignments (MSAs). IDBindT5 generates its SOTA-level predictions much faster than other methods, easily enabling full-proteome analyses. Our findings emphasize the potential of pLMs as a promising approach for exploring and predicting features of disordered proteins. The model and a comprehensive manual are publicly available at https://github.com/jahnl/binding_in_disorder.

Competing Interest Statement: The authors have declared no competing interest.
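
For readers unfamiliar with the metric quoted above, balanced accuracy is the mean of sensitivity and specificity, which keeps the score informative when binding residues are much rarer than non-binding ones. The following is a minimal sketch, not the authors' evaluation code: the abstract does not say how its 95% confidence intervals were derived, so the percentile bootstrap over residues used here is an assumption, and the function names and toy labels are purely illustrative.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def bootstrap_ci(y_true, y_pred, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling residues with replacement.

    Assumption: residues are the resampling unit; resampling whole
    proteins would be an equally plausible choice.
    """
    balanced_accuracy(y_true, y_pred)  # fails early if a class is missing
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    while len(scores) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [y_true[i] for i in idx]
        if len(set(sample_true)) < 2:
            continue  # skip resamples that lost one of the two classes
        sample_pred = [y_pred[i] for i in idx]
        scores.append(balanced_accuracy(sample_true, sample_pred))
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot) - 1]


# Toy labels (1 = binding residue, 0 = non-binding), invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
score = balanced_accuracy(y_true, y_pred)  # 0.5 * (2/3 + 4/5)
low, high = bootstrap_ci(y_true, y_pred)
print(f"balanced accuracy = {score:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

With a random predictor the expected balanced accuracy is 50% regardless of class imbalance, which is the baseline against which the reported values should be read.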