Interrogating Proteome-wide Cysteine Ligandabilities: Crystallography Meets Chemoproteomics Through Machine Learning.

bioRxiv: the preprint server for biology (2023)

Abstract
Over the past decade, targeted covalent inhibition (TCI) has become mainstream in drug discovery, and an increasingly large number of cysteine-liganded X-ray structures have been deposited in the Protein Data Bank (PDB). At the same time, a chemoproteomic technique called activity-based protein profiling (ABPP) has spurred efforts to map covalently ligandable sites across the entire proteome. Here we asked whether the current PDB information is sufficient for developing highly predictive machine-learning (ML) models, and what such models can tell us about the divergence between the cysteine ligandabilities captured by crystallography and those determined by ABPP in cells. Tree-based and convolutional neural network (CNN) models were developed and trained on an exhaustively curated database (LigCys3D) containing over 1,000 liganded cysteines in nearly 800 proteins represented by over 10,000 X-ray structures. In unseen tests, the tree models and CNNs gave AUCs of about 94%; however, when evaluated on a nonoverlapping ABPP dataset, the models gave significantly lower AUCs, especially when AlphaFold2 models were used. Our analysis suggests factors that give rise to this divergence and ways to improve model transferability. Developing ML models as a surrogate for crystallography may further unleash the power of chemoproteomics. Our work represents a first step in the ML-led integration of big genome data, structure models, and chemoproteomic experiments to annotate the human proteome space for next-generation drug discovery.
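For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed for a binary ligandable/non-ligandable classifier. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only to illustrate the metric.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC: the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = liganded cysteine, 0 = not liganded.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical model scores (higher means "more likely ligandable").
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy case, 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the roughly 94% figure above is reported.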