Var3PPred: variant prediction based on 3-D structure and sequence analyses of protein-protein interactions on autoinflammatory diseases

PeerJ (2024)

Abstract
We developed a pathogenicity classifier, Var3PPred, for identifying pathogenic variants in genes associated with autoinflammatory disorders. The approach integrates protein-protein interaction analysis with 3D structural information. We first collected 702 disease-associated missense variants from 35 genes linked to systemic autoinflammatory diseases (SAIDs). This dataset, sourced from the Infevers database, served as both the training and test sets. Because it comprised 130 benign and 572 pathogenic variants, we balanced it with the SMOTE algorithm. The approach included 3D docking analysis of protein-protein interactions, using data from the STRING and IntAct databases. For robustness, we weighted ZDOCK and SPRINT values according to HGPEC gene rank scores. We also integrated sequence and structural features: changes in folding free energy (ΔΔG), accessible surface area, volume, per-residue local distance difference test (pLDDT) scores, and position-specific independent count (PSIC) scores. These features, calculated with PyRosetta on AlphaFold2 (AF2) computed structures, provided insight into amino acid conservation at variant positions and into the impact of variants on protein structure and stability. After extensive hyperparameter tuning of six machine learning algorithms, the random forest classifier proved the most effective, yielding an AUROC of 99% on the test set. Var3PPred outperformed three other classifiers (SIFT, PolyPhen, and CADD) on an unseen test set from a SAID-related gene, which demonstrates its capacity for pathogenicity classification of SAID variants. The source code for Var3PPred and the predictions for all 420 missense variants of uncertain significance from the Infevers database are available on GitHub: https://github.com/alperbulbul1/Var3PPred
Keywords
Autoinflammatory,Variant prediction,Machine learning,Protein protein interaction,3-D structure,Pathogenicity
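
The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the Var3PPred implementation. The function name `smote`, the neighbour count `k=5`, the random seeds, and the two-dimensional Gaussian toy features are all assumptions made for illustration; only the class sizes (130 benign, 572 pathogenic) come from the abstract. The sketch also picks base samples at random, which is a simplification of the original algorithm.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Simplified, hypothetical sketch of SMOTE (not the authors' code).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy two-feature data (assumed), using the class sizes reported in the abstract.
rng = random.Random(42)
benign = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(130)]
pathogenic = [(rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)) for _ in range(572)]

# Oversample the benign class until both classes have the same size.
new_benign = smote(benign, n_new=len(pathogenic) - len(benign))
balanced_benign = benign + new_benign
print(len(balanced_benign), len(pathogenic))  # prints: 572 572
```

Because every synthetic point is a convex combination of two real benign points, the oversampled class stays inside the region already covered by the benign data. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.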