A hybrid positive unlabeled learning framework for uncovering scaffolds across human proteome by measuring the propensity to drive phase separation.

Briefings in Bioinformatics (2023)

Abstract
Scaffold proteins drive liquid-liquid phase separation (LLPS) to form biomolecular condensates and organize various biochemical reactions in cells. Dysregulation of scaffolds can lead to aberrant condensate assembly and various complex diseases. However, bioinformatics predictors dedicated to scaffolds are still lacking, and their development suffers from an extreme imbalance between the limited number of experimentally identified scaffolds and the many unlabeled candidates. Here, using the joint distribution of hybrid multimodal features, we implemented a positive unlabeled (PU) learning-based framework named PULPS that combines ProbTagging and penalty logistic regression (PLR) to profile the propensity of scaffolds. PULPS achieved the best AUC of 0.8353 and showed an area under the lift curve (AUL) of 0.8339 as an estimate of true performance. After reviewing recent experimentally verified scaffolds, we performed a partial recovery that raised the AUL by 2.85%, from 0.8339 to 0.8577. PULPS showed a 45.7% improvement in AUL over PLR and an 8.2% improvement over other existing tools. Our study is the first to show that PU learning is more suitable for scaffold prediction, and it demonstrates the widespread existence of phase separation states. The resulting profile also uncovered potential scaffolds that co-drive LLPS in the human proteome and generated candidates for further experiments. PULPS is free for academic research at http://pulps.zbiolab.cn.
Keywords
area under the lift curve, hybrid multimodal features, liquid–liquid phase separation, positive unlabeled learning, scaffold protein
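
The abstract reports AUL as its estimate of true performance when only positives are labeled. As a rough illustration, the sketch below computes AUL in one common formulation from the PU learning literature: the probability that a randomly chosen labeled positive scores higher than a randomly chosen sample from the whole data set, with ties counted as one half. This is an assumption made for illustration and may differ from how PULPS computes the metric. The function names `area_under_lift` and `relative_change` and the toy scores are hypothetical. The second helper simply reproduces the relative AUL increase quoted in the abstract.

```python
import numpy as np


def area_under_lift(scores, is_labeled_positive):
    """Estimate AUL from model scores and a labeled-positive mask.

    Assumed formulation (not taken from the PULPS paper): the probability
    that a labeled positive outscores a random sample drawn from ALL
    samples, labeled or not, counting ties as 0.5.
    """
    scores = np.asarray(scores, dtype=float)
    pos = np.asarray(is_labeled_positive, dtype=bool)
    if scores.shape != pos.shape:
        raise ValueError("scores and mask must have the same shape")
    if not pos.any():
        raise ValueError("at least one labeled positive is required")

    pos_scores = scores[pos]
    # Pairwise comparison: each labeled positive against every sample.
    greater = (pos_scores[:, None] > scores[None, :]).mean()
    ties = (pos_scores[:, None] == scores[None, :]).mean()
    return float(greater + 0.5 * ties)


def relative_change(before, after):
    """Relative change of a metric, as a fraction of the starting value."""
    return (after - before) / before


# Toy data (hypothetical): two labeled positives ranked above two unlabeled samples.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_mask = [True, True, False, False]
toy_aul = area_under_lift(toy_scores, toy_mask)

# Relative AUL increase quoted in the abstract: 0.8339 -> 0.8577.
aul_gain_percent = round(relative_change(0.8339, 0.8577) * 100, 2)
```

Under this formulation a perfect ranking does not reach 1.0, because the labeled positives are also compared with themselves. That is one reason AUL and AUC values are not directly interchangeable.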