Improved Decoy Selection via Machine Learning and Ranking

2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), 2018

Abstract
Selecting biologically active (native) tertiary structures from among the decoy structures computed by template-free protein structure prediction methods remains a challenging problem. The scoring (energy) function that guides these methods is an unreliable indicator of nativeness. Currently, the most popular decoy selection methods cluster decoys by structural similarity, but they achieve varied success and are computationally expensive on large datasets. Recently, an alternative multi-model approach that exploits the basins of the energy landscape housing the computed decoys has shown promise in overcoming some of the limitations of clustering-based approaches. A separate paper shows that ranking-based basin selection strategies outperform a standard clustering-based decoy selection method in terms of purity: the percentage of true positives relative to the size of the selected basin(s), which penalizes the selected basin(s) for the false positives they contain. Despite these promising results, the failure to perform consistently well across varied test cases remains a bottleneck in decoy selection. In this work, we propose a two-phase machine learning-based framework that uses characteristics of basins extracted from an energy landscape together with three knowledge-based potentials. In phase 1, we regress purity on the basin features and rank basins by their predicted purity. In phase 2, regression is applied to decoy features, and potentially non-native decoys are eliminated based on their predicted root-mean-squared deviation (RMSD) from the (unknown) native structure, further purifying the basins selected in phase 1. An empirical investigation on 18 benchmark proteins shows that the proposed machine learning-based framework offers better performance. The proposed method performs consistently well across varied test cases and opens the way for further research in this direction.
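The purity metric and the two-phase pipeline described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the regressor, so plain linear least squares stands in for it; basin and decoy features are treated as generic numeric vectors (the actual basin characteristics and the three knowledge-based potentials are not specified here); and the 4 Å near-native cutoff and all function names (`purity`, `fit_linear`, `rank_basins`, `filter_decoys`, `two_phase_select`) are hypothetical. Purity follows the abstract's definition: true positives divided by basin size.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def purity(rmsds: Sequence[float], threshold: float = 4.0) -> float:
    """Fraction of decoys in a basin whose RMSD to the native is <= threshold.
    Assumption: 'true positive' means near-native under a hypothetical 4 A cutoff."""
    if not rmsds:
        return 0.0
    return sum(1 for r in rmsds if r <= threshold) / len(rmsds)


def fit_linear(X: Sequence[Vector], y: Sequence[float], ridge: float = 1e-6) -> List[float]:
    """Least-squares fit with intercept via the normal equations.
    Stand-in regressor (assumption); a tiny ridge term on the diagonal aids stability."""
    rows = [[1.0] + list(x) for x in X]          # prepend intercept column
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(d):
        p = max(range(c, d), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for k in range(d):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * ac for a, ac in zip(A[k], A[c])]
                b[k] -= f * b[c]
    return [b[i] / A[i][i] for i in range(d)]   # A is now diagonal


def predict(w: Sequence[float], x: Vector) -> float:
    """Linear prediction: intercept plus weighted feature sum."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def rank_basins(w_purity: Sequence[float], basin_features: Sequence[Vector]) -> List[int]:
    """Phase 1: order basin indices by predicted purity, highest first."""
    scores = [predict(w_purity, f) for f in basin_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def filter_decoys(w_rmsd: Sequence[float], decoy_features: Sequence[Vector],
                  cutoff: float) -> List[int]:
    """Phase 2: keep decoys whose predicted RMSD to the native is <= cutoff."""
    return [i for i, f in enumerate(decoy_features) if predict(w_rmsd, f) <= cutoff]


def two_phase_select(w_purity: Sequence[float], w_rmsd: Sequence[float],
                     basins: Sequence[Tuple[Vector, Sequence[Vector]]],
                     top_k: int = 1, rmsd_cutoff: float = 4.0) -> Dict[int, List[int]]:
    """basins: (basin feature vector, list of decoy feature vectors) pairs.
    Returns the surviving decoy indices for each of the top_k ranked basins."""
    order = rank_basins(w_purity, [b[0] for b in basins])[:top_k]
    return {i: filter_decoys(w_rmsd, basins[i][1], rmsd_cutoff) for i in order}
```

In use, `fit_linear` would be trained once on basins with known purity (phase 1) and once on decoys with known RMSD (phase 2) from training proteins, and `two_phase_select` would then be applied to a new protein whose native structure is unknown. Keeping the two regressors separate mirrors the abstract's split: basin-level ranking first, then decoy-level pruning inside the selected basins.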
Keywords
template-free protein structure prediction method, energy landscape, rank basin, purity basin, root-mean-squared deviation, knowledge-based potential, two-phase machine learning-based framework, decoy selection