Optimization and machine learning applications to protein sequence and structure (2013)

Abstract
Algorithms that enable the development of drugs to inhibit or enhance protein functions save time, money, and effort otherwise spent on bench research. This dissertation presents algorithms for protein structure prediction and for the prediction of residues that form protein-protein interactions. Within the context of protein structure prediction, we present algorithms for sequence alignment, for the optimization of fragments into complete structures, and for the assessment of predicted structure quality.

We demonstrate the utility of incorporating multiple objectives when aligning pairs of protein sequence profiles. We prove that the problem of generating Pareto optimal pairwise alignments has the optimal substructure property, and we present an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. Despite the efficiency of this exact algorithm, its computational cost remains high for certain pairs of sequences. To address this, we developed a heuristic approach that produces approximate Pareto optimal frontiers of pairwise alignments. With four objectives, the frontiers it produces contain alignments comparable to those on the exact frontier, but are computed on average in less than 1/58th of the time. Our results show that the Pareto frontiers contain alignments that are 6% better than the alignments obtained using single objectives. We have thus provided a theoretically sound way of combining multiple objectives when aligning pairs of sequences.

Assembling fragments of known structures to form complete proteins is a key technique for predicting the structures of novel protein folds. Several existing methods use stochastic optimization to assemble fragments. We examine deterministic algorithms for optimizing scoring functions in protein structure prediction. We present a technique that can overcome local minima, and we determine the extent to which these minima affect the optimization.
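The multi-objective pairwise alignment described earlier in this abstract can be illustrated with a small dynamic program. This is a minimal sketch, not the dissertation's algorithm. It assumes a global (Needleman-Wunsch style) alignment of plain strings, additive objectives, and a linear gap penalty given as one vector, whereas the dissertation aligns sequence profiles. Because every objective is additive, a partial alignment whose score vector is dominated at some cell can never extend to a Pareto optimal full alignment. This is the optimal substructure property, and it lets each cell keep only its own non-dominated score vectors. The names `pareto_alignment_scores` and `toy_sub` and the toy scores are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_filter(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep only the non-dominated score vectors (duplicates collapsed, result sorted)."""
    unique = set(vectors)
    return sorted(v for v in unique if not any(dominates(u, v) for u in unique))


def add(a: Vector, b: Vector) -> Vector:
    return tuple(x + y for x, y in zip(a, b))


def pareto_alignment_scores(
    s: str, t: str, sub: Callable[[str, str], Vector], gap: Vector
) -> List[Vector]:
    """Return the Pareto frontier of score vectors over all global alignments of s and t."""
    n, m, k = len(s), len(t), len(gap)
    # frontier[i][j] holds the non-dominated score vectors for aligning s[:i] with t[:j].
    frontier: List[List[List[Vector]]] = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    frontier[0][0] = [(0,) * k]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates: List[Vector] = []
            if i > 0 and j > 0:  # align s[i-1] with t[j-1]
                candidates += [add(v, sub(s[i - 1], t[j - 1])) for v in frontier[i - 1][j - 1]]
            if i > 0:  # s[i-1] against a gap
                candidates += [add(v, gap) for v in frontier[i - 1][j]]
            if j > 0:  # t[j-1] against a gap
                candidates += [add(v, gap) for v in frontier[i][j - 1]]
            # Optimal substructure: dominated partial scores can be discarded here.
            frontier[i][j] = pareto_filter(candidates)
    return frontier[n][m]


def toy_sub(a: str, b: str) -> Vector:
    # Hypothetical two-objective substitution score: (identity-style score, mismatch penalty).
    return (2, 0) if a == b else (-1, -3)


print(pareto_alignment_scores("A", "B", toy_sub, gap=(-2, 0)))  # → [(-4, 0), (-1, -3)]
```

In the toy example, substituting A for B scores (-1, -3) while two gaps score (-4, 0); neither vector dominates the other, so both lie on the frontier. The number of non-dominated vectors per cell can grow quickly as objectives are added, which is why exact frontiers can be expensive and why a heuristic approximation is attractive.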
Experiments on a diverse set of proteins show that our fragment assembly algorithms consistently outperform existing approaches, producing results that are 6-20% better. Our work in fragment assembly optimization has enabled the development of better protein structure prediction algorithms.

We also present methods that automatically assess the quality of computationally predicted protein structures. We examine techniques that estimate the quality of a predicted protein structure based on prediction consensus: the structure being assessed is aligned, using local-global alignment, to different predictions for the same protein. On two datasets, we examine both static and machine learning methods for aggregating distances between residues within these alignments. We find that a constrained regression approach shows performance improvements of over 20% and can easily be retrained to accommodate changing predictors. Our algorithm for model quality assessment enables the effective use of multiple structure prediction techniques.

With respect to predicting interacting residues, we present a method that uses high-quality sequence alignments to identify protein residues that bind to other proteins. In contrast to existing approaches, which focus on local sequence information, our method uses global sequence information from induced multiple sequence alignments to make better predictions. On a large and challenging dataset, our method achieves an area under the receiver operating characteristic (ROC) curve of 0.656, compared to 0.615 for the existing ISIS technique. By leveraging a priori measures of alignment quality, we further increase this area to 0.768 on a subset of the data. Our algorithms have allowed for better manipulation of protein function.
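Consensus-based quality assessment, as described above, can be sketched as follows. The sketch assumes the local-global alignments and superpositions have already been computed, so that for each other prediction there is one distance (in Angstroms) per residue of the assessed model, or `None` where that residue is unaligned. It converts each distance with an S-score, 1/(1 + (d/d0)^2), which is one common static choice and not necessarily the transformation used in the dissertation; `d0 = 3.5` is an assumed value. The optional non-negative `weights` are a hypothetical stand-in for per-predictor coefficients that a constrained regression might learn. The function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def s_score(distance: Optional[float], d0: float = 3.5) -> float:
    """Map a residue-residue distance to (0, 1]; an unaligned residue (None) scores 0."""
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + (distance / d0) ** 2)


def consensus_quality(
    distances_per_prediction: Sequence[Sequence[Optional[float]]],
    weights: Optional[Sequence[float]] = None,
    d0: float = 3.5,
) -> Tuple[List[float], float]:
    """Estimate per-residue and global quality of one model from its agreement with other predictions.

    distances_per_prediction[p][r] is the distance between residue r of the assessed model
    and its aligned counterpart in prediction p, or None if residue r is unaligned.
    """
    k = len(distances_per_prediction)
    if k == 0:
        raise ValueError("need at least one other prediction")
    if weights is None:
        weights = [1.0 / k] * k  # static aggregation: plain average over predictions
    if len(weights) != k or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need one non-negative weight per prediction, not all zero")
    total = sum(weights)
    n_res = len(distances_per_prediction[0])
    per_residue: List[float] = []
    for r in range(n_res):
        weighted = sum(
            w * s_score(dists[r], d0) for w, dists in zip(weights, distances_per_prediction)
        )
        per_residue.append(weighted / total)  # normalize so each score stays in [0, 1]
    global_score = sum(per_residue) / n_res if n_res else 0.0
    return per_residue, global_score


per_res, overall = consensus_quality([[0.0, 3.5, None], [0.0, 0.0, 3.5]])
print(per_res, round(overall, 3))  # → [1.0, 0.75, 0.25] 0.667
```

With equal weights, residues that agree closely with every other prediction score near 1, while residues that are unaligned or far apart in most predictions score near 0; the global estimate is the mean over residues.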
Keywords
Pareto optimal frontier, complete protein, protein structure prediction, protein function, protein sequence profile, protein residue, protein functions, better protein structure prediction, protein structure, novel protein fold