Machine Learning to Predict Continuous Protein Properties from Simple Binary Sorting and Deep Sequencing Data

bioRxiv (Cold Spring Harbor Laboratory), 2023

Abstract
Proteins are a diverse class of biomolecules responsible for wide-ranging cellular functions, from catalyzing reactions and recognizing pathogens to forming dynamic cellular structures. Rapidly and inexpensively evolving proteins toward improved properties is a common objective for protein engineers. Powerful high-throughput methods such as fluorescence-activated cell sorting (FACS) and next-generation sequencing (NGS) have dramatically improved directed evolution experiments. However, it is unclear how best to leverage these data to characterize protein fitness landscapes more completely and to identify lead candidates. In this work, we develop a simple yet powerful framework that improves protein optimization by using interpretable machine learning to predict continuous protein properties from simple directed evolution experiments. Across five diverse protein engineering tasks, the framework consistently predicts continuous properties from readily available deep sequencing data. To test the utility of this approach prospectively, we generated a library of stapled peptides and applied the framework to predict and optimize both affinity and specificity. We coupled integer linear programming with the coefficients of the interpretable machine learning model to identify new variants, drawn from experimentally unseen sequence space, that have the desired properties. This approach represents a versatile tool for improved analysis and identification of protein variants across many domains of protein engineering.
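The abstract does not name the model, so what follows is only a minimal sketch of one plausible reading of "predicting continuous properties from binary sorting data". Each sequenced variant is encoded as a one-hot vector over (position, residue). An L2-regularized logistic regression is then fit to separate reads from the sorted pool from reads of the unsorted pool, and the fitted log-odds serve as a continuous score. The toy sequences, the helper names (`one_hot`, `fit_logistic`, `score`, `coefficient`) and all hyperparameters are hypothetical and are not taken from the paper. The sketch uses only the Python standard library.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector: one slot per (position, residue)."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(seqs, labels, lr=0.5, epochs=500, l2=1e-3):
    """Fit L2-regularized logistic regression by full-batch gradient descent.

    labels: 1 = read from the sorted (selected) pool, 0 = unsorted pool.
    """
    X = [one_hot(s) for s in seqs]
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def score(seq, w, b):
    """Log-odds of membership in the sorted pool, used here as a continuous proxy."""
    return b + sum(wj * xj for wj, xj in zip(w, one_hot(seq)))


def coefficient(w, position, residue):
    """Interpretable per-(position, residue) weight of the linear model."""
    return w[position * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)]


# Toy, hand-made data (NOT from the paper): W at position 0 or F at position 2
# is enriched in the sorted pool.
sorted_pool = ["WAF", "WGF", "WKL", "WAS", "GAF", "WGL"]
unsorted_pool = ["AAA", "GKL", "AGS", "KAL", "GGA", "AKS", "KGS", "AAL"]
seqs = sorted_pool + unsorted_pool
labels = [1] * len(sorted_pool) + [0] * len(unsorted_pool)

w, b = fit_logistic(seqs, labels)
for candidate in ["WAF", "GKF", "AAA"]:
    print(candidate, round(score(candidate, w, b), 2))
print("weight W@0:", round(coefficient(w, 0, "W"), 2))
```

Each entry of `w` is a per-position, per-residue weight, which is what makes a linear model of this kind interpretable. Whether such log-odds actually track a measured continuous property (for example, binding affinity) would have to be checked against experimental measurements.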
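The abstract states that integer linear programming was coupled with the model coefficients to propose variants from unseen sequence space, but it does not give the program. One hypothetical formulation assumes an additive model with on-target weights w^on_{i,a} and off-target weights w^off_{i,a} (illustrative symbols, not the paper's notation), a parent sequence p, a mutation budget k, a specificity threshold tau, and a set S of already-sequenced variants:

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{L}\sum_{a\in\mathcal{A}} w^{\mathrm{on}}_{i,a}\,x_{i,a} \\
\text{s.t.}\quad & \sum_{a\in\mathcal{A}} x_{i,a} = 1, \qquad i = 1,\dots,L \\
& \sum_{i=1}^{L}\bigl(1 - x_{i,p_i}\bigr) \le k \\
& \sum_{i=1}^{L}\sum_{a\in\mathcal{A}} w^{\mathrm{off}}_{i,a}\,x_{i,a} \le \tau \\
& \sum_{i=1}^{L} x_{i,s_i} \le L - 1, \qquad \forall\, s \in S \\
& x_{i,a} \in \{0,1\}
\end{aligned}
```

Here x_{i,a} = 1 means residue a is placed at position i. The first constraint selects exactly one residue per position, the second caps the number of mutations from the parent, the third bounds the predicted off-target score, and the fourth (one "no-good" cut per observed variant s) excludes every sequence already seen, so the solver must return a variant from unseen sequence space. Model intercepts are constants and drop out of the objective. The objective and constraints actually used in the study may differ.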
Keywords
continuous protein properties, simple binary sorting, machine learning, data