Predicting Thermophilic Proteins by Machine Learning

CURRENT BIOINFORMATICS(2020)

引用 114|浏览7
暂无评分
摘要
Background: Thermophilic proteins can maintain good activity under high temperature, therefore, it is important to study thermophilic proteins for the thermal stability of proteins. Objective: In order to solve the problem of low precision and low efficiency in predicting thermophilic proteins, a prediction method based on feature fusion and machine learning was proposed in this paper. Methods: For the selected thermophilic data sets, firstly, the thermophilic protein sequence was characterized based on feature fusion by the combination of g-gap dipeptide, entropy density and autocorrelation coefficient. Then, Kernel Principal Component Analysis (KPCA) was used to reduce the dimension of the expressed protein sequence features in order to reduce the training time and improve efficiency. Finally, the classification model was designed by using the classification algorithm. Results: A variety of classification algorithms was used to train and test on the selected thermophilic dataset. By comparison, the accuracy of the Support Vector Machine (SVM) under the jackknife method was over 92%. The combination of other evaluation indicators also proved that the SVM performance was the best. Conclusion: Because of choosing an effectively feature representation method and a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.
更多
查看译文
关键词
Thermophilic proteins,feature fusion,g-gap,entropy density,autocorrelation coefficient,KPCA,machine learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要