PCSPred_SC: Prediction of Protein Citrullination Sites Using an Effective Sequence-Based Combined Method

IEEE Access (2020)

Abstract
As a post-translational modification (PTM), protein citrullination is crucial to a diverse array of cellular processes and is implicated in many human pathologies. Accurate identification of protein citrullination sites (PCSs) is therefore urgently needed to clarify the details of the reaction and the complex pathogenesis related to protein citrullination. In view of the limitations of existing PCS predictors, this study proposes a novel sequence-based combined method, named PCSPred_SC, to further improve prediction performance. Various feature extraction methods are developed to mine sequence-derived biological information. Within the resulting feature space, the predictive capabilities of different prediction algorithms, over-sampling methods, and feature selection methods are each explored. Experimental results indicate that the over-sampling methods are effective in addressing the imbalanced-dataset problem and that the feature selection methods are important for removing irrelevant and redundant features. On the same dataset, using 10-fold cross-validation, PCSPred_SC, which combines a support vector machine (SVM), ADASYN, and t-distributed stochastic neighbor embedding (t-SNE), achieves substantially better performance than the competing methods while markedly reducing the number of features used for the task. It is anticipated that the proposed method will provide useful information to broaden our knowledge of citrullination-related biological processes.
Keywords
Citrullination, prediction algorithm, over-sampling, feature selection
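
The abstract does not say which sequence-derived features PCSPred_SC uses, so the following is only a minimal sketch of one common sequence-based encoding, not the authors' method. It relies on the fact that citrullination converts arginine (R) residues, takes a fixed window around each arginine as a candidate site, and encodes the window by its amino acid composition. The function names, the padding symbol `X`, and the default flank size of 10 are all hypothetical choices made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for windows that run past the ends


def arginine_windows(sequence, flank=10):
    """Yield (position, window) for every arginine (R) in the sequence.

    Each window has length 2 * flank + 1 and is centred on the arginine.
    The flank size is an illustrative assumption, not a value from the paper.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    for i, residue in enumerate(seq):
        if residue == "R":
            # Position i in seq sits at i + flank in padded, so this
            # slice is centred on the arginine.
            yield i, padded[i:i + 2 * flank + 1]


def amino_acid_composition(window):
    """Return 20 relative frequencies, ignoring padding and unknown symbols."""
    counts = Counter(ch for ch in window if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def featurize(sequence, flank=10):
    """Return a list of (position, feature_vector), one per candidate site."""
    return [
        (pos, amino_acid_composition(win))
        for pos, win in arginine_windows(sequence, flank)
    ]
```

Feature vectors of this kind could then be passed to an over-sampling step and a classifier such as an SVM, as the abstract describes; those stages are left out here because their settings are not given in the abstract.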