EpiSemble: A Novel Ensemble-based Machine-learning Framework for Prediction of DNA N6-methyladenine Sites Using Hybrid Features Selection Approach for Crops

Current Bioinformatics (2023)

Abstract
Aim: The study aimed to develop a robust and more precise 6mA methylation prediction tool to assist researchers in studying the epigenetic behaviour of crop plants.

Background: N6-methyladenine (6mA) is one of the predominant epigenetic modifications involved in a variety of biological processes in all three kingdoms of life. While in vitro approaches are more precise in detecting epigenetic alterations, they are resource-intensive and time-consuming. Artificial intelligence-based in silico methods have helped overcome these bottlenecks.

Methods: A novel machine learning framework was developed by combining four techniques: ensemble machine learning, a hybrid approach to feature selection, the addition of features such as the Average Mutual Information Profile (AMIP), and bootstrap sampling. Four feature sets, namely di-nucleotide frequency, GC content, AMIP, and nucleotide chemical properties, were chosen for the vectorization of DNA sequences. Nine machine learning models (support vector machine, random forest, k-nearest neighbor, artificial neural network, multiple logistic regression, decision tree, naive Bayes, AdaBoost, and gradient boosting) were trained on the relevant features extracted by the feature selection module. The three best-performing models were selected and combined into a robust ensemble model to predict sequences with 6mA sites.

Results: EpiSemble, a novel ensemble model, was developed for the prediction of 6mA methylation sites. The new model improved accuracy over existing models by 7.0%, 3.74%, and 6.65% on the RiceChen, RiceLv, and Arabidopsis datasets, respectively. An R package, EpiSemble, based on the new model was developed and made available at https://cran.r-project.org/web/packages/EpiSemble/index.html.

Conclusion: The EpiSemble model added AMIP as a novel feature and integrated feature selection modules, bootstrapping of samples, and an ensemble technique to improve the accuracy of 6mA site prediction in plants. To our knowledge, this is the first R package developed for predicting epigenetic sites in the genomes of crop plants, and it is expected to help plant researchers in their future explorations.
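
The vectorization and ensemble steps named in the Methods can be sketched roughly as follows. This is an illustrative Python sketch, not code from the EpiSemble R package. The function names, the three-bit chemical property encoding (ring structure, functional group, hydrogen bonding), the lag-based mutual information formula, and the plain majority vote are all assumptions, since the abstract does not specify any of them.

```python
from collections import Counter
from itertools import product
from math import log2

# All 16 dinucleotides in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# Assumed chemical property encoding: (purine?, amino group?, weak H-bond?).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def dinucleotide_frequency(seq):
    """Relative frequency of each overlapping dinucleotide in seq."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [pairs[d] / total for d in DINUCLEOTIDES]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def average_mutual_information(seq, lag):
    """Mutual information (in bits) between bases that are `lag` apart.

    Assumed form: sum over base pairs (x, y) of
    p_lag(x, y) * log2(p_lag(x, y) / (p(x) * p(y))),
    with p(x) taken from the overall base frequencies.
    """
    n_pairs = len(seq) - lag
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    pair_counts = Counter((seq[i], seq[i + lag]) for i in range(n_pairs))
    total = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        total += p_xy * log2(p_xy / (base_freq[x] * base_freq[y]))
    return total


def vectorize(seq, max_lag=3):
    """Concatenate the four feature sets into one numeric vector.

    Assumes seq contains only A, C, G, T and is longer than max_lag.
    """
    seq = seq.upper()
    features = dinucleotide_frequency(seq)
    features.append(gc_content(seq))
    features.extend(
        average_mutual_information(seq, k) for k in range(1, max_lag + 1)
    )
    for base in seq:
        features.extend(NCP[base])
    return features


def majority_vote(labels):
    """Combine the labels predicted by the selected models by simple voting."""
    return Counter(labels).most_common(1)[0][0]
```

With three binary classifiers, `majority_vote` always has a clear winner, which is one plausible reason for keeping an odd number of top models. The actual combination rule used by EpiSemble may differ.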
Keywords
hybrid features selection approach, machine-learning, ensemble-based