Identification of multiple RNAs using feature fusion

Briefings in Bioinformatics (2021)

Abstract
The detection of novel transcripts with deep sequencing has increased the demand for computational algorithms, because identifying and validating them with in vivo techniques is time-consuming, costly and unreliable. Most of the discovered transcripts are non-coding RNAs (ncRNAs), a large group known for its diverse functional roles but lacking a common taxonomy. Thus, once a transcript is found to lack coding potential, it is crucial to recognize its primary functional category. To address this heterogeneity, we divide the ncRNAs into three classes and present the RNA classifier (RNAC), which categorizes RNAs into coding, housekeeping, small non-coding and long non-coding classes. RNAC uses alignment-based genomic descriptors to extract statistical, local binary pattern and histogram features, and fuses them to construct classification models with extreme gradient boosting. The experiments are performed on four species, and performance is assessed on both the multiclass problem and the conventional binary classification problem (coding versus non-coding). The proposed approach achieved >93% accuracy on both classification problems and also outperformed other well-known existing methods in coding potential prediction. This validates the usefulness of feature fusion for improved performance on both types of classification problems. Hence, RNAC is a valuable tool for the accurate identification of multiple RNAs.
Keywords
coding RNA, feature fusion, local binary patterns, machine learning, non-coding RNA, XGBoost
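
The feature-fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the descriptors, the local binary pattern variant, or the histogram settings, so the input signal (a hypothetical per-position descriptor score in [0, 1]), the function names, the radius and the bin count below are all assumptions chosen for illustration. The sketch computes three feature groups from one signal and concatenates them into a single vector, which is the kind of input a gradient boosting classifier such as XGBoost could then be trained on.

```python
from statistics import mean, pstdev


def statistical_features(signal):
    """Summary statistics of a 1D descriptor signal (assumed feature set)."""
    return [mean(signal), pstdev(signal), min(signal), max(signal)]


def lbp_1d_histogram(signal, radius=2):
    """Normalized histogram of 1D local binary pattern codes.

    Each interior position is compared with `radius` neighbors on each side;
    a neighbor that is >= the center value contributes a 1 bit to the code.
    """
    n_bits = 2 * radius
    counts = [0] * (2 ** n_bits)
    for i in range(radius, len(signal) - radius):
        neighbors = signal[i - radius:i] + signal[i + 1:i + radius + 1]
        code = 0
        for value in neighbors:
            code = (code << 1) | (1 if value >= signal[i] else 0)
        counts[code] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def value_histogram(signal, bins=4, lo=0.0, hi=1.0):
    """Normalized histogram of the raw values over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in signal:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(signal) for c in counts]


def fuse_features(signal):
    """Feature fusion by concatenating the three feature groups."""
    return (statistical_features(signal)
            + lbp_1d_histogram(signal)
            + value_histogram(signal))


# Hypothetical descriptor signal for one transcript (illustrative values only).
signal = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.9, 0.5]
features = fuse_features(signal)
print(len(features))  # prints 24 (4 statistics + 16 LBP bins + 4 histogram bins)
```

Concatenation is the simplest form of fusion; it keeps each feature group intact and leaves it to the tree-based model to weigh them against each other.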