ReorientExpress: reference-free orientation of nanopore cDNA reads with deep learning

bioRxiv (2019)

Abstract
Long-read sequencing technologies allow the systematic interrogation of transcriptomes from any species. However, functional characterization requires determining the correct orientation of reads. Oxford Nanopore Technologies (ONT) allows the direct measurement of RNA molecules in their native orientation, but sequencing of complementary DNA (cDNA) libraries generally yields a larger number of reads. Although strand-specific adapters can be used, sequencing error rates hinder their detection. Current methods rely on comparison to a genome reference or on the use of additional technologies, which limits the applicability of rapid and cost-effective long-read sequencing for transcriptomics beyond model species. To facilitate the de novo interrogation of transcriptomes in species or samples for which a genome reference is not available, we have developed ReorientExpress, a new tool based on deep learning that performs reference-free orientation of ONT reads from a cDNA library. Trained on transcriptome annotations, ReorientExpress correctly predicted the orientation of 84% of ONT cDNA reads in human and 93% in S. cerevisiae. Furthermore, testing in human a model trained on mouse annotations, or testing in S. cerevisiae a model trained on C. glabrata annotations, produced similar accuracy. Finally, in combination with long-read clustering, ReorientExpress established the correct orientation for the majority of reads (92% in human, 97% in S. cerevisiae). ReorientExpress facilitates the interpretation of transcriptomes from long-read cDNA sequencing data without the need for a genome reference or additional technologies. ReorientExpress is available at https://github.com/comprna/reorientexpress.
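
To make the idea of training on transcriptome annotations concrete, the following is a minimal sketch and not the ReorientExpress implementation. It assumes that annotated transcripts are given in their forward orientation, so that their reverse complements can serve as examples of the reversed class. The k-mer frequency features, the function names, and the `predict_is_reversed` callback are illustrative assumptions; the abstract does not specify the model inputs or architecture.

```python
from collections import Counter
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int = 2) -> List[float]:
    """Normalized counts of all 4**k k-mers over A, C, G, T.

    Assumption: k-mer frequencies are used here only as a simple,
    fixed-length feature vector for illustration.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def make_training_examples(
    transcripts: Sequence[str], k: int = 2
) -> List[Tuple[List[float], int]]:
    """Build labeled examples from forward-oriented transcripts.

    Label 0: sequence as annotated (forward).
    Label 1: its reverse complement (reversed).
    """
    examples = []
    for t in transcripts:
        examples.append((kmer_frequencies(t, k), 0))
        examples.append((kmer_frequencies(reverse_complement(t), k), 1))
    return examples


def reorient(read: str, predict_is_reversed: Callable[[str], bool]) -> str:
    """Reverse-complement a read if the (hypothetical) classifier
    says it is in the reversed orientation; otherwise return it unchanged."""
    return reverse_complement(read) if predict_is_reversed(read) else read
```

Any binary classifier could be fitted on the output of `make_training_examples` and then wrapped as `predict_is_reversed`; reads predicted as reversed are reverse-complemented so that all reads end up in the same orientation.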