ExtendAlign: a computational algorithm for delivering multiple global alignment results originated from local alignments.

bioRxiv (2019)

Abstract
MicroRNAs are conserved small RNAs that mediate gene silencing. Although the first plant and animal microRNAs, reported in 1993, were considered an abundant class of RNAs, it was only with the advent of high-throughput sequencing that a large number of conserved and non-conserved microRNA families were uncovered across several clades. The search for microRNAs has motivated the development of computational tools that incorporate a diverse set of parameters to aid in the recognition of novel candidate sequences, or in their assignment to microRNA families by hierarchical clustering methods. To establish homology, multiple sequence alignment tools (MSATs) are used to determine the identity between highly similar sequences. However, as the similarity between two or more sequences declines, MSATs that perform local alignments have difficulty reporting the correct number of matches and mismatches (m/mm) over the entire length of short sequences. This compromises the alignment and biases the results when multiple novel microRNAs are aligned against dissimilar subject sequences or distant genomes. Conversely, the refinements that global-alignment MSATs provide for dissimilar sequences perform best when establishing identity among sequences of similar length, which limits the identification of multiple alignment hits when multiple microRNA query sequences are aligned against distant genomes. To address this, we developed ExtendAlign, a computational tool that corrects any unreported m/mm arising from a local multiple alignment. ExtendAlign extends the alignment achieved by a local MSAT and provides an end-to-end report of the true m/mm for every hit of each query sequence, reducing the aforementioned alignment bias. We tested the efficacy of ExtendAlign in calculating the number of true m/mm resulting from local multiple alignments among mature and precursor microRNAs, and against distant genomes. 
Our results showed that ExtendAlign significantly increases the number of reported m/mm, recovering those originally missed by an MSAT, in all alignments tested. Remarkably, ExtendAlign corrects the alignment hits of dissimilar sequences in the range of 35-50% similarity (also known as the "twilight zone"), suggesting that it could also be used as a routine procedure when high accuracy is required after any local multiple alignment aimed at establishing the phylogenetic origin of distant microRNAs or any other short sequences. ExtendAlign is available for download at https://github.com/Flores-JassoLab.
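
The end-to-end correction described in the abstract can be illustrated with a minimal Python sketch. It is not the ExtendAlign implementation. The function name `extend_hit`, the 0-based half-open coordinates, the ungapped extension, and the choice to count query bases that overhang the subject as mismatches are all assumptions made for this illustration.

```python
def extend_hit(query, subject, q_start, q_end, s_start, s_end,
               matches, mismatches):
    """Extend a local hit to cover the whole query and recount m/mm.

    Hypothetical helper, not part of ExtendAlign. Coordinates are
    0-based and half-open: query[q_start:q_end] was aligned to
    subject[s_start:s_end] by a local aligner, which reported
    `matches` and `mismatches` for that region only.
    Assumption: the flanks are compared without gaps, and a query
    base with no subject base opposite it counts as a mismatch.
    """
    # Walk left from the start of the hit to the 5' end of the query.
    i, j = q_start - 1, s_start - 1
    while i >= 0:
        if j >= 0 and query[i] == subject[j]:
            matches += 1
        else:
            mismatches += 1
        i -= 1
        j -= 1

    # Walk right from the end of the hit to the 3' end of the query.
    i, j = q_end, s_end
    while i < len(query):
        if j < len(subject) and query[i] == subject[j]:
            matches += 1
        else:
            mismatches += 1
        i += 1
        j += 1

    return matches, mismatches


# Toy example: the local hit covers query[1:8] with 7 matches,
# leaving one query base on the left and two on the right unreported.
query = "ACGTACGTAC"
subject = "GGTCGTACGTTCTT"
m, mm = extend_hit(query, subject, 1, 8, 3, 10, matches=7, mismatches=0)
identity = 100.0 * m / len(query)  # identity over the full query length
```

In this toy case the local hit alone would suggest 100% identity over 7 bases, whereas the end-to-end count covers all 10 query bases. That difference is the kind of bias the abstract describes for short queries such as mature microRNAs.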