Pairwise sequence alignment with block and character edit operations.
CoRR (2023)
Abstract
Pairwise sequence comparison is one of the most fundamental problems in
string processing. The most common metric to quantify the similarity between
sequences S and T is edit distance, d(S,T), which corresponds to the number of
characters that need to be substituted, deleted from, or inserted into S to
generate T. However, for some string pairs, fewer edit operations suffice to
transform one string into the other if larger rearrangements are permitted.
Block edit distance captures such substring-level (i.e., block) changes: it
"penalizes" entire block removals, insertions, copies, and reversals with the
same cost as single-character edits (Lopresti & Tomkins, 1997). To date, most
studies on block edit distance have aimed only to characterize the distance
itself for applications in sequence nearest neighbor search, without
reporting the full alignment details. Although a few tools try
to solve block edit distance for genomic sequences, such as GR-Aligner, they
have limited functionality and are no longer maintained.
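As a point of reference, the classical character-level edit distance d(S,T) described above can be computed with the standard dynamic program. The sketch below is a minimal illustration in Python, assuming unit costs for substitution, insertion, and deletion; it is not SABER's algorithm, and the function name `edit_distance` and the example strings are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Classical unit-cost edit distance (substitute, insert, delete).

    Illustrative baseline only; this is not the SABER algorithm.
    """
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: T is S with its two halves swapped.
S = "AAAACCCC"
T = "CCCCAAAA"
print(edit_distance(S, T))  # 8
```

Here the character-level distance is 8, whereas a model that charges a single operation for moving the block "AAAA" to the end of the string would need only one edit. This is the kind of gap that block edit operations are meant to close.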
Here, we present SABER, an algorithm to solve block edit distance that
supports block deletions, block moves, and block reversals in addition to the
classical single-character edit operations. Our algorithm runs in
O(m^2 · n · l_range) time, where |S| = m, |T| = n, and l_range is the permitted
range of block sizes, and it can report all breakpoints for the block
operations. We also
provide an implementation of SABER that is currently optimized for genomic
sequences (i.e., sequences over the DNA alphabet), although the algorithm can
in principle be used with any alphabet.
SABER is available at http://github.com/BilkentCompGen/saber