Weighted Edit Distance Computation: Strings, Trees, and Dyck

Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023 (2023)

Abstract
Given two strings of length $n$ over an alphabet $\Sigma$ and an upper bound $k$ on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) from almost forty years ago computes the unweighted string edit distance in $O(n+k^2)$ time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Time Hypothesis (Backurs and Indyk; STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar $\tilde{O}(n+\mathrm{poly}(k))$-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances. While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations (insertions, deletions, and substitutions), and the weights may vary with the characters being edited. Given a weight function $w : (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\}) \to \mathbb{R}_{\ge 0}$ (such that $w(a,a) = 0$ and $w(a,b) \ge 1$ for all $a, b \in \Sigma \cup \{\varepsilon\}$ with $a \ne b$), the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla $O(n^2)$-time dynamic-programming algorithm and its almost trivial $O(nk)$-time implementation ($k$ being an upper bound on the sought total weight), none of the aforementioned developments on the unweighted edit distance applies to the weighted variant. In this paper, we propose the first $O(n+\mathrm{poly}(k))$-time algorithm that computes the weighted string edit distance exactly, thus bridging a fundamental decades-old gap between our understanding of unweighted and weighted edit distance.
We then generalize this result to the weighted tree and Dyck edit distances, bringing in several new techniques, which lead to a deterministic algorithm that improves upon the previous work even for unweighted tree edit distance. Given how fundamental weighted edit distance is, we believe our $O(n+\mathrm{poly}(k))$-time algorithm will be instrumental for further significant developments in the area.
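For orientation, the baseline that the abstract contrasts against (the dynamic program restricted to a diagonal band, running in O(nk) time) can be sketched as follows. This is not the paper's algorithm, only a minimal illustration of the textbook approach. The function name, the `EPS` marker for the empty symbol, and the `unit_w` helper are assumptions made for this sketch. It relies on the stated condition that every edit costs at least 1, so an alignment of weight at most k uses at most k insertions and deletions and never leaves the band |i - j| <= k.

```python
import math

EPS = ""  # hypothetical marker standing for the empty symbol epsilon


def weighted_edit_distance_bounded(x, y, w, k):
    """Return the minimum alignment weight of x and y if it is at most k, else None.

    w(a, b) is the cost of editing a into b; w(a, EPS) is a deletion and
    w(EPS, b) an insertion. Assumes w(a, a) == 0 and w(a, b) >= 1 for a != b.
    """
    n, m = len(x), len(y)
    band = math.floor(k)  # at most floor(k) insertions/deletions fit in budget k
    if abs(n - m) > band:
        return None

    # Row 0 of the DP table: insert the first j characters of y.
    prev = {0: 0.0}
    for j in range(1, min(m, band) + 1):
        prev[j] = prev[j - 1] + w(EPS, y[j - 1])

    for i in range(1, n + 1):
        cur = {}
        # Only cells with |i - j| <= band are stored, so each row costs O(k).
        for j in range(max(0, i - band), min(m, i + band) + 1):
            best = prev.get(j, math.inf) + w(x[i - 1], EPS)  # delete x[i-1]
            if j > 0:
                best = min(
                    best,
                    cur.get(j - 1, math.inf) + w(EPS, y[j - 1]),  # insert y[j-1]
                    prev.get(j - 1, math.inf) + w(x[i - 1], y[j - 1]),  # match/substitute
                )
            cur[j] = best
        prev = cur

    dist = prev.get(m, math.inf)
    return dist if dist <= k else None


def unit_w(a, b):
    """Unweighted special case: every edit costs 1."""
    return 0 if a == b else 1
```

Calling `weighted_edit_distance_bounded("kitten", "sitting", unit_w, 3)` evaluates only the cells within three diagonals of the main one. A budget that is too small yields `None` rather than an incorrect value, because restricting the table to the band can only raise the computed cost.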
Keywords
edit distance,string similarity,kernelization