Massively Parallel Approximation Algorithms for Edit Distance and Longest Common Subsequence.

SODA '19: Symposium on Discrete Algorithms, San Diego, California, January 2019

Abstract
Computing string similarity measures is among the most fundamental problems in computer science. Notable examples are edit distance (ED) and longest common subsequence (LCS). These problems have applications in computational biology, text processing, compiler optimization, data analysis, image analysis, and other areas. In this work, we revisit edit distance and longest common subsequence in the parallel setting. We present massively parallel algorithms for both problems that are optimal in the following senses:

• The approximation factor of our algorithms is 1 + ϵ.
• The round complexity of our algorithms is constant.
• The total running time of our algorithms over all machines is Õ(n²). This matches the running time of the best-known solutions for approximating edit distance and longest common subsequence within a 1 + ϵ factor in the sequential setting.

Our result for edit distance substantially improves on the massively parallel algorithm of [15] in terms of approximation factor, round complexity, number of machines, and total running time.

Our unified approach to both problems is to divide one of the strings into smaller blocks and to locally predict which intervals of the other string correspond to each block in an optimal solution. Our main technical contribution is a novel parallel algorithm for computing a set of compositions and for recursively decomposing each function into a set of smaller iterative compositions (smaller in terms of the memory needed to solve the problem). Together, these two methods give us a strong tool for approximating combinatorial problems. For instance, LCS can be formulated as a recursive composition of functions, and therefore this tool enables us to approximate LCS within a factor of 1 + ϵ. Indeed, we recursively decompose the problem until we are able to compute the solution on a single machine. Since our methods are quite general, we expect this technique to find applications in other combinatorial problems as well.
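To make the two problems and the block idea concrete, the following is a minimal sequential sketch in Python. It is not the algorithm of the paper: it uses the textbook exact dynamic programs (quadratic time) for ED and LCS, with no approximation and no parallelism. The function `lcs_by_blocks` is a hypothetical illustration of splitting one string into blocks and combining per-block answers over intervals of the other string; all function names and the `block_size` parameter are assumptions introduced here.

```python
def edit_distance(s, t):
    """Textbook exact edit distance (insert, delete, substitute), O(|s||t|) time."""
    prev = list(range(len(t) + 1))  # distance from the empty prefix of s to t[:j]
    for i, a in enumerate(s, 1):
        cur = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def lcs_length(s, t):
    """Textbook exact LCS length, O(|s||t|) time."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_by_blocks(s, t, block_size):
    """Exact LCS computed block by block (illustration only, assumes block_size >= 1).

    s is cut into consecutive blocks. best[k] holds the LCS of all blocks
    processed so far against t[:k]. Adding a block combines the previous
    table with the block's LCS against every interval t[j:k], which is a
    max-plus composition of per-block functions.
    """
    m = len(t)
    best = [0] * (m + 1)
    for start in range(0, len(s), block_size):
        block = s[start:start + block_size]
        best = [max(best[j] + lcs_length(block, t[j:k]) for j in range(k + 1))
                for k in range(m + 1)]
    return best[m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(lcs_length("ABCBDAB", "BDCABA"))
    print(lcs_by_blocks("ABCBDAB", "BDCABA", block_size=3))
```

The block version evaluates every interval of the second string for every block, so it is far slower than the plain dynamic program. It is meant only to show why restricting each block to a small set of candidate intervals, as the abstract describes, is the step that matters for efficiency.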