Co-linear Chaining with Overlaps and Gap Costs

Annual International Conference on Research in Computational Molecular Biology (RECOMB) (2022)

Abstract
Motivation: Co-linear chaining has proven to be a powerful technique for finding approximately optimal alignments and approximating edit distance. It is used as an intermediate step in numerous mapping tools that follow the seed-and-extend strategy. Despite this popularity, subquadratic-time algorithms for the case where chains support anchor overlaps and gap costs are not currently known. Moreover, a theoretical connection between co-linear chaining cost and edit distance remains unknown.

Results: We present algorithms to solve the co-linear chaining problem with anchor overlaps and gap costs in Õ(n) time, where n denotes the number of anchors. We establish the first theoretical connection between co-linear chaining cost and edit distance. Specifically, we prove that for a fixed set of anchors under a carefully designed chaining cost function, the optimal 'anchored' edit distance equals the optimal co-linear chaining cost. Finally, we demonstrate experimentally that the optimal co-linear chaining cost under the proposed cost function can be computed significantly faster than edit distance, and that it achieves high correlation with edit distance for closely as well as distantly related sequences.

Implementation:

Contact: chirag{at}iisc.ac.in, daniel.j.gibney{at}gmail.com, sharma.thankachan{at}ucf.edu

Competing Interest Statement: The authors have declared no competing interest.
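To make the chaining problem described in the abstract concrete, the following is a minimal sketch of a quadratic-time dynamic program for co-linear chaining with gap costs. It is not the Õ(n) algorithm from the paper and it does not use the paper's exact cost function. It assumes anchors are exact matches given as (start in first sequence, start in second sequence, length), it skips overlapping anchor pairs entirely, and it charges the larger of the two gaps when connecting consecutive anchors. The function name `chain_cost` and the sentinel start and end anchors are illustrative choices.

```python
from typing import List, Tuple

# Assumption: an anchor is an exact match (start in S1, start in S2, length), 0-based.
Anchor = Tuple[int, int, int]


def chain_cost(anchors: List[Anchor], len1: int, len2: int) -> int:
    """Minimum chaining cost from the start to the end of both sequences.

    Simplified O(n^2) sketch: overlapping anchor pairs are not chained,
    and connecting two anchors costs max(gap in S1, gap in S2).
    """
    # Add zero-length sentinel anchors for the sequence starts and ends.
    pts = [(0, 0, 0)] + sorted(anchors) + [(len1, len2, 0)]
    n = len(pts)
    inf = float("inf")
    cost = [inf] * n
    cost[0] = 0  # the start sentinel costs nothing

    for j in range(1, n):
        xj, yj, _ = pts[j]
        for i in range(j):
            xi, yi, li = pts[i]
            gap1 = xj - (xi + li)  # unmatched characters in S1 between anchors
            gap2 = yj - (yi + li)  # unmatched characters in S2 between anchors
            if gap1 < 0 or gap2 < 0:
                continue  # anchor i does not strictly precede anchor j
            candidate = cost[i] + max(gap1, gap2)
            if candidate < cost[j]:
                cost[j] = candidate

    return int(cost[-1])
```

With no anchors, the cost under these assumptions is simply the length of the longer sequence, since the start sentinel connects directly to the end sentinel. Each additional compatible anchor can only lower that value.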
Keywords
overlaps, costs, gap, co-linear