Optimizing the Accuracy of Randomized Embedding for Sequence Alignment

Yiqing Yan, Nimisha Chaturvedi, Raja Appuswamy

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022

Abstract
Gapped alignment of sequenced data to a reference genome has traditionally been a computationally intensive task because it relies on edit distance to handle the indels and mismatches introduced by sequencing. In prior work, we developed Accel-Align [1], a Seed-Embed-Extend (SEE) sequence aligner that uses randomized embedding algorithms to quickly identify optimal candidate locations using Hamming distance rather than edit distance. While Accel-Align provides up to an order of magnitude improvement over state-of-the-art aligners, the randomized nature of embedding can lead to alignment errors, which in turn lower the precision and recall of downstream variant callers. In this work, we propose several techniques for improving the accuracy of randomized embedding-based sequence alignment. We provide an efficient implementation of these techniques in Accel-Align and use it to present a comparative evaluation demonstrating that the accuracy improvements can be achieved without sacrificing performance. The code is available at github.com/raja-appuswamy/accel-align-release.
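The abstract does not say which randomized embedding Accel-Align uses. As an illustration only, the sketch below assumes a CGK-style embedding (after Chakraborty, Goldenberg and Koucký): each string is rewritten into a longer string in which a shared random bit per (output position, character) decides whether the next character is read or the current one is repeated, so that Hamming distance between embeddings can stand in for edit distance. The names `make_shared_bits`, `cgk_embed`, `hamming` and `pick_best_candidate`, the 3x output length, the `seed` parameter and the `-` pad symbol are all hypothetical choices for this sketch, not Accel-Align's code.

```python
import random

def make_shared_bits(out_len, alphabet="ACGTN", seed=42):
    """Random bit r[j][c] for every output position j and character c.

    Assumption: the read and all candidate windows are embedded with the
    SAME table; otherwise their Hamming distance would be meaningless.
    """
    rng = random.Random(seed)
    return [{c: rng.randint(0, 1) for c in alphabet} for _ in range(out_len)]

def cgk_embed(s, bits, pad="-"):
    """CGK-style embedding of s into a string of length len(bits).

    At each output step the current input character is emitted; the input
    pointer then advances only if the random bit for (step, character) is 1,
    so a character may be repeated. Once s is exhausted, pad is emitted.
    """
    out = []
    i = 0
    for row in bits:
        if i < len(s):
            c = s[i]
            out.append(c)
            i += row[c]  # 1 -> move to next character, 0 -> repeat c
        else:
            out.append(pad)
    return "".join(out)

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def pick_best_candidate(read, candidates, seed=42):
    """Return (index, distances): the candidate whose embedding is closest
    to the read's embedding in Hamming distance, plus all the distances."""
    out_len = 3 * max([len(read)] + [len(c) for c in candidates])
    bits = make_shared_bits(out_len, seed=seed)
    e_read = cgk_embed(read, bits)
    dists = [hamming(e_read, cgk_embed(c, bits)) for c in candidates]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists

if __name__ == "__main__":
    read = "ACGTACGTTGCA"
    # Hypothetical reference windows: two substitutions at the first base,
    # an exact match, and one window with a single deleted base.
    candidates = ["TCGTACGTTGCA", read, "GCGTACGTAGCA", "ACGTCGTTGCA"]
    best, dists = pick_best_candidate(read, candidates)
    print(best, dists)
```

Because the repeat-or-advance step is driven by the characters themselves, the embeddings of two strings that differ by an indel can fall back into step, whereas comparing the raw strings position by position would shift every base after the indel. The same randomness is what the abstract identifies as a source of alignment errors: a different `seed` changes the distances and can, occasionally, change which candidate wins.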
Keywords
alignment, mapping, embedding