Faster Computation of Genome Mappability with one Mismatch

Sahar Hooshmand, Paniz Abedin, Daniel Gibney, Srinivas Aluru, Sharma V. Thankachan

2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) (2018)

Abstract

Summary form only given. The genome mappability problem refers to cataloging repetitive occurrences of every substring of length m in a genome, and its k-mappability variant extends this to approximate repeats by allowing up to k mismatches. The problem is formulated as follows: given a sequence S[1, n] of length n over the constant-size DNA alphabet Σ = {A, C, G, T} and two integers k and m ≤ n, output an integer array F_k such that F_k[i] = |{ j ≠ i : d_H(S[i, i + m - 1], S[j, j + m - 1]) ≤ k }|, where d_H(·,·) denotes the Hamming distance. Derrien et al. [PLoS ONE 2012] represented this problem within the framework of genome analysis. In this work, we present a provably efficient algorithm for 1-mappability with O(n log n) worst-case run time and O(n) space. The fundamental technique is heavy path decomposition on the suffix tree (ST) of S, and the entire work is based on the framework by Thankachan et al. [RECOMB 2018]. The previous best known run time is O(n log n log log n) [Alzamel et al., COCOA 2017].
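To make the definition of F_k concrete, the following is a minimal brute-force sketch in Python. It is not the O(n log n) algorithm described in the abstract; it simply compares every pair of length-m windows, which takes O(n^2 m) time. The function names `hamming_distance` and `k_mappability_naive` are hypothetical and chosen for illustration, and positions are 0-based here rather than 1-based as in the abstract.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def k_mappability_naive(s: str, m: int, k: int) -> list[int]:
    """Brute-force F_k with 0-based positions (illustrative baseline only).

    F_k[i] counts the positions j != i whose length-m window is within
    Hamming distance k of the window starting at i.
    """
    # All length-m windows of s; empty if m > len(s).
    windows = [s[i:i + m] for i in range(len(s) - m + 1)]
    result = []
    for i, w_i in enumerate(windows):
        count = sum(
            1
            for j, w_j in enumerate(windows)
            if j != i and hamming_distance(w_i, w_j) <= k
        )
        result.append(count)
    return result
```

Calling `k_mappability_naive(s, m, 1)` gives the 1-mappability array for a small input and can serve as a reference when checking a faster implementation on short sequences.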

Keywords

Genome mappability, heavy path decomposition, Hamming distance
