Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms

Kazuo Iwama, Junichi Teruyama, Shuntaro Tsuyama

arXiv: Data Structures and Algorithms(2018)

引用 22|浏览13
暂无评分
摘要
The problem called "String reconstruction from substrings" is a mathematical model of sequencing by hybridization that plays an important role in DNA sequencing. In this problem, we are given a blackbox oracle holding an unknown string 𝒳 and are required to obtain (reconstruct) 𝒳 through "substring queries" Q(S). Q(S) is given to the oracle with a string S and the answer of the oracle is Yes if 𝒳 includes S as a substring and No otherwise. Our goal is to minimize the number of queries for the reconstruction. In this paper, we deal with only binary strings for 𝒳 whose length n is given in advance by using a sequence of good S's. In 1995, Skiena and Sundaram first studied this problem and obtained an algorithm whose query complexity is n+O(log n). Its information theoretic lower bound is n, and they posed an obvious open question; if we can remove the O(log n) additive term. No progress has been made until now. This paper gives two partially positive answers to this open question. One is a randomized algorithm whose query complexity is n+O(1) with high probability and the other is an average-case algorithm also having a query complexity of n+O(1) on average. The n lower bound is still true for both cases, and hence they are optimal up to an additive constant.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要