Data Structures to Represent a Set of k-long DNA Sequences

ACM COMPUTING SURVEYS (2021)

Abstract

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the past 10 years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.

Keywords

k-mer sets, de Bruijn graphs, navigational data structures, Bloom filters, unitigs, FM-index, k-mers, biological sequencing data, data structures
