Taxor: Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters

bioRxiv (Cold Spring Harbor Laboratory)（2023）

引用 0|浏览5

暂无评分

摘要

Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large data created in those studies, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Due to the exponential growth of reference genome databases, the current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memoryefficient querying of long reads. Here we present Taxor as a new tool for long-read metagenomic classification using a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches such as syncmers for pseudoalignment to classify reads and an Expectation-Maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms competing shortand long-read tools regarding precision, while having a similar recall. Most notably, Taxor reduces the memory requirements and index size by more than 50% and is among the fastest tools regarding query times. This enables real-time metagenomics analysis with large reference databases on a small laptop in the field. Taxor is available at . ### Competing Interest Statement The authors have declared no competing interest.

查看译文

关键词

long reads,classification,space-efficient

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要