KTU: K-mer Taxonomic Units improve the biological relevance of amplicon sequence variant microbiota data

Methods in Ecology and Evolution (2022)

Abstract
Amplicon sequencing is widely used in microbiome-associated studies. In recent years, microbial ecologists have switched to new algorithms for taxonomic identification and quantification. Denoising algorithms that produce amplicon sequence variants (ASVs) through unbiased sequence picking have replaced operational taxonomic unit (OTU) clustering methods. ASVs can be used to detect and distinguish biological variations to the species OTU level (≥97% similarity). However, ASV counts are sparse across samples, and individual ASVs are less prevalent within the same batch. Here, we present a k-mer-based, alignment-free algorithm, 'KTU' (K-mer Taxonomic Unit), that iteratively re-clusters ASVs into optimal biological taxonomic units. The 'KTU' algorithm comprises four parts: (a) K-mer frequencies are called by counting tetranucleotides in a sliding window from both ends of the DNA sequence. (b) The similarity in k-mer frequencies between sequences is measured by cosine dissimilarity. (c) KTUs are detected from the cosine dissimilarity matrix using the partitioning around medoids (PAM) clustering algorithm. The iterative PAM-KTU detection process searches for the number of convergent KTU clusters according to the maximum silhouette coefficient. (d) Finally, the ASVs are aggregated into their corresponding KTUs. KTU re-clustered every 1.38-4.53 ASVs into one feature, with >99% sequence similarity on average and 1% cosine divergence within each KTU. Additionally, the re-clustering procedure improved the biological interpretability of correlations with clinical and environmental factors and of their statistical significance.
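The four steps (a) to (d) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: tetranucleotides are counted on the forward strand only, a simplified alternating k-medoids routine stands in for full PAM, and the number of clusters is chosen by an exhaustive search over k rather than by the paper's iterative procedure. All function names (`tetra_freq`, `cosine_dissimilarity`, `k_medoids`, `mean_silhouette`, `cluster_asvs`, `aggregate`) are hypothetical.

```python
from itertools import product
from math import sqrt

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides


def tetra_freq(seq):
    """(a) Sliding-window tetranucleotide frequencies (assumption: forward strand only)."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing ambiguous bases are skipped
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]


def cosine_dissimilarity(u, v):
    """(b) One minus the cosine similarity of two frequency profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def assign(dist, medoids):
    """Label every point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """(c) Simplified alternating k-medoids; a stand-in for full PAM."""
    medoids = list(range(k))  # naive deterministic initialisation
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                updated.append(medoids[c])
                continue
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids)


def mean_silhouette(dist, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in m) / len(m)
                for other, m in clusters.items() if other != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(labels)


def cluster_asvs(seqs):
    """Pick the clustering with the highest mean silhouette (assumption: exhaustive k search)."""
    profiles = [tetra_freq(s) for s in seqs]
    dist = [[cosine_dissimilarity(u, v) for v in profiles] for u in profiles]
    best_labels, best_score = [0] * len(seqs), -1.0
    for k in range(2, len(seqs)):
        labels = k_medoids(dist, k)
        score = mean_silhouette(dist, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels


def aggregate(counts, labels):
    """(d) Sum per-sample ASV counts (one row per ASV) into one row per cluster."""
    table = {}
    for row, lab in zip(counts, labels):
        acc = table.setdefault(lab, [0] * len(row))
        for s, c in enumerate(row):
            acc[s] += c
    return table


# Toy usage: four made-up sequences and a made-up ASV-by-sample count table.
toy_seqs = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA",
            "GGGCCCGGGCCCGGGCCCGG", "GGGCCCGGGCCCGGGCCCGC"]
toy_labels = cluster_asvs(toy_seqs)
toy_table = aggregate([[5, 0], [3, 1], [0, 7], [2, 2]], toy_labels)
```

The exhaustive search over k is quadratic in the number of sequences per candidate k and is only practical for small inputs; it is used here purely to make the silhouette criterion concrete.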
Keywords
amplicon sequencing, k-mer-based taxonomy unit, microbiome-associated studies, microbiota