A convenient correspondence between k-mer-based metagenomic distances and phylogenetically-informed beta-diversity measures

PLoS Computational Biology (2023)

Abstract
k-mer-based distances are often used to describe the differences between communities in metagenome sequencing studies because of their computational convenience and history of effectiveness. Although k-mer-based distances do not use information about taxon abundances, we show that one class of k-mer distances between metagenomes (the Euclidean distance between k-mer spectra, or EKS distances) is very closely related to a class of phylogenetically-informed beta-diversity measures that do explicitly use both the taxon abundances and information about the phylogenetic relationships among the taxa. Furthermore, we show that both of these distances can be interpreted as using certain features of the taxon abundances that are related to the phylogenetic tree. Our results allow practitioners to perform phylogenetically-informed analyses when they only have k-mer data available and provide a theoretical basis for using k-mer spectra with relatively small values of k (on the order of 4-5). They are also useful for analysts who wish to know more about the properties of any method based on k-mer spectra, and they provide insight into one class of phylogenetically-informed beta-diversity measures.

Author summary
Microbiologists have two major strategies for understanding the bacterial communities present in the environment: shotgun metagenome sequencing and amplicon sequencing. Both involve taking samples from the environment, extracting DNA from those samples, and sequencing the extracted DNA. They have different strengths and give different kinds of information about the communities. Because they give different kinds of information, methods for analyzing microbiome data tend to be developed for and used on just one kind of study.
In this paper, we show a strong relationship between a set of methods for measuring distances between samples in shotgun metagenome sequencing datasets (the k-mer-based distances) and a set of methods for measuring distances between samples in amplicon sequencing datasets (the phylogenetically-informed beta diversity measures). This is a convenient correspondence because k-mer spectra are easier to extract from shotgun metagenome sequencing datasets than the taxon abundances that would be needed to compute the phylogenetically-informed beta diversities. Therefore, if an analyst would like to compute phylogenetically-informed distances between communities from a shotgun metagenome sequencing dataset, our results show that they can work directly with the k-mer spectra and not worry about estimating taxon abundances. The results also imply that any of the many methods that are based on k-mer spectra are implicitly using phylogenetic information.
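To make the idea of an EKS distance concrete, here is a minimal sketch of a Euclidean distance between the k-mer spectra of two samples. It is an illustration under stated assumptions, not the authors' implementation: the function names `kmer_spectrum` and `eks_distance` are hypothetical, the spectra are normalized to relative frequencies (the paper may use a different normalization), reverse complements are not merged, and k-mers containing characters other than A, C, G, T are skipped.

```python
import math
from collections import Counter
from itertools import product


def kmer_spectrum(reads, k=4):
    """Return the relative frequency of each of the 4**k DNA k-mers.

    Assumption: counts are pooled over all reads and divided by the total
    number of valid k-mers; the paper's normalization may differ.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers with ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed lexicographic ordering so spectra from different samples align.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[m] / total if total else 0.0 for m in all_kmers]


def eks_distance(reads_a, reads_b, k=4):
    """Euclidean distance between the k-mer spectra of two samples."""
    spec_a = kmer_spectrum(reads_a, k)
    spec_b = kmer_spectrum(reads_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))
```

For example, `eks_distance(sample_a_reads, sample_b_reads, k=4)` compares two lists of read strings using vectors of length 4**4 = 256, in line with the small values of k discussed in the abstract. No taxon abundances are estimated at any point.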
Keywords
metagenomic distances, k-mer-based, phylogenetically-informed