Optimising high-throughput sequencing data analysis, from gene database selection to the analysis of compositional data: a case study on tropical soil nematodes

Frontiers in Ecology and Evolution(2024)

引用 0|浏览14
暂无评分
摘要
IntroductionHigh-throughput sequencing (HTS) provides an efficient and cost-effective way to generate large amounts of sequence data, providing a very powerful tool to analyze biodiversity of soil organisms. However, marker-based methods and the resulting datasets come with a range of challenges and disputes, including incomplete reference databases, controversial sequence similarity thresholds for delimitating taxa, and downstream compositional data analysis. MethodsHere, we use HTS data from a soil nematode biodiversity experiment to explore standardized HTS data processing procedures. We compared the taxonomic assignment performance of two main rDNA reference databases (SILVA and PR2). We tested whether the same ecological patterns are detected with Amplicon Sequence Variants (ASV; 100% similarity) versus classical Operational Taxonomic Units (OTU; 97% similarity). Further, we tested how different HTS data normalization methods affect the recovery of beta diversity patterns and the identification of differentially abundant taxa.ResultsAt this time, the SILVA 138 eukaryotic database performed better than the PR2 4.12 database, assigning more reads to family level and providing higher phylogenetic resolution. ASV- and OTU-based alpha and beta diversity of nematodes correlated closely, indicating that OTU-based studies represent useful reference points. For downstream data analyses, our results indicate that loss of data during subsampling under rarefaction-based methods might reduce the sensitivity of the method, e.g. underestimate the differences between nematode communities under different treatments, while the clr-transformation-based methods may overestimate effects. The Analysis of Compositions of Microbiome with Bias Correction approach (ANCOM-BC) retains all data and accounts for uneven sampling fractions for each sample, suggesting that this is currently the optimal method to analyze compositional data.DiscussionOverall, our study highlights the importance of comparing and selecting taxonomic reference databases before data analyses, and provides solid evidence for the similarity and comparability between OTU- and ASV-based nematode studies. Further, the results highlight the potential weakness of rarefaction-based and clr-transformation-based methods. We recommend future studies use ASV and that both the taxonomic reference databases and normalization strategies are carefully tested and selected before analyzing the data.
更多
查看译文
关键词
18S rRNA gene,reference database,amplicon sequence variant (ASV),operational taxonomic unit (OTU),data normalization method
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要