Semi-automated curation and manual addition of sequences to build reliable and extensive reference databases for ITS2 vascular plant DNA (meta-)barcoding

bioRxiv (Cold Spring Harbor Laboratory)(2023)

Abstract
With the breakthrough of DNA (meta-)barcoding, it soon became clear that one of the most critical steps for accurate taxonomic identification is to have an accurate DNA reference database for the DNA barcode marker of choice. Developing such a database has therefore been a long-term ambition, especially for the kingdom Viridiplantae. Typically, reference databases are constructed from marker sequences downloaded from general public databases, which can carry taxonomic and other relevant errors. Herein, we constructed a curated (i) global database, (ii) European crop database, and (iii) 27 databases for the EU countries for the ITS2 barcoding marker of vascular plants. To that end, we first developed a pipeline script that comprises (i) an automated curation stage with five filters, (ii) manual taxonomic correction of misclassified taxa, and (iii) manual addition of newly sequenced species. The pipeline allows easy updating of the curated global database. With this approach, 13% of the sequences originally imported from GenBank, corresponding to 7% of species, were discarded. Further, 259 sequences were newly added to the curated global database, which now comprises 307,977 sequences of 111,382 plant species.
Keywords
vascular plant DNA, semi-automated