Mining NCBI Sequence Read Archive Database: An Untapped Source of Organelle Genomes for Taxonomic and Comparative Genomics Research

Vahap Eldem, Mehmet Ali Balcı

Diversity（2024）

引用 0|浏览2

暂无评分

摘要

The NCBI SRA database is constantly expanding due to the large amount of genomic and transcriptomic data from various organisms generated by next-generation sequencing, and re-searchers worldwide regularly deposit new data into the database. This high-coverage genomic and transcriptomic information can be re-evaluated regardless of the original research subject. The database-deposited NGS data can offer valuable insights into the genomes of organelles, particularly for non-model organisms. Here, we developed an automated bioinformatics workflow called “OrgaMiner”, designed to unveil high-quality mitochondrial and chloroplast genomes by data mining the NCBI SRA database. OrgaMiner, a Python-based pipeline, automatically orchestrates various tools to extract, assemble, and annotate organelle genomes for non-model organisms without available organelle genome sequences but with data in the NCBI SRA. To test the usability and feasibility of the pipeline, “mollusca” was selected as a keyword, and 76 new mitochondrial genomes were de novo assembled and annotated automatically without writing one single code. The applicability of the pipeline can be expanded to identify organelles in diverse invertebrate, vertebrate, and plant species by simply specifying the taxonomic name. OrgaMiner provides an easy-to-use, end-to-end solution for biologists mainly working with taxonomy and population genetics.

查看译文

关键词

next-generation sequencing,sequence read archive,organelle genome,systematics

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要