Navigating Eukaryotic Genome Annotation Pipelines: A Route Map to BRAKER, Galba, and TSEBRA
arXiv (2024)
Abstract
Annotating the structure of protein-coding genes represents a major challenge
in the analysis of eukaryotic genomes. This task sets the groundwork for
subsequent genomic studies aimed at understanding the functions of individual
genes. BRAKER and Galba are two fully automated and containerized pipelines
designed to perform accurate genome annotation. BRAKER integrates the
GeneMark-ETP and AUGUSTUS gene finders, employing the TSEBRA combiner to attain
high sensitivity and precision. BRAKER is adept at handling genomes of any
size, provided that it has access to both transcript expression sequencing data
and an extensive protein database from the target clade. BRAKER also
achieves high accuracy with only one of these two types of extrinsic
evidence, although accuracy then diminishes for larger genomes. In contrast,
Galba uses direct protein-to-genome spliced alignments, computed with
miniprot, to generate training genes and evidence for gene prediction in
AUGUSTUS. Galba has superior accuracy in large genomes if protein sequences are
the only source of evidence. This chapter provides practical guidelines for
employing both pipelines in the annotation of eukaryotic genomes, with a focus
on insect genomes.
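
The choice between the two pipelines described above can be summarized as a small decision helper. This is only an illustrative sketch of the abstract's guidance, not part of BRAKER or Galba: the function name `choose_pipeline`, the `Pipeline` enum, and the `genome_is_large` flag are hypothetical, and the abstract gives no numeric threshold for what counts as a large genome, so that judgment is left to the caller.

```python
from enum import Enum


class Pipeline(Enum):
    """Hypothetical labels for the two pipelines named in the abstract."""
    BRAKER = "BRAKER"
    GALBA = "Galba"


def choose_pipeline(has_transcript_data: bool,
                    has_protein_db: bool,
                    genome_is_large: bool) -> Pipeline:
    """Return the pipeline the abstract's guidance points to.

    Assumption: `genome_is_large` is decided by the caller, because the
    abstract does not define a size threshold.
    """
    if not (has_transcript_data or has_protein_db):
        # The abstract makes no recommendation for this case.
        raise ValueError("the abstract gives no recommendation without extrinsic evidence")

    # Proteins are the only evidence and the genome is large:
    # the abstract reports superior accuracy for Galba here.
    if has_protein_db and not has_transcript_data and genome_is_large:
        return Pipeline.GALBA

    # Both evidence types (any genome size), or a single evidence type
    # in the remaining cases: the abstract describes BRAKER as accurate.
    return Pipeline.BRAKER
```

For example, `choose_pipeline(False, True, True)` selects Galba, while supplying both evidence types selects BRAKER regardless of genome size.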