REPrise: de novo interspersed repeat detection using inexact seeding

Atsushi Takeda, Daisuke Nonaka, Yuta Imazu, Tsukasa Fukunaga, Michiaki Hamada

bioRxiv (2024)

Abstract
Motivation: Interspersed repeats occupy a large part of many eukaryotic genomes, and thus their accurate annotation is essential for various genome analyses. Database-free de novo repeat detection approaches are powerful for annotating genomes that lack well-curated repeat databases. However, existing tools do not yet have sufficient repeat detection performance.

Results: In this study, we developed REPrise, a de novo interspersed repeat detection software program based on a seed-and-extension method. Although the algorithm of REPrise is similar to that of RepeatScout, which is currently the de facto standard tool, we incorporated three unique techniques into REPrise: inexact seeding, affine gap scoring and loose masking. Analyses of rice and simulation genome datasets showed that REPrise outperformed RepeatScout in terms of sensitivity, especially when the repeat sequences contained many mutations. Furthermore, when applied to the complete human genome dataset T2T-CHM13, REPrise demonstrated the potential to detect novel repeat sequence families.

Availability: The source code of REPrise is freely available at https://github.com/hmdlab/REPrise. Repeat annotations predicted for the T2T genome using REPrise are also available at https://waseda.box.com/v/REPrise-data.

Competing Interest Statement: The authors have declared no competing interest.
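To make two of the techniques named in the abstract more concrete, the following is a minimal Python sketch, not REPrise's actual implementation. It assumes that inexact seeding tolerates a fixed number of mismatches (Hamming distance), which the abstract does not specify, and it uses arbitrary illustrative gap costs. The function names `inexact_seed_hits` and `affine_gap_penalty`, the toy sequence, and all parameter values are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def inexact_seed_hits(genome: str, seed: str, max_mismatches: int) -> list[int]:
    """Return start positions where `seed` occurs with at most
    `max_mismatches` substitutions (assumption: mismatches only, no indels)."""
    k = len(seed)
    return [
        i
        for i in range(len(genome) - k + 1)
        if hamming(genome[i:i + k], seed) <= max_mismatches
    ]


def affine_gap_penalty(length: int, gap_open: int = 5, gap_extend: int = 1) -> int:
    """Cost of a single gap of `length` bases under affine scoring:
    pay `gap_open` for the first base, then `gap_extend` for each further base.
    The default costs are illustrative, not REPrise's parameters."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# Toy sequence containing exact and mutated copies of the seed "ACGT".
genome = "ACGTACGAACGTTCGTACGT"
seed = "ACGT"

exact_hits = inexact_seed_hits(genome, seed, max_mismatches=0)
inexact_hits = inexact_seed_hits(genome, seed, max_mismatches=1)

print(exact_hits)    # [0, 8, 16]
print(inexact_hits)  # [0, 4, 8, 12, 16]
print(affine_gap_penalty(1), affine_gap_penalty(3))  # 5 7
```

In this toy case, allowing one mismatch recovers two mutated copies (at positions 4 and 12) that exact matching misses, which is in the spirit of the sensitivity gain the abstract reports for highly mutated repeats. Under the affine scheme, one three-base gap costs 7, whereas three separate one-base gaps would cost 15, so a single long indel is penalized less than several scattered ones.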