Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals

J Engreitz, M Beer

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Gene regulatory elements drive many complex biological phenomena such as fetal development, and their mutations are linked to a multitude of common human diseases. The phenotypic impacts of regulatory variants are often tested using their conserved orthologous counterparts in model organisms such as mice. However, mapping human enhancers to conserved elements in mice remains a challenge, due both to the rapid evolution of enhancers and to the limitations of current computational methods in detecting conserved regulatory sequences. To improve upon existing computational methods and to better understand the sources of this apparent regulatory divergence, we comprehensively measured the evolutionary dynamics of distal enhancers across 45 matched human/mouse cell/tissue pairs from more than 1,000 DNase-seq experiments. Using this expansive dataset, we show that while cell-specific regulatory vocabulary is conserved, enhancers evolve more rapidly than other genomic elements such as promoters and CTCF binding sites. We observed surprisingly high levels of cell-specific variability in enhancer conservation rates, in part explainable by tissue-specific transposable element activity. To improve orthologous enhancer mapping, we developed an improved genome alignment algorithm using gapped-kmer sequence features, and using the matched cell/tissue pairs, we show that this novel computational method, gkm-align, discovers 23,660 novel human/mouse conserved enhancers missed by standard alignment algorithms.
Keywords
distal enhancers, regulatory vocabularies, distant mammals, gapped-kmer