Informative Positional Priors Improve De Novo Motif Discovery

msra

Abstract
Identification of transcription factor (TF) binding sites on a genome-wide scale is an important part of understanding transcriptional regulatory processes in the cell. A common approach is to search for a motif that is statistically overrepresented in a set of promoters of co-regulated genes. However, TF binding sites are short and often degenerate, which poses a significant statistical challenge to de novo motif discovery. Furthermore, without any additional information, a binding site is considered equally likely to occur at any position in the promoter. To enhance the signal of true motifs relative to background noise, we propose the use of informative positional priors derived from three kinds of data: 1. Most large-scale, high-throughput experimental methods, such as ChIP-chip, DIP-chip, and PBM, yield two sets of DNA sequences: those bound and those not bound by the profiled transcription factor. We compute a discriminative positional prior using both sets. 2. Active regulatory regions have been shown to be usually depleted of nucleosomes, which enables TFs to bind DNA in those regions (2). We compute a positional prior based on predictions of nucleosome occupancy from a recently published computational model (4). 3. A predictive relationship has been shown to exist between DNA binding sites and TF structural class (e.g., leucine zipper, forkhead) (3), which we leverage to build class-specific positional priors. We incorporate each prior into a Gibbs sampler to discover motifs in the yeast ChIP-chip data of (1). Our three priors yield improvements of more than 11%, 27%, and 14%, respectively, over each of three state-of-the-art methods: AlignACE, MEME, and MDscan. Moreover, all our priors perform better than the three conservation-based methods used by (1) in their analysis. Notably, the nucleosome occupancy based prior performs substantially better than a non-informative uniform prior, finding 50% more motifs correctly.
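The abstract does not spell out how a positional prior enters the Gibbs sampler. One common way, shown here as a minimal sketch rather than the authors' implementation, assumes one motif occurrence per sequence: when a sequence's site is resampled, the probability of a start at position i is taken proportional to the prior at i times the motif-versus-background likelihood ratio of the window starting there, so a uniform prior recovers the usual non-informative site sampler. The names `site_posterior`, `build_pwm`, and `gibbs_sampler`, the pseudocount of 0.5, the restriction to the A/C/G/T alphabet, and the omission of the reverse strand are all illustrative assumptions.

```python
import math
import random

BASES = "ACGT"  # assumption: sequences contain only A, C, G, T


def site_posterior(seq, pwm, background, prior):
    """Posterior over motif start positions in one sequence (hypothetical helper).

    P(start = i | seq) is proportional to prior[i] times the motif/background
    likelihood ratio of the window seq[i:i+W]. A uniform prior gives the usual
    non-informative site-sampling step.
    """
    w = len(pwm)
    n_pos = len(seq) - w + 1
    if len(prior) != n_pos:
        raise ValueError("prior needs one entry per candidate start position")
    log_scores = []
    for i in range(n_pos):
        if prior[i] <= 0.0:
            log_scores.append(-math.inf)  # position ruled out by the prior
            continue
        llr = sum(math.log(pwm[j][seq[i + j]] / background[seq[i + j]])
                  for j in range(w))
        log_scores.append(math.log(prior[i]) + llr)
    best = max(log_scores)
    if best == -math.inf:
        raise ValueError("prior assigns zero mass to every position")
    weights = [math.exp(s - best) for s in log_scores]  # exp(-inf) == 0.0
    total = sum(weights)
    return [x / total for x in weights]


def build_pwm(seqs, starts, w, skip, pseudo=0.5):
    """Column-wise base frequencies of the current sites, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in BASES} for _ in range(w)]
    for k, (seq, s) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for j in range(w):
            counts[j][seq[s + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def gibbs_sampler(seqs, priors, w, background, iters=100, seed=0):
    """Site sampler (one occurrence per sequence) weighted by positional priors."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(seq) - w + 1) for seq in seqs]
    for _ in range(iters):
        for k, seq in enumerate(seqs):
            pwm = build_pwm(seqs, starts, w, skip=k)
            post = site_posterior(seq, pwm, background, priors[k])
            starts[k] = rng.choices(range(len(post)), weights=post)[0]
    return starts
```

For a non-informative baseline, each sequence's prior would simply be `[1.0 / n] * n` with `n = len(seq) - w + 1`; an informative prior only changes those lists, leaving the sampler itself untouched.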
Interestingly, the nucleosome-based prior achieves this performance only when it is calculated in a discriminative setting, i.e., when each word in a promoter is scored according to how often it is free of nucleosomes in the bound versus the unbound promoters. When the prior is instead computed from the bound promoters alone, the improvement is only marginal.
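The discriminative scoring described above can be illustrated with a short sketch. It is not the paper's exact formula: the names `nucleosome_free_counts` and `discriminative_prior`, the use of 1 minus the mean per-base occupancy as a word's nucleosome-free probability, the pseudocount, and the per-sequence normalization are all assumptions made for illustration. Occupancy values are taken as given per-base probabilities, such as predictions from a model like that of (4), and reverse complements are ignored.

```python
from collections import defaultdict


def nucleosome_free_counts(seqs, occupancies, w):
    """Sum, over every occurrence of each w-mer, of its nucleosome-free probability.

    occupancies[k][i] is an assumed predicted probability that base i of seqs[k]
    is covered by a nucleosome; a word's free probability is taken here as
    1 minus the mean occupancy over the positions it spans.
    """
    totals = defaultdict(float)
    for seq, occ in zip(seqs, occupancies):
        for i in range(len(seq) - w + 1):
            free = 1.0 - sum(occ[i:i + w]) / w
            totals[seq[i:i + w]] += free
    return totals


def discriminative_prior(bound, bound_occ, unbound, unbound_occ, w, pseudo=1.0):
    """Per-sequence positional priors for the bound set (hypothetical scoring).

    Each start position i of a bound sequence is scored by
    (B + pseudo) / (B + U + 2 * pseudo), where B and U are the nucleosome-free
    counts of the word at i in the bound and unbound sets. Scores are then
    normalized so that each sequence's prior sums to 1.
    """
    free_bound = nucleosome_free_counts(bound, bound_occ, w)
    free_unbound = nucleosome_free_counts(unbound, unbound_occ, w)
    priors = []
    for seq in bound:
        scores = []
        for i in range(len(seq) - w + 1):
            word = seq[i:i + w]
            b = free_bound.get(word, 0.0)
            u = free_unbound.get(word, 0.0)
            scores.append((b + pseudo) / (b + u + 2.0 * pseudo))
        total = sum(scores)
        priors.append([s / total for s in scores])
    return priors
```

Words that are frequently nucleosome-free in unbound promoters are down-weighted, while occurrences predicted to be nucleosome-covered contribute little to either count. Setting every occupancy to zero reduces the score to plain word counts in bound versus unbound sequences, a simple analogue of the first, purely discriminative prior.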
Keywords
nucleosome occupancy, discriminative motif finding, transcription factor binding site