Pattern discovery for deciphering gene regulation based on evolutionary computation (2010)

Abstract
Bindings between Transcription Factors (TFs) and Transcription Factor Binding Sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. TFs and TFBSs are conserved, forming patterns (motifs), because of their important roles in controlling gene expression and, ultimately, in affecting function and appearance. Pattern discovery is thus important for deciphering gene regulation, which has a tremendous impact on the understanding of life, bio-engineering and therapeutic applications. This thesis contributes to pattern discovery involving TFBS motifs and TF-TFBS associated sequence patterns based on Evolutionary Computation (EC), especially Genetic Algorithms (GAs), which are promising for bioinformatics problems with huge and noisy search spaces.

On TFBS motif discovery, three novel GA-based algorithms are developed: GALF-P, with a focus on optimization; GALF-G, for modeling; and GASMEN, for spaced motifs. Novel memetic operators, namely local filtering and probabilistic refinement, are introduced to significantly improve search effectiveness (e.g. 73% better than MEME) and efficiency (e.g. a 4.49-times speedup). The GA-based algorithms have been extensively tested on comprehensive synthetic, real and benchmark datasets, and have shown outstanding performance compared with state-of-the-art approaches. Our algorithms also “evolve” to handle progressively more relaxed cases: from fixed motif widths to the most flexible widths, from single motifs to multiple motifs with overlapping control, from stringent motif instance assumptions to very relaxed ones, and from contiguous motifs to generic spaced motifs with arbitrary spacers.

TF-TFBS associated sequence pattern (rule) discovery is further investigated to better decipher protein-DNA interactions in regulation. For the first time, we generalize previous exact TF-TFBS rules to approximate ones using a progressive approach. A customized algorithm is developed, outperforming MEME by over 73%. Compared with the exact rules, the approximate TF-TFBS rules yield significantly more verified rules and better verification ratios. Detailed analysis of PDB cases and conservation verification on NCBI protein records illustrate that the approximate rules reveal flexible and specific protein-DNA interactions with much greater generalization capability. The comprehensive pattern discovery algorithms developed will be further verified, improved and extended to further decipher transcriptional regulation, for example by inferring whole gene regulatory networks using the discovered TFBS and TF-TFBS patterns and incorporating expression data.
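To make the idea of GA-based motif search more concrete, the following is a minimal sketch of a generic genetic algorithm for fixed-width motif discovery. It is not GALF-P, GALF-G or GASMEN, and it does not implement local filtering or probabilistic refinement. The representation (one start position per sequence), the information-content fitness against a uniform background, and the names `information_content` and `evolve_motif` are all illustrative assumptions rather than details taken from the thesis.

```python
import math
import random

ALPHABET = "ACGT"


def information_content(seqs, starts, width, pseudo=0.5):
    """Score an alignment: sum over columns of relative entropy (bits)
    against a uniform background. Assumed fitness, not the thesis's."""
    n = len(seqs)
    denom = n + pseudo * len(ALPHABET)
    total = 0.0
    for col in range(width):
        # Pseudocounts keep every probability strictly positive.
        counts = {base: pseudo for base in ALPHABET}
        for seq, start in zip(seqs, starts):
            counts[seq[start + col]] += 1
        for base in ALPHABET:
            p = counts[base] / denom
            total += p * math.log2(p / 0.25)
    return total


def evolve_motif(seqs, width, pop_size=60, generations=80, mut_rate=0.2, seed=0):
    """Plain GA: an individual is a list of motif start positions,
    one per sequence. Returns (best_individual, best_fitness)."""
    rng = random.Random(seed)
    limits = [len(s) - width for s in seqs]  # largest valid start per sequence

    def fitness(ind):
        return information_content(seqs, ind, width)

    def tournament(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, lim) for lim in limits] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        next_pop = [best[:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover: each position comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Mutation: resample a start position at random.
            for i, lim in enumerate(limits):
                if rng.random() < mut_rate:
                    child[i] = rng.randint(0, lim)
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Calling `evolve_motif(["TTGACGTAAC", "GGACGTACCA", "CACGTATTGG", "ACGTACCTTA"], width=5)` returns a list of start positions and its score. The sketch assumes exactly one motif instance per sequence and a fixed width, which are the stringent assumptions the abstract says the thesis algorithms progressively relax.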
Keywords
approximate TF-TFBS rule, TF-TFBS pattern, evolutionary computation, comprehensive pattern discovery, previous exact TF-TFBS rule, pattern discovery, deciphering gene regulation, TF-TFBS associated sequence pattern, TFBS motif, TFBS motif discovery, Transcription Factor