CCSeq: Clusters of Colocalized Sequences

bioRxiv (2019)

Abstract
Motivation: Potential transcription factor (TF) complexes may be identified by testing whether the binding sequences of individual TF proteins form clusters with each other. These clusters may also indicate TF inhibition due to competitive occupancy of enhancer regions. Genome annotation data containing the coordinates of enhancer sequences is readily accessible via position-weight matrix tools.

Results: An algorithm called CCSeq (Clusters of Colocalized Sequences) was developed for identifying clusters of sequences along a one-dimensional line, such as a chromosome, given genome annotation files and a cut-off distance as inputs. The algorithm was applied to the binding sequences of the constituent proteins of two known transcription factor complexes: the HSF1 homotrimer and one form of the NF-κB complex, a dimer of NFKB2 and RELB. The algorithm identified 28 clusters of HSF1 trimer binding sequences on chromosome Y and 16 clusters of the NFKB2 and RELB dimer on chromosome 17, compared to 0 clusters in any of the five simulated random distributions for each of the two sets of TF proteins. Additionally, structural patterns of these binding sequence clusters are described.

Availability and Implementation: This algorithm is freely available as an R package on the open source R repository CRAN. Genome annotation files were obtained from the PWMScan tool hosted by the Swiss Institute of Bioinformatics [2, 3].
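The abstract does not spell out how clusters are formed, so the following is only a minimal sketch of one plausible reading of "clusters of sequences along a one-dimensional line given a cut-off distance", not the CCSeq implementation itself. It assumes that binding sites are reduced to single integer coordinates on one chromosome, that neighbouring sites belong to the same cluster when the gap between them is at most the cut-off, and that a cluster counts only if it contains at least one site from every TF supplied. The function name `colocalized_clusters` and the `min_sites` parameter are hypothetical and are not taken from the R package.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, str]  # (coordinate on the chromosome, TF name)


def colocalized_clusters(
    sites: Dict[str, List[int]], cutoff: int, min_sites: int = 2
) -> List[List[Site]]:
    """Group binding sites into clusters along a 1-D coordinate axis.

    Assumption (not from the paper): two neighbouring sites share a
    cluster when the distance between them is at most `cutoff`.
    A cluster is reported only if it holds at least `min_sites` sites
    and contains at least one site from every TF in `sites`.
    """
    # Pool the sites of all TFs and sort them by coordinate.
    merged = sorted((pos, tf) for tf, positions in sites.items() for pos in positions)

    clusters: List[List[Site]] = []
    current: List[Site] = []
    for pos, tf in merged:
        # A gap larger than the cut-off closes the running cluster.
        if current and pos - current[-1][0] > cutoff:
            clusters.append(current)
            current = []
        current.append((pos, tf))
    if current:
        clusters.append(current)

    required = set(sites)
    return [
        c for c in clusters
        if len(c) >= min_sites and {tf for _, tf in c} == required
    ]


# Toy coordinates, invented for illustration only.
dimer = colocalized_clusters(
    {"NFKB2": [100, 5000, 9000], "RELB": [130, 7000, 9040]}, cutoff=50
)
# Homotrimer case: one TF, so require three sites per cluster.
trimer = colocalized_clusters({"HSF1": [10, 20, 30, 500]}, cutoff=15, min_sites=3)
```

With the toy dimer input, the isolated sites at 5000 and 7000 are discarded because neither forms a cluster containing both proteins. For a homotrimer such as HSF1, requiring three sites per cluster via `min_sites` is one way to express the same idea with a single TF.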