GHT-SELEX Demonstrates Unexpectedly High Intrinsic Sequence Specificity and Complex DNA Binding of Many Human Transcription Factors

Arttu Jolma, Aldo Hernandez-Corchado, Ally W H Yang, Ali Fathi, Kaitlin U Laverty, Alexander Brechalov, Rozita Razavi, Mihai Albu, Hong Zheng, Codebook Consortium, Ivan V Kulakovskiy, Hamed S Najafabadi, Timothy R Hughes

bioRxiv: the preprint server for biology (2024)

Donnelly Centre

Abstract

A long-standing challenge in human regulatory genomics is that transcription factor (TF) DNA-binding motifs are short and degenerate, while the genome is large. Motif scans therefore produce many false-positive binding site predictions. By surveying 179 TFs across 25 families using >1,500 cyclic in vitro selection experiments with fragmented, naked, and unmodified genomic DNA - a method we term GHT-SELEX (Genomic HT-SELEX) - we find that many human TFs possess much higher sequence specificity than anticipated. Moreover, genomic binding regions from GHT-SELEX are often surprisingly similar to those obtained in vivo (i.e. ChIP-seq peaks). We find that comparable specificity can also be obtained from motif scans, but performance is highly dependent on derivation and use of the motifs, including accounting for multiple local matches in the scans. We also observe alternative engagement of multiple DNA-binding domains within the same protein: long C2H2 zinc finger proteins often utilize modular DNA recognition, engaging different subsets of their DNA binding domain (DBD) arrays to recognize multiple types of distinct target sites, frequently evolving via internal duplication and divergence of one or more DBDs. Thus, contrary to conventional wisdom, it is common for TFs to possess sufficient intrinsic specificity to independently delineate cellular targets.
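The abstract's point about "accounting for multiple local matches in the scans" can be illustrated with a small sketch. The code below is not the authors' scoring method: the 4-bp frequency matrix, the uniform background, the forward-strand-only scan, and the function names are all assumptions made for illustration. It contrasts scoring a region by its single best motif match with scoring it by the summed likelihood ratios of all windows.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix for a 4-bp motif (consensus AGCT).
# Rows are motif positions; columns follow BASES. Not taken from the paper.
PFM = [
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.85],
]


def log_odds(pfm, background=0.25):
    """Convert frequencies to log2 odds against a uniform background (assumed)."""
    return [[math.log2(p / background) for p in row] for row in pfm]


def window_scores(seq, pwm):
    """Score every forward-strand window of motif length in seq."""
    k = len(pwm)
    scores = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        scores.append(sum(pwm[j][BASES.index(base)] for j, base in enumerate(window)))
    return scores


def max_score(seq, pwm):
    """Region score = best single match (ignores additional matches)."""
    return max(window_scores(seq, pwm))


def sum_score(seq, pwm):
    """Region score = log2 of summed likelihood ratios over all windows."""
    return math.log2(sum(2.0 ** s for s in window_scores(seq, pwm)))


PWM = log_odds(PFM)
ONE_SITE = "TTTTAGCTTTTT"   # one consensus match
TWO_SITES = "TTAGCTTAGCTT"  # two consensus matches

for name, seq in (("one site", ONE_SITE), ("two sites", TWO_SITES)):
    print(name, round(max_score(seq, PWM), 3), round(sum_score(seq, PWM), 3))
```

Under these assumptions the best-match score cannot tell the two regions apart, while the summed score ranks the region with two matches higher. A real scan would also cover the reverse strand and use a motif and background derived from data.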

Chat Paper

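[Key point]: Using the GHT-SELEX method, the study finds that many human transcription factors have higher sequence specificity than anticipated, along with complex DNA-binding behavior, which points to intrinsic specificity in how transcription factors recognize their cellular targets.

[Method]: GHT-SELEX (Genomic HT-SELEX) was applied to 179 transcription factors in more than 1,500 cyclic in vitro selection experiments using fragmented, unmodified genomic DNA.

[Experiments]: The genomic binding regions obtained from GHT-SELEX are often highly similar to those obtained in vivo (e.g. ChIP-seq peaks).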
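One simple way to quantify how similar two sets of binding regions are is a base-pair Jaccard index. The sketch below is only an illustration: the page does not say which similarity measure the authors used, and the function names, the half-open `(start, end)` interval convention, and the example coordinates are all assumptions. It handles a single chromosome.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index: |A and B| / |A or B| for two peak lists."""
    union = covered(list(peaks_a) + list(peaks_b))
    if union == 0:
        return 0.0
    # Inclusion-exclusion gives the intersection size.
    intersection = covered(peaks_a) + covered(peaks_b) - union
    return intersection / union


# Hypothetical peak coordinates on one chromosome (not real data).
in_vitro_peaks = [(100, 200), (500, 600)]
in_vivo_peaks = [(150, 250), (900, 1000)]
print(jaccard(in_vitro_peaks, in_vivo_peaks))
```

A value of 1 means the two peak sets cover exactly the same bases, and 0 means they share none. Comparisons across a genome would need per-chromosome bookkeeping, which a dedicated interval tool handles more robustly.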