A widespread role of the motif environment on transcription factor binding across diverse protein families
- 1 Technion-Israel Institute of Technology;
- 2 Tel Aviv University;
- 3 University of Southern California
- * Corresponding author; email: yaelmg{at}tx.technion.ac.il
Abstract
Transcriptional regulation requires the binding of transcription factors (TFs) to short sequence-specific DNA motifs, usually located in gene regulatory regions. However, the large amount of data accumulated from genomic assays has shown that only a small fraction of all potential binding sites containing the consensus motif of a given TF are actually bound by the protein. Recent in vitro binding assays, which exclude the effects of the cellular environment, also demonstrate such selective TF binding. An intriguing conjecture is that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that are not bound by the TF. To test this hypothesis, we conducted a comprehensive analysis of the sequence and DNA shape features surrounding the core binding sites of 239 TFs from in vitro HT-SELEX binding assays and 56 TFs from in vivo ChIP-seq data. Comparing the nucleotide content of the regions around TF-bound sites with that of unbound regions containing the same consensus motifs revealed significant differences that extend far beyond the core binding site. Specifically, the environments of bound motifs showed distinctive sequence composition, distinctive DNA shape features, and overall high similarity to the core binding motif. Notably, the regions around the binding sites of TFs belonging to the same TF family exhibited similar features, with high agreement between the in vitro and in vivo datasets. We propose that these distinctive features help guide TFs to their cognate binding sites.
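As a toy illustration of the core comparison described in the abstract (bound versus unbound sequences that share the same consensus motif), the sketch below tallies nucleotide frequencies in a fixed window on each side of the motif and reports the per-nucleotide difference. This is not the authors' pipeline: the function names `flank_composition` and `compare_flanks`, the exact-match motif search on the forward strand only, the flank width, and the example sequences are all hypothetical simplifications. A real analysis would also handle the reverse complement, use position-resolved profiles and DNA shape features, and apply a statistical test.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def flank_composition(sequences, motif, flank):
    """Nucleotide frequencies in the `flank` bp on each side of the first
    exact, forward-strand occurrence of `motif` (hypothetical simplification).
    Sequences lacking the motif are skipped; non-ACGT characters are ignored."""
    counts = Counter()
    for seq in sequences:
        pos = seq.find(motif)
        if pos < 0:
            continue  # motif absent: nothing to compare for this sequence
        left = seq[max(0, pos - flank):pos]
        right = seq[pos + len(motif):pos + len(motif) + flank]
        counts.update(left + right)
    total = sum(counts[n] for n in NUCLEOTIDES)
    return {n: (counts[n] / total if total else 0.0) for n in NUCLEOTIDES}


def compare_flanks(bound, unbound, motif, flank=10):
    """Per-nucleotide frequency difference (bound minus unbound) in the motif
    flanks. Because the core motif is identical in both sets, any difference
    comes from the surrounding environment only."""
    b = flank_composition(bound, motif, flank)
    u = flank_composition(unbound, motif, flank)
    return {n: b[n] - u[n] for n in NUCLEOTIDES}


if __name__ == "__main__":
    # Hypothetical toy data: every sequence contains the E-box core CACGTG.
    bound = ["GGGGCACGTGGGGG", "AGGGCACGTGGGCA"]
    unbound = ["TTTACACGTGATAT", "ATATCACGTGTTTA"]
    print(compare_flanks(bound, unbound, "CACGTG", flank=4))
```

In this toy input the bound flanks are G/C-rich and the unbound flanks A/T-rich, so the difference is positive for G and C and negative for A and T, even though the core motif is the same in every sequence.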
- Received September 21, 2014.
- Accepted July 8, 2015.
- Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.











