Sequence features that drive human promoter function and tissue specificity
- Jane M. Landolin1,5,
- David S. Johnson2,5,6,
- Nathan D. Trinklein3,
- Shelly F. Aldred3,
- Catherine Medina2,6,
- Hennady Shulha4,
- Zhiping Weng4,8 and
- Richard M. Myers2,3,7,8
- 1 Division of Life Sciences, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA;
- 2 Department of Genetics, Stanford University, Stanford, California 94305-5120, USA;
- 3 SwitchGear Genomics, Menlo Park, California 94025, USA;
- 4 Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts, Worcester, Massachusetts 01655, USA
- 5 These authors contributed equally to this work.
Abstract
Promoters are important regulatory elements that contain the sequence features cells need to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line–specific. Computational models trained on the binding potential of transcription factor (TF) binding motifs could predict promoter activities in both high and low CG groups: the average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line–specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.
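As an illustration of the evaluation described above, the following minimal sketch scores sequences by CG dinucleotide content and computes an AUC for separating active from inactive promoters. It is not the authors' pipeline: the sequences and activity labels are made up, the function names `cg_content` and `auc` are hypothetical, and the CG score is simply the fraction of dinucleotide positions that read "CG", which may differ from the normalization used in the study.

```python
def cg_content(seq: str) -> float:
    """Fraction of dinucleotide positions in seq that are 'CG' (assumed definition)."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return hits / (len(seq) - 1)


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen active example outscores a randomly chosen inactive one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, invented sequences with activity labels (1 = active, 0 = inactive).
toy_promoters = [
    ("GCGCGCGTACGCGG", 1),
    ("ACGCGTTCGACGAA", 1),
    ("ATTTAAGCATTTAT", 0),
    ("TTACGATTTAAATA", 0),
    ("AAATTTCGCGTTAA", 1),
]
scores = [cg_content(seq) for seq, _ in toy_promoters]
labels = [label for _, label in toy_promoters]
toy_auc = auc(scores, labels)
print(f"AUC of CG content on toy data: {toy_auc:.2f}")
```

The same `auc` function could be applied to the scores of any other predictor, such as a motif-based model, so that its AUC can be compared against the CG content baseline on the same set of promoters.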
Footnotes
- 8 Corresponding authors.
E-mail Zhiping.Weng@umassmed.edu; fax (508) 856-2392.
E-mail rmyers@hudsonalpha.org; fax (256) 327-0978.
- [Supplemental material is available online at http://www.genome.org. The gene expression data from this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE21045.]
- Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.100370.109.
- Received September 5, 2009.
- Accepted April 12, 2010.
- Copyright © 2010 by Cold Spring Harbor Laboratory Press











