PRISM offers a comprehensive genomic approach to transcription factor function prediction
- Aaron M Wenger,
- Shoa L Clarke,
- Harendra Guturu,
- Jenny Chen,
- Bruce T Schaar,
- Cory Y McLean
- Gill Bejerano*
* Corresponding author; email: bejerano@stanford.edu
Abstract
The human genome encodes 1,500-2,000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function; such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality, non-redundant TF binding motifs that represent all major DNA-binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about the functions of all human and mouse genes in a novel statistical framework, searching for enrichment of particular motifs next to groups of target genes with particular functions. Rigorous parameter tuning and a harsh null model are used to minimize false positives. Our novel PRISM (Predicting Regulatory Information from Single Motifs) approach obtains 2,543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
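The central idea in the abstract is testing whether genes near a motif's conserved binding sites are over-represented among genes annotated with a given function. The abstract does not state which test PRISM uses, so the sketch below substitutes a plain one-sided hypergeometric test on gene sets as an illustration of motif-function enrichment. It is not the published method: it ignores genomic distance, regulatory domain size, and the harsh null described above. The function names and the toy gene sets are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn).

    Stand-in enrichment test (an assumption, not PRISM's published statistic).
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), min(K, n) + 1)
    )
    return tail / comb(N, n)


def motif_term_enrichment(
    genes_near_motif: set[str],
    genes_with_term: set[str],
    all_genes: set[str],
) -> tuple[float, float]:
    """Return (one-sided p-value, fold enrichment) for one motif/function pair.

    Hypothetical helper: 'genes_near_motif' would be the genes assigned a
    conserved binding site for the motif; 'genes_with_term' the genes
    annotated with one function.
    """
    hits = genes_near_motif & all_genes
    term = genes_with_term & all_genes
    N, K, n = len(all_genes), len(term), len(hits)
    k = len(hits & term)
    p_value = hypergeom_upper_tail(k, N, K, n)
    # Fraction of motif-adjacent genes with the function, relative to background.
    fold = (k / n) / (K / N) if n and K else float("nan")
    return p_value, fold


if __name__ == "__main__":
    genome = {f"g{i}" for i in range(20)}
    term_genes = {"g0", "g1", "g2", "g3", "g4"}
    motif_genes = {"g0", "g1", "g2", "g10"}
    p, fold = motif_term_enrichment(motif_genes, term_genes, genome)
    print(f"p = {p:.4f}, fold enrichment = {fold:.1f}")
```

In this toy example, 3 of the 4 motif-adjacent genes carry the function against a 25% background, giving a 3-fold enrichment with p = 155/4845, about 0.032. A genome-wide scan over hundreds of motifs and many functional terms would additionally need multiple-testing control to reach a stated false discovery rate.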
- Received February 12, 2012.
- Accepted January 25, 2013.
- © 2013, Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.