Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps
- Ali Mortazavi1,2,12,13,
- Shirley Pepke3,4,12,
- Camden Jansen1,2,
- Georgi K. Marinov4,
- Jason Ernst5,
- Manolis Kellis6,7,
- Ross C. Hardison8,9,
- Richard M. Myers10 and
- Barbara J. Wold4,11,13
- 1Department of Developmental and Cell Biology, University of California, Irvine, California 92697, USA;
- 2Center for Complex Biological Systems, University of California, Irvine, California 92697, USA;
- 3Center for Advanced Computing Research, California Institute of Technology, Pasadena, California 91125, USA;
- 4Division of Biology, California Institute of Technology, Pasadena, California 91125, USA;
- 5Department of Biological Chemistry, University of California, Los Angeles, California 90095, USA;
- 6MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA;
- 7Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA;
- 8Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- 9Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- 10HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA;
- 11Beckman Institute, California Institute of Technology, Pasadena, California 91125, USA
- 12 These authors contributed equally to this work.
Abstract
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity.
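The training step summarized above can be illustrated with a minimal sketch of the classic online SOM update. This is not the authors' implementation: the function names `train_som` and `assign_units`, the small rectangular grid, the learning-rate and neighborhood schedules, and the input layout (one row per genomic segment, one column per data set) are all assumptions made for illustration.

```python
import numpy as np


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM (illustrative parameters, not the paper's)."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # One weight vector per map unit, initialised from randomly chosen rows.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks toward 0.5
            x = data[i]
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood centred on the BMU, measured on the grid.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours toward x.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_units(data, weights):
    """Return the index of the best-matching unit for every row of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Under these assumptions, a data set that was not used for training could be overlaid on the map by calling `assign_units` on the training matrix and then averaging the extra signal over the segments assigned to each unit.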
Footnotes
- 13 Corresponding authors
  E-mail woldb{at}caltech.edu
  E-mail ali.mortazavi{at}uci.edu
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.158261.113.
  Freely available online through the Genome Research Open Access option.
- Received March 29, 2013.
- Accepted October 7, 2013.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.











