Modeling cis-regulation with a compendium of genome-wide histone H3K27ac profiles
- Su Wang [1,2,8],
- Chongzhi Zang [3,4,8],
- Tengfei Xiao [3,4,5],
- Jingyu Fan [2],
- Shenglin Mei [2],
- Qian Qin [2],
- Qiu Wu [2],
- Xujuan Li [2],
- Kexin Xu [6],
- Housheng Hansen He [7],
- Myles Brown [4,5],
- Clifford A. Meyer [3,4] and
- X. Shirley Liu [3,4]
- [1] Shanghai Key Laboratory of Tuberculosis, Shanghai Pulmonary Hospital, Shanghai, 200433, China;
- [2] Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, 200092, China;
- [3] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA;
- [4] Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA;
- [5] Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02215, USA;
- [6] Department of Molecular Medicine/Institute of Biotechnology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3900, USA;
- [7] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, M5G 1L7, Canada
- Corresponding authors: cliff{at}jimmy.harvard.edu, xsliu{at}jimmy.harvard.edu
- [8] These authors contributed equally to this work.
Abstract
Model-based analysis of regulation of gene expression (MARGE) is a framework for interpreting the relationship between the H3K27ac chromatin environment and differentially expressed gene sets. The framework has three main functions: MARGE-potential, MARGE-express, and MARGE-cistrome. MARGE-potential defines a regulatory potential (RP) for each gene as the sum of H3K27ac ChIP-seq signals weighted by a function of genomic distance from the transcription start site. The MARGE framework includes a compendium of RPs derived from 365 human and 267 mouse H3K27ac ChIP-seq data sets. Relative RPs, scaled using this compendium, are superior to superenhancers in predicting genes repressed by BET (bromodomain and extraterminal domain) inhibitors. MARGE-express uses logistic regression to retrieve relevant H3K27ac profiles from the compendium and thereby accurately model a query set of differentially expressed genes; it was tested on 671 diverse gene sets from MSigDB. MARGE-cistrome adopts a novel semisupervised learning approach to identify the cis-regulatory elements that regulate a gene set, exploiting information from the H3K27ac signal at DNase I hypersensitive sites identified from published human and mouse DNase-seq data. We tested the framework on newly generated RNA-seq and H3K27ac ChIP-seq profiles obtained after siRNA silencing of multiple transcriptional and epigenetic regulators in a prostate cancer cell line, LNCaP-abl. MARGE-cistrome can predict the binding sites of silenced transcription factors without matched H3K27ac ChIP-seq data. Even when matching H3K27ac ChIP-seq profiles are available, MARGE leverages public H3K27ac profiles to enhance these data. This study demonstrates the advantage of integrating a large compendium of historical epigenetic data for genomic studies of transcriptional regulation.
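To make the regulatory potential concrete, the following is a minimal sketch of a distance-weighted sum of H3K27ac signal around a transcription start site (TSS). The abstract states only that the weight is "a function of genomic distance"; the exponential half-decay form, the 10 kb half-decay distance, the 100 kb window, and the names `distance_weight` and `regulatory_potential` are all illustrative assumptions and are not taken from the MARGE software.

```python
from typing import Iterable, Tuple


def distance_weight(distance_bp: float, half_decay_bp: float = 10_000.0) -> float:
    """Assumed weight: 1.0 at the TSS, halving every `half_decay_bp` base pairs."""
    return 0.5 ** (abs(distance_bp) / half_decay_bp)


def regulatory_potential(
    tss: int,
    signal: Iterable[Tuple[int, float]],
    half_decay_bp: float = 10_000.0,
    window_bp: int = 100_000,
) -> float:
    """Sum H3K27ac signal values weighted by distance from the TSS.

    `signal` is an iterable of (genomic position, signal value) pairs on the
    same chromosome as the gene. Positions farther than `window_bp` from the
    TSS are ignored (an assumed cutoff, for illustration only).
    """
    rp = 0.0
    for position, value in signal:
        distance = abs(position - tss)
        if distance <= window_bp:
            rp += value * distance_weight(distance, half_decay_bp)
    return rp


# Toy data: signal at the TSS, 10 kb away, and 500 kb away (outside the window).
toy_signal = [(1_000_000, 4.0), (1_010_000, 2.0), (1_500_000, 8.0)]
toy_rp = regulatory_potential(tss=1_000_000, signal=toy_signal)
print(toy_rp)
```

In this toy case the signal at the TSS counts fully, the signal 10 kb away counts at half weight, and the distal signal is excluded. A relative RP, as described in the abstract, would then be obtained by scaling such values against the same gene's RPs across the compendium; the scaling scheme is not specified in the abstract and is not sketched here.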
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.201574.115.
- Received November 4, 2015.
- Accepted July 21, 2016.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











