Decoding ChIP-Seq peaks with a double-binding signal refines binding peaks to single-nucleotide and predicts cooperative interaction

  James Galagan1
  1 Boston University;
  2 Broad Institute of MIT and Harvard
  * Corresponding author; email: antluiz{at}bu.edu

Abstract

Understanding protein-DNA binding in vivo is essential to understanding gene regulation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) provides a global map of the regulatory binding network. Most ChIP-seq analysis tools focus on identifying binding regions from coverage enrichment; less work has been done to infer the physical and regulatory details within the enriched regions. This research extends a previous blind-deconvolution approach to develop a post-peak-calling algorithm that improves binding site resolution and predicts cooperative interactions. At the core of our new method is a physically motivated model that characterizes the binding signal as an extreme value distribution. This model suggests a mathematical framework for studying the physical properties of DNA shearing from the ChIP-seq coverage. The model explains the ChIP-seq coverage with two signals: the first considers DNA fragments with only a single binding event, whereas the second considers fragments with two binding events (a double-binding signal). The model incorporates motif discovery and is able to detect multiple sites in an enriched region with single-nucleotide resolution, high sensitivity, and high specificity. Our method improves peak caller sensitivity from less than 45% to up to 94%, at a false positive rate of less than 11%, for a set of 47 experimentally validated prokaryotic sites. It also improves the resolution of highly enriched regions in large-scale eukaryotic data sets. The double-binding signal provides a novel application in ChIP-seq analysis: the identification of cooperative interactions. Predictions of known cooperative binding sites yield an area under the ROC curve of 0.85.
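
To make the evaluation metrics quoted above concrete, the following minimal sketch shows how sensitivity, false positive rate, and the area under an ROC curve are conventionally computed. It is not the authors' evaluation code: the function names are hypothetical, and the counts and scores are toy values chosen only for illustration, not data from this study.

```python
def sensitivity(tp, fn):
    """Fraction of true binding sites that were recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of non-sites that were wrongly called: FP / (FP + TN)."""
    return fp / (fp + tn)


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values (assumptions for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # hypothetical prediction scores
toy_labels = [1, 1, 0, 1, 0, 0]              # 1 = known site, 0 = non-site

toy_sensitivity = sensitivity(tp=8, fn=2)
toy_fpr = false_positive_rate(fp=1, tn=9)
toy_auc = roc_auc(toy_scores, toy_labels)
```

The pairwise formulation of the AUC is quadratic in the number of examples, which is acceptable for small validation sets such as the one described here; a rank-based computation would be preferable for large ones.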

  • Received June 6, 2013.
  • Accepted July 8, 2014.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

ACCEPTED MANUSCRIPT
