Saturation analysis of ChIP-seq data for reproducible identification of binding peaks
- Peter Hansen1,
- Jochen Hecht1,
- Daniel Ibrahim2,
- Alexander Krannich1,
- Matthias Truss1 and
- Peter N Robinson3,4
- 1 Charité Universitätsmedizin Berlin;
- 2 Max Planck Institute for Molecular Genetics;
- 3 Charité University Hospital
- * Corresponding author; email: peter.robinson{at}charite.de
Abstract
Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology for identifying the genome-wide locations of transcription factors and other DNA-binding proteins. Computational ChIP-seq peak calling infers the locations of protein-DNA interactions from various measures of enrichment of sequence reads. In this work we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks, followed by statistical analysis of the saturation of candidate peaks by 5' ends of reads. We show that our method is not only substantially faster than several competing methods but also has statistically significant advantages in the reproducibility of its results and in its ability to identify peaks with reproducible binding site motifs. We also show that Q has superior performance in delineating double RNAPII and H3K4me3 peaks surrounding transcription start sites, which is related to its better ability to resolve individual peaks. The method is implemented in C++ and is freely available under an open source license.
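As an informal illustration of what saturation of a candidate peak by 5' ends of reads can mean, the following minimal Python sketch computes the fraction of positions in a region that are hit by at least one 5' read end. This is an assumption made for illustration only: the function name `saturation`, the half-open interval convention, and the toy coordinates are hypothetical, and the sketch reproduces neither the scoring nor the statistical test used by Q.

```python
from typing import Iterable


def saturation(five_prime_ends: Iterable[int], start: int, end: int) -> float:
    """Fraction of positions in [start, end) hit by at least one 5' read end.

    Illustrative definition only (an assumption, not the definition used by Q).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    # Collapse duplicate positions: a position counts once, however many
    # reads start there.
    covered = {p for p in five_prime_ends if start <= p < end}
    return len(covered) / (end - start)


# Toy data (hypothetical): both samples contribute 8 reads to the same
# 10 bp region, but the reads are spread differently over its positions.
chip_ends = [100, 100, 101, 103, 103, 103, 107, 109]
control_ends = [100, 100, 100, 100, 105, 105, 105, 105]

chip_sat = saturation(chip_ends, 100, 110)
control_sat = saturation(control_ends, 100, 110)
print(chip_sat, control_sat)
```

In this toy example the two samples have identical read counts in the region but different saturation, because reads stacked on the same position are counted only once. A count-based enrichment measure alone would not distinguish them.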
- Received January 21, 2015.
- Accepted July 6, 2015.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











