Crunch: Integrated processing and modeling of ChIP-seq data in terms of regulatory motifs

  1. Erik van Nimwegen (1, 3)
  Affiliations:
  1. University of Basel
  2. Wellcome Centre for Human Genetics, University of Oxford
  * Corresponding author; email: erik.vannimwegen@unibas.ch

  Abstract

    Although ChIP-seq has become a routine experimental approach for quantitatively characterizing the genome-wide binding of transcription factors (TFs), computational analysis procedures remain far from standardized, making it difficult to compare ChIP-seq results across experiments. In addition, although genome-wide binding patterns must ultimately be determined by local constellations of DNA-binding sites, current analysis is typically limited to identifying enriched motifs in ChIP-seq peaks. Here we present Crunch, a completely automated computational method that performs all ChIP-seq analysis steps, from quality control through read mapping and peak detection, and integrates comprehensive modeling of the ChIP signal in terms of known and novel binding motifs, quantifying the contribution of each motif and annotating which combinations of motifs explain each binding peak. Applying Crunch to 128 data sets from the ENCODE Project, we show that Crunch outperforms current peak finders and find that TFs naturally separate into "solitary TFs," for which a single motif explains the ChIP peaks, and "cobinding TFs," for which multiple motifs co-occur within peaks. Moreover, for most data sets, the motifs that Crunch identified de novo outperform known motifs, and both the set of cobinding motifs and the top motif of solitary TFs are consistent across experiments and cell lines. Crunch is implemented as a web server, enabling standardized analysis of any collection of ChIP-seq data sets by simply uploading raw sequencing data. Results are provided both in a graphical web interface and as downloadable files.

    • Received May 7, 2018.
    • Accepted May 14, 2019.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International license), as described at http://creativecommons.org/licenses/by-nc/4.0/.
