HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient

    • 1 Bioinformatics and Genomics Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
    • 2 Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
    • 3 Department of Genome Sciences, University of Washington, Seattle, Washington 98105, USA;
    • 4 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
    • 5 Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98105, USA;
    • 6 Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
Published August 30, 2017. Vol 27 Issue 11, pp. 1939-1949. https://doi.org/10.1101/gr.220640.117

Abstract

Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum-adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. SCC not only provides a statistically sound and reliable evaluation of reproducibility but can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and in depicting the interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well suited for standardized, interpretable, automatable, and scalable quality control. Our approach is implemented in HiCRep, a freely available R package.
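The core idea behind SCC, a correlation computed with the distance dependence of Hi-C contacts taken into account, can be illustrated with a minimal pure-Python sketch. It assumes square, symmetric contact matrices at a single resolution: each matrix is split into strata by genomic distance (one diagonal per distance, measured in bins), a Pearson correlation is computed within each stratum, and the per-stratum correlations are averaged with weights equal to the stratum size times the standard deviations of the within-stratum normalized ranks. That weighting reflects our reading of the published definition. The function names, the `max_distance` parameter, the inclusion of the main diagonal, and the choice to skip constant strata are hypothetical choices made for illustration, not the HiCRep package's API. Any preprocessing a full pipeline would apply (such as smoothing or normalization) is omitted.

```python
import math
import statistics
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation; returns None when either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def normalized_ranks(values: List[float]) -> List[float]:
    """Average ranks (tied values share the mean rank), divided by the length."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [r / n for r in ranks]


def scc(a: Matrix, b: Matrix, max_distance: int) -> float:
    """Stratum-adjusted correlation of two square, symmetric contact matrices.

    Stratum d holds the contacts between bins i and i + d (the d-th diagonal);
    hypothetical parameter max_distance caps d, in bins.
    """
    n = len(a)
    if n != len(b):
        raise ValueError("matrices must have the same dimensions")
    weighted_sum = 0.0
    weight_total = 0.0
    for d in range(min(max_distance, n - 1) + 1):
        x = [float(a[i][i + d]) for i in range(n - d)]
        y = [float(b[i][i + d]) for i in range(n - d)]
        if len(x) < 2:
            continue  # a single point has no correlation
        rho = pearson(x, y)
        if rho is None:
            continue  # constant stratum (e.g. all zeros): skip it
        # Weight: stratum size times the spread of the rank-transformed values.
        weight = (
            len(x)
            * statistics.stdev(normalized_ranks(x))
            * statistics.stdev(normalized_ranks(y))
        )
        weighted_sum += weight * rho
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no stratum had enough variation to correlate")
    return weighted_sum / weight_total
```

Comparing a matrix with itself, or with a rescaled copy of itself, gives an SCC of 1 up to floating-point rounding, and matrices whose strata are perfectly anti-correlated give -1. Skipping constant strata avoids a division by zero when a diagonal contains no variation in one of the two matrices.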
