A fast and adaptive detection framework for genome-wide chromatin loop mapping from Hi-C data

  1. Yu Li2,5
  1. King Abdullah University of Science and Technology;
  2. The Chinese University of Hong Kong;
  3. Korea Advanced Institute of Science and Technology;
  4. NorthEast Forestry University
  • * Corresponding author; email: liyu{at}cse.cuhk.edu.hk
  • Abstract

    Chromatin loop identification plays an important role in molecular biology and 3D genomics research, because chromatin looping is a fundamental process in transcription and gene regulation. Such precise chromatin structures can be identified across genome-wide interaction matrices via Hi-C data analysis, which is essential for unraveling the intricacies of transcriptional regulation. Given the increasing number of genome-wide contact maps, derived from both in situ Hi-C and single-cell Hi-C experiments, there is a pressing need for efficient and resilient algorithms capable of processing data from diverse experiments rapidly and adaptively. Here, we propose YOLOOP, a novel detection-based framework that departs from the conventional paradigms. YOLOOP stands out for its speed, outpacing previous state-of-the-art (SOTA) chromatin loop detection methods: it achieves a 30-fold acceleration compared to classification-based methods, up to a 20-fold acceleration compared to the SOTA kernel-based framework, and a 5-fold acceleration compared to statistical algorithms. Furthermore, the framework generalizes well across various cell types, multi-resolution Hi-C maps, and diverse experimental protocols. Compared with the existing paradigms, YOLOOP shows up to a 10% increase in recall and a 15% increase in F1-score, most notably in the GM12878 cell line. YOLOOP can also be adapted quickly through straightforward fine-tuning, making it readily applicable to extremely sparse single-cell Hi-C contact maps. It maintains its speed in that setting, completing genome-wide detection at 10 kb resolution within 1 minute for a single-cell contact map and within 3 minutes for a contact map superimposed from 900 cells, enabling fast analysis of massive amounts of single-cell data.

    • Received March 5, 2024.
    • Accepted August 8, 2024.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International license), as described at http://creativecommons.org/licenses/by-nc/4.0/.
