Identifying clusters of cis-regulatory elements underpinning TAD structures and lineage-specific regulatory networks

  1. Mathieu Lupien (1, 2, 4)
  1. Princess Margaret Cancer Centre, Toronto, Ontario M5G 1L7, Canada;
  2. Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5G 1L7, Canada;
  3. Department of Computer Science, University of Toronto, Toronto, Ontario M5T 3A1, Canada;
  4. Ontario Institute for Cancer Research, Toronto, Ontario M5G 1L7, Canada
  • Corresponding authors: ali.madanitonekaboni{at}mail.utoronto.ca, bhaibeka{at}uhnresearch.ca, mlupien{at}uhnres.utoronto.ca
  • Abstract

    Cellular identity relies on cell-type–specific gene expression controlled at the transcriptional level by cis-regulatory elements (CREs). CREs are unevenly distributed across the genome, giving rise to individual CREs and clusters of CREs (COREs). Technical and biological features hinder CORE identification. We addressed these issues by developing an unsupervised machine learning approach termed clustering of genomic regions analysis method (CREAM). CREAM automates CORE detection from chromatin accessibility profiles. The COREs it identifies are enriched in CREs strongly bound by master transcription regulators, lie proximal to highly expressed and essential genes, and discriminate cell identity. Although COREs share similarities with super-enhancers, we highlight differences in the genomic distribution and structure of these cis-regulatory units. We further show the enhanced value of COREs over super-enhancers in identifying master transcription regulators and the highly expressed and essential genes that define cell identity. COREs are enriched at topologically associated domain (TAD) boundaries. In contrast to super-enhancers, they are also preferentially bound by the chromatin looping factors CTCF and cohesin, forming clusters of CTCF and cohesin binding regions and defining homotypic clusters of transcription regulator binding regions (HCTs). Finally, we show the clinical utility of CREAM: COREs identified across chromatin accessibility profiles stratify more than 400 tumor samples according to their cancer type and delineate cancer type–specific active biological pathways. Collectively, our results support the utility of CREAM to delineate COREs, which capture the cell-type–specific biological underpinnings of a wide range of normal and cancer cell types with greater accuracy than individual CREs or super-enhancers.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.248658.119.

    • Freely available online through the Genome Research Open Access option.

    • Received January 19, 2019.
    • Accepted August 14, 2019.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
