Cross-species analysis of enhancer logic using deep learning

Liesbeth Minnoye; Ibrahim Ihsan Taskiran; David Mauduit; Maurizio Fazio; Linde Van Aerschot; Gert Hulselmans; Valerie Christiaens; Samira Makhzami; Monika Seltenhammer; Panagiotis Karras; Aline Primot; Edouard Cadieu; Ellen van Rooijen; Jean-Christophe Marine; Giorgia Egidy; Ghanem-Elias Ghanem; Leonard Zon; Jasper Wouters; Stein Aerts

doi:10.1101/gr.260844.120

¹VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium;
²KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium;
³Howard Hughes Medical Institute, Stem Cell Program and the Division of Pediatric Hematology/Oncology, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA;
⁴Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Cambridge, Massachusetts 02138, USA;
⁵Laboratory for Disease Mechanisms in Cancer, KU Leuven, 3000 Leuven, Belgium;
⁶Center for Forensic Medicine, Medical University of Vienna, 1090 Vienna, Austria;
⁷Division of Livestock Sciences (NUWI) - BOKU University of Natural Resources and Life Sciences, 1180 Vienna, Austria;
⁸VIB-KU Leuven Center for Cancer Biology, 3000 Leuven, Belgium;
⁹KU Leuven, Department of Oncology KU Leuven, 3000 Leuven, Belgium;
¹⁰CNRS-University of Rennes 1, UMR6290, Institute of Genetics and Development of Rennes, Faculty of Medicine, 35000 Rennes, France;
¹¹Université Paris-Saclay, INRA, AgroParisTech, GABI, 78350 Jouy-en-Josas, France;
¹²Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium

¹³These authors contributed equally to this work.

Corresponding author: stein.aerts@kuleuven.vib.be

Abstract

Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type–specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data of 26 melanoma samples across six different species. We show the accuracy of DeepMEL predictions on the CAGI5 challenge, where it significantly outperforms existing models on the melanoma enhancer of IRF4. Next, we exploit DeepMEL to analyze enhancer architectures and identify accurate transcription factor binding sites for the core regulatory complexes in the two different melanoma states, with distinct roles for each transcription factor, in terms of nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species, where sequence alignment fails, and the model highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL can be used from the Kipoi database to predict and optimize candidate enhancers and to prioritize enhancer mutations. In addition, our computational strategy can be applied to other cancer or normal cell types.
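
As a rough illustration of what "prioritizing enhancer mutations" with a sequence model can look like, the sketch below scores every single-nucleotide substitution in a candidate sequence by the change it causes in a model score (in silico saturation mutagenesis). It is not the DeepMEL implementation: `toy_score` is a hypothetical stand-in that simply counts an arbitrary placeholder motif, and the names `one_hot`, `toy_score`, and `saturation_mutagenesis` are assumptions made for this example. With a real model, `toy_score` would be replaced by a call that returns the model's prediction for the sequence.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
# Arbitrary placeholder motif used only by the toy scorer (an assumption).
PLACEHOLDER_MOTIF = "CACGTG"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as rows of [A, C, G, T]; any other symbol becomes all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: count placeholder motif hits."""
    k = len(PLACEHOLDER_MOTIF)
    s = seq.upper()
    return float(sum(1 for i in range(len(s) - k + 1) if s[i:i + k] == PLACEHOLDER_MOTIF))


def saturation_mutagenesis(
    seq: str, score_fn: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Return {(position, alt_base): score(mutant) - score(reference)}."""
    seq = seq.upper()
    ref_score = score_fn(seq)
    deltas: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference allele itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            deltas[(pos, alt)] = score_fn(mutant) - ref_score
    return deltas


def top_disruptive(deltas: Dict[Tuple[int, str], float], n: int = 5):
    """Rank substitutions by the largest drop in score (most negative delta first)."""
    return sorted(deltas.items(), key=lambda item: item[1])[:n]
```

Calling `saturation_mutagenesis("TTCACGTGAA", toy_score)` and passing the result to `top_disruptive` lists the substitutions that most reduce the toy score. Substitutions with large negative deltas are the ones such an analysis would flag as candidate loss-of-function enhancer mutations, and large positive deltas would suggest gain-of-function changes.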

Footnotes

[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.260844.120.
Freely available online through the Genome Research Open Access option.

Received January 30, 2020.
Accepted June 15, 2020.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
