Cross-species analysis of enhancer logic using deep learning

  1. Stein Aerts¹,²

  1. VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium;
  2. KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium;
  3. Howard Hughes Medical Institute, Stem Cell Program and the Division of Pediatric Hematology/Oncology, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA;
  4. Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Cambridge, Massachusetts 02138, USA;
  5. Laboratory for Disease Mechanisms in Cancer, KU Leuven, 3000 Leuven, Belgium;
  6. Center for Forensic Medicine, Medical University of Vienna, 1090 Vienna, Austria;
  7. Division of Livestock Sciences (NUWI) - BOKU University of Natural Resources and Life Sciences, 1180 Vienna, Austria;
  8. VIB-KU Leuven Center for Cancer Biology, 3000 Leuven, Belgium;
  9. KU Leuven, Department of Oncology KU Leuven, 3000 Leuven, Belgium;
  10. CNRS-University of Rennes 1, UMR6290, Institute of Genetics and Development of Rennes, Faculty of Medicine, 35000 Rennes, France;
  11. Université Paris-Saclay, INRA, AgroParisTech, GABI, 78350 Jouy-en-Josas, France;
  12. Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium
  13. These authors contributed equally to this work.

  • Corresponding author: stein.aerts@kuleuven.vib.be
  • Abstract

    Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type–specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data of 26 melanoma samples across six different species. We show the accuracy of DeepMEL predictions on the CAGI5 challenge, where it significantly outperforms existing models on the melanoma enhancer of IRF4. Next, we exploit DeepMEL to analyze enhancer architectures and identify accurate transcription factor binding sites for the core regulatory complexes in the two different melanoma states, with distinct roles for each transcription factor, in terms of nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species, where sequence alignment fails, and the model highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL can be used from the Kipoi database to predict and optimize candidate enhancers and to prioritize enhancer mutations. In addition, our computational strategy can be applied to other cancer or normal cell types.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.260844.120.

    • Freely available online through the Genome Research Open Access option.

    • Received January 30, 2020.
    • Accepted June 15, 2020.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
