Integrating genetic variation with deep learning provides context for variants impacting transcription factor binding during embryogenesis

  1. Eileen E.M. Furlong1
  1. European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany;
  2. European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, D-69117 Heidelberg, Germany;
  3. Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Faculty of Biosciences, D-69117 Heidelberg, Germany;
  4. Aix Marseille Univ, INSERM, TAGC, 13009 Marseille, France
  5. These authors contributed equally to this work.

  • Present addresses: (6) VIB.AI Center for AI & Computational Biology, 3000 Leuven, Belgium; (7) Department of Genetics, Stanford University, Stanford, CA 94305, USA

  • Corresponding authors: judith.zaugg{at}embl.de, furlong{at}embl.de

    Abstract

    Understanding how genetic variation impacts transcription factor (TF) binding remains a major challenge, limiting our ability to model disease-associated variants. Here, we used a highly controlled system of F1 crosses with extensive genetic diversity to profile allele-specific binding of four TFs at several time points during Drosophila embryogenesis. Using a combined haplotype test, we found that 9%–18% of TF-bound regions are impacted by genetic variation, even for essential regulators. By expanding WASP (a tool for allele-specific read mapping) to examine indels, we increased detection of allelically imbalanced peaks by 30%–50%. This fine-grained “mutagenesis” can reconstruct functionalized binding motifs for all factors. To prioritize causal variants, we trained a convolutional neural network (Basenji) to accurately predict binding from DNA sequence. The model can also predict measured allelic imbalance for strong-effect variants, providing a mechanistic interpretation of how each variant impacts binding. This reveals unexpected relationships between TFs, including potential cooperative pairs, and mechanisms of tissue-specific recruitment of the ubiquitous factor CTCF.
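
    To make the notion of allelic imbalance concrete, the sketch below tests whether reads at a single heterozygous site in an F1 cross deviate from the 50:50 ratio expected when both alleles are bound equally. It is a deliberately simplified stand-in, not the combined haplotype test used in the study: it is a plain exact binomial test, and the function names, the `p_null` parameter, and the read counts are hypothetical and chosen only for illustration.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the reference allele (NaN if no reads)."""
    total = ref_reads + alt_reads
    return ref_reads / total if total else float("nan")


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for imbalance at one heterozygous site.

    Assumption: under the null, each read comes from the reference allele
    with probability p_null (0.5 for an unbiased F1 heterozygote).
    Intended for modest read depths; very deep sites would need log-space math.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return 1.0
    # Probability of every possible reference-read count under the null.
    pmf = [
        comb(total, k) * p_null**k * (1 - p_null) ** (total - k)
        for k in range(total + 1)
    ]
    observed = pmf[ref_reads]
    # Sum all outcomes that are no more likely than the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 30 reads from one parental allele, 10 from the other.
    print(allelic_ratio(30, 10))
    print(binomial_two_sided_p(30, 10))
```

    A per-site binomial test ignores overdispersion and the total read depth across individuals, which is part of what haplotype-aware tests are designed to account for, so it should be read only as an intuition aid.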

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279652.124.

    • Freely available online through the Genome Research Open Access option.

    • Received June 3, 2024.
    • Accepted February 20, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.


    Genome Res. 35: 1138–1153. © 2025 Sigalova et al.; Published by Cold Spring Harbor Laboratory Press
