Integrating genetic variation with deep learning provides context for variants impacting transcription factor binding during embryogenesis
- Olga M. Sigalova1,5,6,
- Mattia Forneris1,5,
- Frosina Stojanovska2,3,
- Bingqing Zhao1,7,
- Rebecca R. Viales1,
- Adam Rabinowitz1,
- Fayrouz Hammal4,
- Benoît Ballester4,
- Judith B. Zaugg2 and
- Eileen E.M. Furlong1
- 1European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany;
- 2European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, D-69117 Heidelberg, Germany;
- 3Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Faculty of Biosciences, D-69117 Heidelberg, Germany;
- 4Aix Marseille Univ, INSERM, TAGC, 13009 Marseille, France
- 5These authors contributed equally to this work.
Abstract
Understanding how genetic variation impacts transcription factor (TF) binding remains a major challenge, limiting our ability to model disease-associated variants. Here, we used a highly controlled system of F1 crosses with extensive genetic diversity to profile allele-specific binding of four TFs at several time points during Drosophila embryogenesis. Using a combined haplotype test, we found that 9%–18% of TF-bound regions are impacted by genetic variation, even for essential regulators. By expanding WASP (a tool for allele-specific read mapping) to examine indels, we increased detection of allelically imbalanced peaks by 30%–50%. This fine-grained “mutagenesis” can reconstruct functionalized binding motifs for all factors. To prioritize causal variants, we trained a convolutional neural network (Basenji) to accurately predict binding from DNA sequence. The model can also predict measured allelic imbalance for strong-effect variants, providing a mechanistic interpretation for how the variant impacts binding. This analysis reveals unexpected relationships between TFs, including potential cooperative pairs, and mechanisms of tissue-specific recruitment of the ubiquitous factor CTCF.
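To make the notion of allelic imbalance concrete, the following minimal sketch tests whether reads at a heterozygous site in an F1 cross deviate from the 50:50 ratio expected when both alleles are bound equally. It is an illustration only and is not the combined haplotype test used in the study, which additionally draws on total read depth and models overdispersion. The function names `binomial_two_sided_p` and `allelic_ratio` and the example read counts are hypothetical.

```python
from math import exp, lgamma, log


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-specific reads that carry the reference allele."""
    total = ref_reads + alt_reads
    return ref_reads / total if total else float("nan")


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Assumes reads are independent draws and that 0 < p_null < 1.
    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one under the null ratio.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no allele-specific reads, so no evidence of imbalance

    def pmf(k: int) -> float:
        # Binomial probability computed in log space to avoid overflow.
        log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_coeff + k * log(p_null) + (n - k) * log(1.0 - p_null))

    observed = pmf(ref_reads)
    # Small relative tolerance so symmetric outcomes count as ties.
    p_value = sum(q for q in map(pmf, range(n + 1)) if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical read counts at one heterozygous site within a TF peak.
    ref, alt = 30, 10
    print(f"allelic ratio = {allelic_ratio(ref, alt):.2f}")
    print(f"two-sided p   = {binomial_two_sided_p(ref, alt):.4g}")
```

In practice, per-site tests like this are run across many peaks, so the resulting p-values would need multiple-testing correction before calling a peak allelically imbalanced.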
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279652.124.
- Freely available online through the Genome Research Open Access option.
- Received June 3, 2024.
- Accepted February 20, 2025.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.











