Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships
- Michelle Kudron1,
- Louis Gewirtzman2,
- Alec Victorsen3,
- Bridget C Lear4,
- Dionne Vafeados5,
- Jiahao Gao6,
- Jinrui Xu7,
- Swapna Samanta6,
- Emily Frink2,
- Adri Tran-Pearson2,
- Chau Hyunh2,
- Ann Hammonds8,
- William Fisher8,
- Martha L Wall9,
- Greg Wesseling4,
- Vanessa Hernandez4,
- Zhichun Lin4,
- Mary Kasparian4,
- Kevin P White10,
- Ravi Allada4,
- Mark Gerstein6,
- LaDeana Hillier2,
- Susan E Celniker8,
- Valerie Reinke6,11 and
- Robert Waterston2
- 1 Yale University School of Medicine;
- 2 University of Washington School of Medicine;
- 3 University of Minnesota;
- 4 Northwestern University;
- 5 University of Washington;
- 6 Yale University;
- 7 Howard University;
- 8 Lawrence Berkeley National Laboratory;
- 9 University of Chicago;
- 10 National University of Singapore
Abstract
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the efforts of the Model Organism ENCyclopedia Of DNA Elements (modENCODE) and the model organism Encyclopedia of Regulatory Networks (modERN) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These datasets comprise 605 TFs identifying 3.6M sites in the fly and 356 TFs identifying 0.9M sites in the worm, and they represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks", that larger metapeaks have characteristics of high occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining chromatin immunoprecipitation sequencing (ChIP-seq) data with single-cell RNA-seq data in a machine learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent and daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing GFP-tagged TFs are available at the stock centers for each organism. The ChIP-seq data are available through the ENCODE Data Coordinating Center, GEO, and a direct interface that provides rapid access to processed datasets and summary analyses, as well as widgets to probe the cell type-specific TF-target relationships.
- Received January 25, 2024.
- Accepted October 17, 2024.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.