Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships

  1. Robert H. Waterston (2)

  1. Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
  2. Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
  3. Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota 55455, USA
  4. Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
  5. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
  6. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  7. Department of Biology, Howard University, Washington, District of Columbia 20059, USA
  8. Center for Applied Data Science and Analytics, Howard University, Washington, District of Columbia 20059, USA
  9. Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  10. Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois 60637, USA
  11. Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
  12. Department of Biochemistry and Precision Medicine Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 117597 Singapore
  13. Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
  • Corresponding authors: watersto{at}uw.edu, valerie.reinke{at}yale.edu
  • Abstract

    A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here, we present the culmination of the efforts of the modENCODE (model organism Encyclopedia of DNA Elements) and modERN (model organism Encyclopedia of Regulatory Networks) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These data sets comprise 605 TFs identifying 3.6 M sites in the fly and 356 TFs identifying 0.9 M sites in the worm, and represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed “metapeaks,” that larger metapeaks have characteristics of high-occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single-cell RNA-seq data in a machine-learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent–daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing green fluorescent protein (GFP)-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell-type-specific TF–target relationships.

    Footnotes

    • Received January 25, 2024.
    • Accepted October 17, 2024.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
