Discovering regulatory binding-site modules using rule-based learning

  1. Torgeir R. Hvidsten¹,
  2. Bartosz Wilczyński²,³,
  3. Andriy Kryshtafovych²,
  4. Jerzy Tiuryn⁴,
  5. Jan Komorowski¹,⁵, and
  6. Krzysztof Fidelis²,⁵

  ¹ The Linnaeus Centre for Bioinformatics, Uppsala University, 751 24 Uppsala, Sweden
  ² Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
  ³ Institute of Mathematics of the Polish Academy of Sciences, 00-950 Warsaw, Poland
  ⁴ Faculty of Mathematics, Informatics, and Mechanics, Warsaw University, 02-097 Warsaw, Poland

Abstract

Transcription factors regulate gene expression by binding selectively to sequence sites in the cis-regulatory regions of genes. It is therefore reasonable to assume that genes regulated by the same transcription factors should all contain the corresponding binding sites in their regulatory regions and exhibit similar expression profiles as measured by, for example, microarray technology. We have used this assumption to analyze genome-wide yeast binding-site and microarray expression data to reveal the combinatorial nature of gene regulation. We obtained IF-THEN rules linking binding-site combinations (binding-site modules) to genes with particular expression profiles, and thereby provided testable hypotheses on the combinatorial coregulation of gene expression. We showed that genes associated with such rules have a significantly higher probability of being bound by the same transcription factors, as indicated by a genome-wide location analysis, than genes associated with only common binding sites or only similar expression. We also found that such genes were significantly more often biologically related in terms of Gene Ontology annotations than genes associated with only common binding sites or only similar expression. We analyzed expression data collected under different sets of stress conditions and found many binding-site modules that are conserved over several of these condition sets, as well as modules that are specific to particular biological responses. Our results on the recurrence of binding sites in different modules provide specific data on how binding sites may be combined to allow a large number of expression outcomes using relatively few transcription factors.
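The idea of an IF-THEN rule that links a binding-site module to similarly expressed genes can be illustrated with a small sketch. This is not the rule-learning method used in the paper; it is a simplified brute-force search, and the gene names, site names, expression values, the function `find_modules`, and the correlation threshold are all hypothetical, chosen only for illustration. A combination of sites is kept as a rule if at least two genes carry every site in it and all of those genes have pairwise correlated expression profiles.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def find_modules(sites, expression, max_size=2, min_genes=2, min_corr=0.8):
    """Toy search: IF a gene has all sites in `module` THEN it shares an expression profile.

    A module is reported only if every pair of genes carrying it has a
    Pearson correlation of at least `min_corr` (an assumed threshold).
    """
    all_sites = sorted(set().union(*sites.values()))
    rules = []
    for k in range(1, max_size + 1):
        for module in combinations(all_sites, k):
            genes = sorted(g for g, s in sites.items() if set(module) <= s)
            if len(genes) < min_genes:
                continue
            if all(pearson(expression[a], expression[b]) >= min_corr
                   for a, b in combinations(genes, 2)):
                rules.append((module, genes))
    return rules


# Hypothetical toy data: binding sites per gene and expression over four conditions.
sites = {
    "geneA": {"S1", "S2"},
    "geneB": {"S1", "S2"},
    "geneC": {"S1"},
    "geneD": {"S2", "S3"},
}
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 1.0, 2.0, 1.0],
}

rules = find_modules(sites, expression)
for module, genes in rules:
    print("IF", " AND ".join(module), "THEN similar expression:", genes)
```

In this toy data neither S1 nor S2 alone selects a coherently expressed gene set, but the combination of the two does, which is the kind of combinatorial signal the abstract describes.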

Footnotes

  • [Supplemental material is available online at www.genome.org and http://www.lcb.uu.se/~vidsen/binding_sites/.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3760605.

  • 6 Some rules may not survive the expression-similarity filtering when additional genes are added, or may merge with larger rules that include both the previously known binding-site module (composed of known binding sites) and the additional putative sites.

  • 7 Note that GLK1 is not included in the list of genes matching the rule because it was filtered out owing to missing data.

  • 5 Corresponding authors. E-mail fidelis{at}llnl.gov; fax (925) 424-6605. E-mail jan.komorowski{at}lcb.uu.se; fax 46 18 471 66 98.

    • Accepted March 22, 2005.
    • Received January 27, 2005.