Inferring genome-wide functional linkages in E. coli by combining improved genome context methods: Comparison with high-throughput experimental data

Sailu Yellaboina 1, Kshama Goyal 1, and Shekhar C. Mande 2

Centre for DNA Fingerprinting and Diagnostics, Hyderabad 500076, India

1 These authors contributed equally to this work.

Abstract

Cellular functions are determined by interactions among the proteins in a cell. Recognizing these interactions is an important step toward understanding biology at the systems level. Here, we report an interaction network of Escherichia coli, obtained by training a Support Vector Machine on the high-quality interactions in the EcoCyc database, on the assumption that periplasmic and cytoplasmic proteins do not interact with each other. The features comprised the correlation coefficient between the bit-score phylogenetic profiles of two genes, the frequency of their co-occurrence in predicted operons, and a new measure: the distance between the translational start sites of the genes. The combined genome context methods show a high accuracy of prediction on the test data and predict a total of 78,122 binary interactions. The majority of the interactions identified by high-throughput experimental methods correspond to indirect interactions (interactions through neighbors) in the predicted network. Correlating the predicted network with gene essentiality data shows that essential genes in E. coli have a high linking number, whereas nonessential genes have a low linking number. Furthermore, the predicted protein–protein interaction network shows that proteins involved in replication, DNA repair, transcription, translation, and cell wall synthesis are highly connected. We therefore believe that the predicted network will serve as a useful resource for understanding prokaryotic biology.
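
As a rough illustration of the quantities named in the abstract, the following Python sketch builds a three-element feature vector for a gene pair and computes the linking number (node degree) and an indirect-interaction check on a small network. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example gene labels, the use of Pearson correlation, and the simple coordinate difference for start-site distance (ignoring strand and the circular chromosome) are all illustrative choices. In practice, feature vectors of this kind would be passed to a Support Vector Machine for training.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length bit-score profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def pair_features(profile_a, profile_b, operon_cooccurrence, start_a, start_b):
    """Hypothetical feature vector for one gene pair.

    operon_cooccurrence: fraction of genomes in which the two genes fall
    in the same predicted operon (assumed to be precomputed).
    start_a, start_b: translational start coordinates; the distance here is
    a plain absolute difference, which is a simplifying assumption.
    """
    return (
        pearson(profile_a, profile_b),
        operon_cooccurrence,
        abs(start_a - start_b),
    )


def build_neighbors(edges):
    """Map each protein to the set of its direct interaction partners."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def linking_number(protein, neighbors):
    """Number of distinct direct partners of a protein (its degree)."""
    return len(neighbors.get(protein, set()))


def is_indirect(a, b, neighbors):
    """True if a and b are not directly linked but share a neighbor."""
    na, nb = neighbors.get(a, set()), neighbors.get(b, set())
    return b not in na and bool(na & nb)


# Toy usage with made-up gene labels and values.
features = pair_features([1, 2, 3], [2, 4, 6], 0.5, 100, 350)
network = build_neighbors([("geneA", "geneB"), ("geneB", "geneC")])
print(features)
print(linking_number("geneB", network), is_indirect("geneA", "geneC", network))
```

Comparing linking numbers between essential and nonessential genes, or checking whether an experimentally reported pair is linked through a shared neighbor, then reduces to calls to `linking_number` and `is_indirect` on the predicted edge list.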

Footnotes

  • 2 Corresponding author.

    2 E-mail shekhar{at}cdfd.org.in; fax 91-40-27155610.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are online at http://www.genome.org/cgi/doi/10.1101/gr.5900607

    • Received August 27, 2006.
    • Accepted December 20, 2006.
  • Freely available online through the Genome Research Open Access option.
