Predicting Protein Cellular Localization Using a Domain Projection Method

Richard Mott (1,5), Jörg Schultz (2,3), Peer Bork (3), and Chris P. Ponting (4)

(1) Wellcome Trust Centre for Human Genetics, Oxford OX3 7BN, United Kingdom; (2) Max-Planck-Institute for Molecular Genetics, 14195 Berlin, Germany; (3) European Molecular Biology Laboratory, 69012 Heidelberg, Germany, and Max Delbrück Centrum Berlin-Buch, 13092 Berlin, Germany; (4) Medical Research Council Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom

Abstract

We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a “small-world network”, linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of “bridging” domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment (“locale”), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
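The idea of a domain co-occurrence network with "bridging" domains can be illustrated with a small sketch. This is not the authors' implementation: the protein names and domain compositions below are invented toy data, the function names are hypothetical, and a plain breadth-first search stands in for whatever distance measure the projection method actually uses. It only shows how two domains that never share a protein can still be close in the network through a third domain.

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy input (assumption, not real annotation): protein -> domains it contains.
proteins = {
    "protA": ["SH2", "SH3"],
    "protB": ["SH3", "PH"],
    "protC": ["PH", "C2"],
    "protD": ["EGF", "LamG"],
}


def build_cooccurrence_graph(protein_domains):
    """Link two domains whenever they occur together in at least one protein."""
    graph = defaultdict(set)
    for domains in protein_domains.values():
        unique = set(domains)
        for domain in unique:
            graph[domain]  # make sure single-domain proteins still add a node
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph, source):
    """Breadth-first search: number of co-occurrence links from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


graph = build_cooccurrence_graph(proteins)
dist = degrees_of_separation(graph, "SH2")

# SH2 and PH never share a protein in the toy data, but both co-occur
# with SH3, so SH3 acts as a bridging domain between them.
for domain in sorted(dist, key=dist.get):
    print(domain, dist[domain])
```

In this toy network SH2 and PH are two links apart even though no protein contains both, while EGF and LamG form a separate component that is unreachable from SH2. A real analysis would replace the toy dictionary with domain annotations for a full proteome.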

Footnotes

  • 5 Corresponding author.

  • E-MAIL rmott{at}well.ox.ac.uk; FAX +44 1865 287664.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.96802.

    • Received January 16, 2002.
    • Accepted May 15, 2002.