Visualizing Sequence Similarity of Protein Families

Vamsi Veeramachaneni and Wojciech Makałowski¹

Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

Abstract

Classification of proteins into families is one of the main goals of functional analysis. Proteins are usually assigned to a family based on the presence of family-specific patterns, domains, or structural elements. Whereas proteins belonging to the same family are generally similar to each other, the extent of similarity varies widely across families. Some families are characterized by short, well-defined motifs, whereas others contain longer, less-specific motifs. We present a simple method for visualizing such differences. We applied our method to the Arabidopsis thaliana families listed at The Arabidopsis Information Resource (TAIR) Web site. For 76% of the nontrivial families (families with more than one member), the method identifies simple similarity measures that are necessary and sufficient to cluster members of the family together. Our visualization method can be used as part of an annotation pipeline to identify potentially incorrectly defined families. We also describe how the method can be extended to identify novel families and to assign unclassified proteins to known families.

Footnotes

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2079204. Article published online before print in May 2004.

  • 1 Corresponding author. E-MAIL wojtek{at}psu.edu; FAX (814) 865-9366.

    • Accepted February 10, 2004.
    • Received October 16, 2003.
