Nonrandom Tripeptide Sequence Distributions at Protein Carboxyl Termini

Gregory J. Gatto, Jr. and Jeremy M. Berg1

Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA

Abstract

The availability of complete genome sequences enables the statistical analysis of sequence features without significant database-imposed bias. The carboxyl termini of proteins often contain regions associated with protein targeting and enhanced translational termination. We analyzed the frequency of occurrence of C-terminal tripeptides in representative archaeal, bacterial, and eukaryotic genomes. The sequence distribution in prokaryotic genomes nearly matches that generated by the randomization of the observed tripeptide set. In contrast, eukaryotic genomes contain large numbers of overrepresented sequences. Some of these correspond to highly repeated sequences from either duplicated endogenous genes or transposon open reading frames. Gratifyingly, others represent previously known targeting signals or sequences associated with an increase in translational termination efficiency. However, a number of overrepresented tripeptides have not been previously noted and may represent novel functional sequences. For example, the sequence XSS may enhance translational termination efficiency in plants, whereas FWC may be a targeting or processing signal for certain amino acid permeases in yeast.
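The comparison described above, observed C-terminal tripeptide counts against a randomized version of the same tripeptide set, can be illustrated with a minimal sketch. The abstract does not specify the randomization procedure, so this sketch assumes one plausible reading: the residues at each of the three positions are shuffled independently across all proteins, which preserves per-position amino acid composition. The function names, the number of trials, and the "exceeds the largest randomized count" threshold are all hypothetical choices made for illustration, not the authors' method.

```python
import random
from collections import Counter


def c_terminal_tripeptides(proteins):
    """Return the last three residues of every protein of length >= 3."""
    return [seq[-3:].upper() for seq in proteins if len(seq) >= 3]


def shuffle_positions(tripeptides, rng):
    """Assumed randomization: shuffle each position independently.

    Per-position residue composition is preserved, but any coupling
    between the three positions is destroyed.
    """
    columns = [[t[i] for t in tripeptides] for i in range(3)]
    for column in columns:
        rng.shuffle(column)
    return ["".join(residues) for residues in zip(*columns)]


def overrepresented(proteins, trials=200, seed=0):
    """Tripeptides seen more often than any tripeptide in any shuffled set.

    The threshold (the largest single count over all shuffled sets) is an
    illustrative assumption, not a statistic taken from the paper.
    """
    rng = random.Random(seed)
    observed = c_terminal_tripeptides(proteins)
    counts = Counter(observed)
    max_random = 0
    for _ in range(trials):
        shuffled = Counter(shuffle_positions(observed, rng))
        max_random = max(max_random, max(shuffled.values(), default=0))
    return {t: n for t, n in counts.items() if n > max_random}
```

Given a list of protein sequences parsed from a genome's predicted proteome, `overrepresented(sequences)` would return a dictionary mapping candidate tripeptides to their observed counts. Under this assumed scheme, a genome whose C-terminal tripeptides behave like independent draws at each position should yield few or no entries.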

Footnotes

  • 1 Corresponding author.

  • E-MAIL jberg{at}jhmi.edu; FAX (410) 502-6910.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.667603.

    • Received July 29, 2002.
    • Accepted January 28, 2003.