The genetic code is nearly optimal for allowing additional information within protein-coding sequences

(Downloading may take up to 30 seconds. If the slide opens in your browser, select File -> Save As to save it.)

Click on image to view larger version.

Figure 2.
Figure 2.

(A) Calculation of the probability that an n-mer sequence appears within a protein-coding region in the real genetic code. The 5-mer sequence S = UGACA can appear in one of the three reading frames. For each reading frame, the probabilities of all three codon combinations that contain S are summed up. Codon combinations with an in-frame stop (such as UGA) do not contribute to the n-mer probability since they cannot appear in a coding region. Vertical lines separate consecutive codons, stop codons are in red, P0, P−1, P+1 denote the probabilities of encountering S in the 0/−1/+1 frame. (B,C,D) Three examples of “difficult” n-mers in the real code and in alternative codes. (B) The 5-mer UGACA, which includes the stop codon UGA, can appear in a protein-coding sequence with the real genetic code in only two of the three possible reading frames (+1 and −1 frames). (C) In the alternative code shown in Figure 3D, whose stop codon AAA overlaps with itself, the 5-mer AAAAA cannot appear in a protein-coding sequence in any of the three reading frames. (D) In an alternative code with the overlapping stop codons CCG and CGG, the 5-mer CCGGU can only appear in one reading frame. The 5-mers are in bold text, stop codons are in red, N denotes any DNA letter, green v denotes a frame in which the n-mer can appear, red x denotes a frame in which the n-mer cannot appear. (E) Distribution of the probabilities of all 6-mers in the real code (bold black line) and in the alternative codes (light blue lines). The x-axis is the probability of obtaining 6-mers within protein-coding sequences; the y-axis is the number of 6-mers with this probability. In the real code there are significantly less “difficult” 6-mers (with low probabilities), relative to the alternative codes. (F) The fraction of n-mers that have a higher probability in the real code than in alternative codes increases with n-mer size. The y-axis shows the fraction of n-mers for which the average probability of appearing in the real genetic code is significantly higher than in the alternative codes.

This Article

  1. Genome Res. 17: 405-412

Preprint Server