Table 2.

Most Frequent Tripeptide Sequences Observed Within the Genomes Studied

Organism         N(Occ)  N(Seq) expected  N(Seq) observed  Sequences observed
M. jannaschii    12      0.1 ± 0.3        2                KEE (4.0), LKK (1.8)
(1773 ORFs)      10      0.4 ± 0.6        1                KKL (1.9)
                 8       1.4 ± 1.1        2                KKK (1.0), LKE (1.6)
                 7       2.7 ± 1.6        4                IIK (2.1), KKE (1.1), LNK (3.0), RLL (4.8)
                 6       5.4 ± 2.2        9                EKE (1.6), EKL (1.8), IKK (1.0), KIE (1.8), KKD (3.3), KKI (1.3), RKK (1.8), VKE (2.6), VKK (2.0)
E. coli          11      0.0 ± 0.2        1                AKK (3.5)
(4290 ORFs)      10      0.1 ± 0.3        3                KKK (3.5), RSH (9.7), RSR (4.8)
                 9       0.4 ± 0.6        2                EAK (2.5), RLK (2.5)
                 8       1.2 ± 1.1        7                AAQ (3.9), EEA (3.5), EVK (3.3), GLL (4.3), LEI (8.0), LLG (2.9), RRG (4.7)
                 7       3.6 ± 1.8        8                DGE (7.4), EEV (6.5), GGK (4.1), KLA (2.4), LAS (2.8), NLA (4.0), RRR (3.1), SEE (5.4)
S. cerevisiae    21      0.0 ± 0.0        1                SKK (3.3)
(6215 ORFs)      19      0.0 ± 0.0        1                KKK (2.8)
                 17      0.0 ± 0.1        1                VGE (16.5)
                 16      0.0 ± 0.1        2                AKK (4.3), WIH (120.2)
                 15      0.0 ± 0.2        2                DEL (7.3), IAN (12.0)
                 13      0.1 ± 0.3        1                SKL (2.2)
                 12      0.3 ± 0.5        4                EKK (1.5), LKK (1.5), LLL (1.9), LSK (1.9)
                 11      0.6 ± 0.7        2                KKE (2.9), LLK (1.6)
                 10      1.3 ± 1.1        1                GKK (2.8)
                 9       2.9 ± 1.7        7                DEE (7.1), DSK (2.7), FWC (158.6), LSI (2.3), MLL (6.5), QKI (4.6), SSS (3.0)
                 8       6.1 ± 2.4        12               DDE (7.0), EVD (8.8), IPK (5.8), KEK (2.2), KKD (2.6), KKN (1.9), LDL (2.3), LLV (2.6), RRK (3.5), SLA (3.2), SSL (1.7), TKK (2.2)
A. thaliana      99      0.0 ± 0.0        1                SSS (3.4)
(25561 ORFs)     54      0.0 ± 0.0        1                DYW (95.3)
                 43      0.0 ± 0.1        1                SSL (1.4)
                 40      0.0 ± 0.2        1                ASS (2.6)
                 39      0.1 ± 0.2        2                DEL (5.2), TSS (2.6)
                 38      0.1 ± 0.3        1                SKL (1.8)
                 36      0.2 ± 0.4        1                LKL (1.8)
                 34      0.2 ± 0.5        1                LLS (1.6)
                 32      0.3 ± 0.6        1                EEE (8.2)
                 31      0.4 ± 0.6        1                SST (2.3)
                 30      0.5 ± 0.7        2                LSS (1.1), STS (2.0)
                 29      0.6 ± 0.8        4                KKK (3.4), LLL (1.3), PSS (2.1), RRR (3.8)
                 28      0.8 ± 0.9        2                SSI (1.8), VSS (1.7)
                 26      1.2 ± 1.1        3                DSD (4.3), GSS (1.9), LVF (3.2)
                 25      1.6 ± 1.3        2                DEE (6.7), KKR (2.7)
                 24      1.9 ± 1.3        4                FLL (2.1), FSS (1.7), LSL (0.9), RRS (2.1)
                 23      2.5 ± 1.5        8                DDE (7.5), EED (6.8), SFL (1.7), SLL (1.0), SSR (1.2), SSV (1.1), VSA (2.3), VTL (2.7)
C. elegans       70      0.0 ± 0.0        1                KKK (4.6)
(19833 ORFs)     45      0.0 ± 0.0        1                LCE (20.4)
                 38      0.0 ± 0.0        2                SKL (2.3), YNP (33.7)
                 36      0.0 ± 0.0        1                PGY (20.8)
                 32      0.0 ± 0.0        2                GKK (4.3), TKY (5.8)
                 30      0.0 ± 0.0        1                SSK (2.2)
                 28      0.0 ± 0.1        3                DDE (11.5), KKN (2.2), SKK (1.8)
                 26      0.1 ± 0.2        2                DSD (7.7), RRK (3.8)
                 24      0.2 ± 0.4        5                AKK (2.8), DEE (8.4), KKL (1.5), KRK (2.4), LKK (1.6)
                 23      0.3 ± 0.5        5                AKL (2.6), DEL (4.9), GRK (4.6), KKE (2.3), KKI (2.1)
                 22      0.4 ± 0.6        3                EKK (2.3), SKN (1.6), TNS (3.9)
                 21      0.7 ± 0.8        1                TRR (5.4)
                 20      1.1 ± 1.1        4                ERA (5.3), KKQ (2.3), RKL (1.8), RRR (5.7)
                 19      1.8 ± 1.3        7                DKE (3.6), FGK (4.3), INY (5.4), LGL (2.8), NKK (3.1), SSF (1.5), VSS (2.9)
                 18      2.6 ± 1.5        9                EKL (1.8), FGG (12.2), KSE (2.1), LFN (2.5), LKI (1.6), RIC (9.5), SRR (3.3), SSS (1.8), VKK (1.8)
H. sapiens       32      0.0 ± 0.0        1                DEL (6.3)
(14760 ORFs)     31      0.0 ± 0.0        1                EKK (5.3)
                 28      0.0 ± 0.0        1                KKK (4.5)
                 25      0.0 ± 0.1        1                LKF (5.1)
                 22      0.1 ± 0.3        1                EEE (6.3)
                 21      0.2 ± 0.4        2                LLL (1.6), SDQ (6.0)
                 20      0.3 ± 0.5        2                LAL (2.2), SSK (1.9)
                 19      0.4 ± 0.6        3                EEL (2.5), LLK (2.2), WNK (28.0)
                 18      0.7 ± 0.8        3                ASS (2.1), TRL (2.7), TSL (1.8)
                 17      1.0 ± 1.0        6                KGK (3.4), KRK (3.3), LGL (1.6), LLS (1.6), RKK (3.5), SLL (1.2)
                 16      1.6 ± 1.3        5                EDD (7.1), RRR (5.8), SES (1.7), SKL (1.2), TEL (2.2)
                 15      2.7 ± 1.6        9                GSS (1.9), KRR (4.2), NKI (8.5), PSS (1.8), RRK (3.8), SSL (1.0), SSS (1.2), TKL (1.8), TVV (5.0)
                 14      4.2 ± 2.0        9                APL (2.2), EKP (3.2), ERA (4.1), GKK (2.6), KSS (1.5), LVS (2.2), PGP (4.4), SCC (11.1), TEV (3.3)
                 13      6.5 ± 2.4        13               AKL (1.6), CGF (12.8), DSD (4.7), DTM (18.3), EDL (2.3), KKN (3.9), LEA (2.6), PPQ (4.8), SHL (2.8), SSP (1.7), SVS (1.9), TSI (3.3), VSS (2.0)
                 12      10.1 ± 3.0       20               AAS (2.2), EED (3.8), EKL (1.4), EVD (5.4), FGG (9.4), KAK (2.5), LKL (1.0), LPQ (3.0), LSL (0.9), LSS (1.0), PAS (2.3), QGL (2.6), RPY (7.6), SEI (2.6), SLS (1.0), SLT (2.0), SSV (1.4), TAL (1.9), TTV (3.8), VLL (1.7)

[i] For each organism, the number of ORFs used for the analysis is indicated. N(Occ) is the number of occurrences of a particular sequence in the genome; N(Seq) is the number of sequences that appear N(Occ) times. The expected value of N(Seq) is derived from the genome jumbling method (uncertainties are shown at one standard deviation). The value in parentheses after each sequence is the ratio of the number of times that sequence is observed to the number of times it is expected based on positional amino acid frequencies. Sequences in boldface are known recognition motifs; italicized sequences belong, entirely or in part, to highly repeated sequences (e.g., homologous proteins or transposon ORFs); and underlined sequences take the form XKK (XSS in A. thaliana).
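
The quantities defined in this note can be illustrated with a minimal Python sketch. It is not the authors' code and rests on assumptions: one tripeptide is taken from the same position in every ORF, the expected count of a sequence is the number of ORFs multiplied by the frequency of each of its residues at the corresponding position, and "genome jumbling" is approximated by shuffling each of the three positions independently across ORFs. The function names and the toy input are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, stdev


def ratio_observed_to_expected(tripeptides):
    """Return {sequence: observed / expected}; expected uses per-position residue frequencies."""
    n = len(tripeptides)
    observed = Counter(tripeptides)
    # Residue counts at each of the three positions, treated as independent (assumption).
    pos_counts = [Counter(t[i] for t in tripeptides) for i in range(3)]
    ratios = {}
    for seq, count in observed.items():
        expected = n
        for i, residue in enumerate(seq):
            expected *= pos_counts[i][residue] / n
        ratios[seq] = count / expected
    return ratios


def n_seq_by_occurrence(tripeptides):
    """Return {N(Occ): N(Seq)}: how many distinct sequences occur exactly N(Occ) times."""
    return Counter(Counter(tripeptides).values())


def jumbled_expectation(tripeptides, trials=200, seed=0):
    """Estimate {N(Occ): (mean N(Seq), sd)} over position-wise shuffles (assumed jumbling)."""
    rng = random.Random(seed)
    columns = [[t[i] for t in tripeptides] for i in range(3)]
    histograms = []
    for _ in range(trials):
        for column in columns:
            rng.shuffle(column)  # keeps positional composition, breaks linkage between positions
        jumbled = ["".join(parts) for parts in zip(*columns)]
        histograms.append(n_seq_by_occurrence(jumbled))
    levels = sorted({k for h in histograms for k in h})
    # A Counter returns 0 for an N(Occ) level that a given trial never reached.
    return {k: (mean(h[k] for h in histograms), stdev(h[k] for h in histograms))
            for k in levels}


# Toy input (hypothetical): one tripeptide per ORF.
toy = ["SKL", "SKL", "SKL", "DEL", "DEL", "AKK", "KKK", "SSS"]
print(round(ratio_observed_to_expected(toy)["SKL"], 2))  # 1.92
print(dict(n_seq_by_occurrence(toy)))
print(jumbled_expectation(toy, trials=50))
```

In the toy input, SKL is observed 3 times against an expectation of 8 × (4/8) × (5/8) × (5/8) = 1.5625, which gives the ratio 1.92. Comparing the observed N(Seq) at each N(Occ) with the shuffled mean and standard deviation mirrors the comparison between the expected and observed columns of the table.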