Table 6.

Sensitivity and Specificity of Single Near-Perfect (One Mismatch Allowed) Amino Acid K-mer Matches as a Search Criterion

A.   Identity   K=4     K=5     K=6     K=7     K=8     K=9
     71%        1.000   0.992   0.946   0.823   0.725   0.515
     73%        1.000   0.995   0.965   0.867   0.785   0.586
     75%        1.000   0.998   0.978   0.905   0.840   0.657
     77%        1.000   0.999   0.987   0.935   0.886   0.727
     79%        1.000   0.999   0.993   0.959   0.924   0.791
     81%        1.000   1.000   0.997   0.976   0.952   0.849
     83%        1.000   1.000   0.999   0.987   0.973   0.897
     85%        1.000   1.000   0.999   0.994   0.986   0.936
     87%        1.000   1.000   1.000   0.997   0.994   0.964
     89%        1.000   1.000   1.000   0.999   0.998   0.982
     91%        1.000   1.000   1.000   1.000   0.999   0.993
     93%        1.000   1.000   1.000   1.000   1.000   0.998

B.   K          4         5         6        7       8     9
     F          1.2E+08   6.0E+06   300078   14985   749   37

[i] (A) Columns correspond to K sizes of 4–9; rows correspond to percentage identities between the homologous sequences. Each entry is the fraction of homologies detected. (B) K is the size of the near-perfect match. F is the number of near-perfect matches of this size expected to occur by chance in a translated genome of 3 billion bases when using a query of 167 amino acids.
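The entries in both panels can be approximated with a simple probabilistic model. The sketch below is illustrative, not the authors' code. It assumes that each position matches independently with probability equal to the identity, that the homologous region is 33 amino acids long and is tiled with non-overlapping K-mers, that the translated genome contains 6 billion amino acids (3 billion bases read in six frames), and that the 20 amino acids are equally likely. None of these parameters is stated in the table, and the function names are hypothetical. Under these assumptions the model agrees with the spot-checked entries (for example 0.515 at 71% identity and K = 9, and 300078 for F at K = 6).

```python
def near_perfect_prob(match_prob, k):
    """Probability that a K-mer matches with at most one mismatch,
    given an independent per-position match probability."""
    exactly_one_mismatch = k * match_prob ** (k - 1) * (1 - match_prob)
    no_mismatch = match_prob ** k
    return exactly_one_mismatch + no_mismatch


def sensitivity(identity, k, region_len=33):
    """Fraction of homologous regions with at least one near-perfect hit.
    region_len=33 amino acids is an assumption, not given in the table."""
    tiles = region_len // k  # non-overlapping K-mers in the region
    return 1 - (1 - near_perfect_prob(identity, k)) ** tiles


def chance_matches(k, query_len=167, genome_aa=6e9, alphabet=20):
    """Expected number of near-perfect K-mer matches arising by chance.
    genome_aa=6e9 (six-frame translation) and a uniform 20-letter
    alphabet are assumptions."""
    return query_len * (genome_aa / k) * near_perfect_prob(1 / alphabet, k)


if __name__ == "__main__":
    for k in range(4, 10):
        print(k, f"{sensitivity(0.71, k):.3f}", f"{chance_matches(k):.3g}")
```

Treating positions as independent and tiles as non-overlapping keeps the model to a few lines; a real sequence pair will deviate from it where mismatches cluster.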