Table 6.
Sensitivity and Specificity of Single Near-Perfect (One Mismatch Allowed) Amino Acid K-mer Matches as a Search Criterion
| A. Identity | 4 | 5 | 6 | 7 | 8 | 9 |
| 71% | 1.000 | 0.992 | 0.946 | 0.823 | 0.725 | 0.515 |
| 73% | 1.000 | 0.995 | 0.965 | 0.867 | 0.785 | 0.586 |
| 75% | 1.000 | 0.998 | 0.978 | 0.905 | 0.840 | 0.657 |
| 77% | 1.000 | 0.999 | 0.987 | 0.935 | 0.886 | 0.727 |
| 79% | 1.000 | 0.999 | 0.993 | 0.959 | 0.924 | 0.791 |
| 81% | 1.000 | 1.000 | 0.997 | 0.976 | 0.952 | 0.849 |
| 83% | 1.000 | 1.000 | 0.999 | 0.987 | 0.973 | 0.897 |
| 85% | 1.000 | 1.000 | 0.999 | 0.994 | 0.986 | 0.936 |
| 87% | 1.000 | 1.000 | 1.000 | 0.997 | 0.994 | 0.964 |
| 89% | 1.000 | 1.000 | 1.000 | 0.999 | 0.998 | 0.982 |
| 91% | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.993 |
| 93% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.998 |
| B. K | 4 | 5 | 6 | 7 | 8 | 9 |
| F | 1.2E+08 | 6.0E+06 | 300078 | 14985 | 749 | 37 |
(A) Columns correspond to K sizes of 4–9; rows correspond to the percentage identity between the homologous sequences. Entries give the fraction of homologies detected. (B) K is the size of the near-perfect match. F is the number of near-perfect matches of this size expected to occur by chance in a translated genome of 3 billion bases when using a query of 167 amino acids.
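The entries in both panels can be approximated with a simple probabilistic model. The Python sketch below is illustrative only and rests on assumptions that the caption does not state: the homologous region is taken to be 33 amino acids long, the target is tiled into non-overlapping K-mers, each position matches independently with probability equal to the identity, the 3 billion bases are translated in six reading frames (about 6 billion amino acids), and all 20 amino acids are equally likely. The function and constant names are hypothetical.

```python
# Sketch of a model for single near-perfect (one mismatch allowed) K-mer matches.
# All constants except QUERY_LEN and the 3-billion-base genome are assumptions.

HOMOLOGY_LEN = 33                 # assumed homologous region length (amino acids)
ALPHABET_SIZE = 20                # amino acid alphabet, assumed uniform
QUERY_LEN = 167                   # query length from the caption (amino acids)
GENOME_BASES = 3_000_000_000      # genome size from the caption (bases)
# Assumed six-frame translation: each frame yields one residue per 3 bases.
TRANSLATED_LEN = 6 * GENOME_BASES // 3


def near_perfect_prob(k: int, match_prob: float) -> float:
    """Probability that a K-mer matches with at most one mismatch."""
    one_mismatch = k * match_prob ** (k - 1) * (1 - match_prob)
    perfect = match_prob ** k
    return one_mismatch + perfect


def sensitivity(k: int, identity: float, homology_len: int = HOMOLOGY_LEN) -> float:
    """Fraction of homologies with at least one near-perfect K-mer hit (panel A)."""
    tiles = homology_len // k     # non-overlapping K-mers in the homologous region
    return 1 - (1 - near_perfect_prob(k, identity)) ** tiles


def expected_chance_matches(k: int) -> float:
    """Expected number of near-perfect K-mer matches arising by chance (panel B)."""
    return QUERY_LEN * (TRANSLATED_LEN / k) * near_perfect_prob(k, 1 / ALPHABET_SIZE)


if __name__ == "__main__":
    for k in range(4, 10):
        print(k, f"{sensitivity(k, 0.71):.3f}", f"{expected_chance_matches(k):.3g}")
```

Under these assumptions the sketch gives about 0.515 for K = 9 at 71% identity and about 37 chance matches for K = 9, in line with the tabulated values. The trade-off the table illustrates follows directly: increasing K sharply reduces chance matches while lowering sensitivity at low identity.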











