Sensitivity and Specificity of Single Perfect Amino Acid K-mer Matches as a Search Criterion

A.

| Identity | K=3 | K=4 | K=5 | K=6 | K=7 |
|---|---|---|---|---|---|
| 71% | 0.992 | 0.904 | 0.697 | 0.496 | 0.317 |
| 73% | 0.996 | 0.931 | 0.752 | 0.560 | 0.374 |
| 75% | 0.998 | 0.952 | 0.803 | 0.625 | 0.436 |
| 77% | 0.999 | 0.969 | 0.850 | 0.689 | 0.503 |
| 79% | 0.999 | 0.981 | 0.890 | 0.752 | 0.574 |
| 81% | 1.000 | 0.989 | 0.924 | 0.810 | 0.646 |
| 83% | 1.000 | 0.994 | 0.950 | 0.862 | 0.718 |
| 85% | 1.000 | 0.997 | 0.970 | 0.906 | 0.787 |
| 87% | 1.000 | 0.999 | 0.984 | 0.942 | 0.850 |
| 89% | 1.000 | 1.000 | 0.993 | 0.968 | 0.903 |
| 91% | 1.000 | 1.000 | 0.997 | 0.985 | 0.945 |
| 93% | 1.000 | 1.000 | 0.999 | 0.995 | 0.975 |

B.

| K | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|
| F | 4.2e+07 | 1.6e+06 | 62625 | 2609 | 112 |

(A) Columns are for K sizes of 3–7. Rows represent various percentage identities between the homologous sequences. The table entries show the fraction of homologies detected as calculated from equation 3 assuming a homologous region of 33 amino acids. (B) K represents the size of the perfect match. F shows how many perfect matches of this size are expected to occur by chance according to equation 4 in a translated genome of 3 billion bases using a query of 167 amino acids (corresponding to 500 bases).
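
The entries in both panels can be reproduced with a short calculation. The sketch below is a hedged reconstruction, not the authors' code: equations 3 and 4 are not shown in this section, so their forms are assumptions chosen because they match the tabulated values. For panel A it assumes that a K-mer matches perfectly with probability M^K (M being the fractional identity), that the 33-residue region holds floor(33/K) non-overlapping K-mers, and that a homology is detected if at least one of them matches. For panel B it assumes a 20-letter alphabet and a six-frame translation, so that 3 billion bases yield 6 billion amino acid positions. The function and parameter names are illustrative.

```python
def sensitivity(identity, k, region_len=33):
    """Assumed form of equation 3: chance that at least one
    non-overlapping K-mer in the homologous region matches perfectly."""
    p_kmer = identity ** k          # one K-mer matches perfectly
    n_kmers = region_len // k       # non-overlapping K-mers in the region
    return 1.0 - (1.0 - p_kmer) ** n_kmers


def expected_chance_matches(k, query_len=167, genome_bases=3_000_000_000,
                            alphabet=20):
    """Assumed form of equation 4: expected number of perfect K-mer
    matches arising by chance against a six-frame translated genome."""
    # Assumption: six reading frames, each giving genome_bases / 3 residues.
    translated_len = 6 * (genome_bases // 3)
    return query_len * (translated_len / k) * (1.0 / alphabet) ** k


if __name__ == "__main__":
    ks = range(3, 8)
    # Panel A: one row per percentage identity, 71% to 93% in steps of 2.
    for pct in range(71, 95, 2):
        row = "  ".join(f"{sensitivity(pct / 100, k):.3f}" for k in ks)
        print(f"{pct}%  {row}")
    # Panel B: expected chance matches for each K.
    print("F    " + "  ".join(f"{expected_chance_matches(k):.3g}" for k in ks))
```

Under these assumptions the calculation agrees with the table to the precision shown, which makes the trade-off concrete: raising K from 3 to 7 cuts the expected chance matches from tens of millions to about a hundred, at the cost of missing most homologies below roughly 77% identity.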











