Table 8.
Sensitivity and Specificity of Multiple (2 and 3) Perfect Amino Acid K-mer Matches as a Search Criterion
| A. N,K | 2,3 | 2,4 | 2,5 | 2,6 | 2,7 | 3,3 | 3,4 | 3,5 | 3,6 | 3,7 |
| 71% | 0.945 | 0.643 | 0.297 | 0.126 | 0.044 | 0.945 | 0.643 | 0.297 | 0.126 | 0.044 |
| 73% | 0.965 | 0.712 | 0.363 | 0.167 | 0.063 | 0.965 | 0.712 | 0.363 | 0.167 | 0.063 |
| 75% | 0.978 | 0.776 | 0.436 | 0.218 | 0.089 | 0.978 | 0.776 | 0.436 | 0.218 | 0.089 |
| 77% | 0.987 | 0.833 | 0.514 | 0.280 | 0.123 | 0.987 | 0.833 | 0.514 | 0.280 | 0.123 |
| 79% | 0.993 | 0.882 | 0.596 | 0.353 | 0.169 | 0.993 | 0.882 | 0.596 | 0.353 | 0.169 |
| 81% | 0.997 | 0.922 | 0.678 | 0.435 | 0.226 | 0.997 | 0.922 | 0.678 | 0.435 | 0.226 |
| 83% | 0.999 | 0.952 | 0.757 | 0.526 | 0.298 | 0.999 | 0.952 | 0.757 | 0.526 | 0.298 |
| 85% | 0.999 | 0.973 | 0.829 | 0.622 | 0.385 | 0.999 | 0.973 | 0.829 | 0.622 | 0.385 |
| 87% | 1.000 | 0.987 | 0.889 | 0.719 | 0.485 | 1.000 | 0.987 | 0.889 | 0.719 | 0.485 |
| 89% | 1.000 | 0.995 | 0.936 | 0.809 | 0.596 | 1.000 | 0.995 | 0.936 | 0.809 | 0.596 |
| 91% | 1.000 | 0.998 | 0.969 | 0.886 | 0.712 | 1.000 | 0.998 | 0.969 | 0.886 | 0.712 |
| 93% | 1.000 | 1.000 | 0.988 | 0.944 | 0.823 | 1.000 | 1.000 | 0.988 | 0.944 | 0.823 |
| B. N,K | 2,3 | 2,4 | 2,5 | 2,6 | 2,7 | 3,3 | 3,4 | 3,5 | 3,6 | 3,7 |
| F | 171875 | 245 | 0.4 | 0.0 | 0.0 | 708 | 0.0 | 0.0 | 0.0 | 0.0 |
(A) Columns correspond to N = 2 and N = 3, each with K from 3 to 7. Rows give the percentage identity between the homologous sequences. Table entries show the fraction of homologies detected. (B) N and K represent the number and size of the perfect matches, respectively. F shows how many clustered perfect matches are expected to occur by chance in a translated genome of 3 billion bases using a query of 167 amino acids.
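
The sensitivity entries in part A can be approximated with a simple binomial model. The sketch below is illustrative, not a definitive implementation, and rests on assumptions that this caption does not state: residues match independently with probability equal to the percentage identity, the homologous region is cut into non-overlapping K-mers, and the region is 33 amino acids long (the hypothetical `region_len` parameter). The function name `detection_probability` is likewise made up for this example. Under these assumptions the model gives about 0.945 for N = 2, K = 3 at 71% identity, in line with the first entry of the table.

```python
from math import comb


def detection_probability(identity, k, n, region_len=33):
    """Probability that at least n non-overlapping K-mers match perfectly.

    identity   -- per-residue match probability (e.g. 0.71 for 71%)
    k          -- K-mer size
    n          -- number of perfect K-mer matches required
    region_len -- ASSUMED length of the homologous region in residues
    """
    p1 = identity ** k       # chance that one K-mer matches perfectly
    t = region_len // k      # number of non-overlapping K-mers in the region
    # Probability of seeing fewer than n perfect matches among the t K-mers.
    fewer = sum(comb(t, i) * p1 ** i * (1 - p1) ** (t - i) for i in range(n))
    return 1.0 - fewer


if __name__ == "__main__":
    # Print a few rows for N = 2, K = 3..7.
    for identity in (0.71, 0.81, 0.93):
        row = [detection_probability(identity, k, 2) for k in range(3, 8)]
        print(f"{identity:.0%}", " ".join(f"{p:.3f}" for p in row))
```

Raising `n` or `k` lowers the detection probability in this model, which is the trade-off between sensitivity (part A) and chance matches (part B) that the table summarizes.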











