Table 4.

Sensitivity and Specificity of Single Perfect Amino Acid K-mer Matches as a Search Criterion

A.  K     3      4      5      6      7
    71%   0.992  0.904  0.697  0.496  0.317
    73%   0.996  0.931  0.752  0.560  0.374
    75%   0.998  0.952  0.803  0.625  0.436
    77%   0.999  0.969  0.850  0.689  0.503
    79%   0.999  0.981  0.890  0.752  0.574
    81%   1.000  0.989  0.924  0.810  0.646
    83%   1.000  0.994  0.950  0.862  0.718
    85%   1.000  0.997  0.970  0.906  0.787
    87%   1.000  0.999  0.984  0.942  0.850
    89%   1.000  1.000  0.993  0.968  0.903
    91%   1.000  1.000  0.997  0.985  0.945
    93%   1.000  1.000  0.999  0.995  0.975

B.  K     3        4        5      6     7
    F     4.2e+07  1.6e+06  62625  2609  112

(A) Columns correspond to K-mer sizes from 3 to 7. Rows correspond to the percentage identity between the homologous sequences. Each entry is the fraction of homologies detected, calculated from equation 3 assuming a homologous region of 33 amino acids. (B) K is the size of the perfect match. F is the number of perfect matches of this size expected to occur by chance, according to equation 4, in a translated genome of 3 billion bases using a query of 167 amino acids (corresponding to 500 bases).
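The entries in both panels can be reproduced with a short calculation. The sketch below is illustrative only: equations 3 and 4 are not restated in this table, so the forms used here are assumptions, namely P = 1 - (1 - M^K)^T with T = floor(H/K) non-overlapping K-mers in a region of H amino acids, and F = Q * (G/K) * (1/A)^K with an alphabet of A = 20 amino acids. It is also assumed that 3 billion bases translated in six frames give G = 6 billion amino acids. The function and parameter names are hypothetical.

```python
def detection_fraction(identity, k, region_len=33):
    """Assumed form of equation 3: chance that at least one
    non-overlapping K-mer in the homologous region matches perfectly."""
    p_kmer = identity ** k        # all K residues of one K-mer match
    n_kmers = region_len // k     # non-overlapping K-mers that fit in the region
    return 1.0 - (1.0 - p_kmer) ** n_kmers


def chance_matches(k, query_len=167, db_len=6e9, alphabet=20):
    """Assumed form of equation 4: expected number of perfect K-mer
    matches arising by chance. db_len = 6e9 assumes a six-frame
    translation of 3 billion bases."""
    return query_len * (db_len / k) * (1.0 / alphabet) ** k


if __name__ == "__main__":
    ks = range(3, 8)
    # Panel A: one row per percentage identity, 71% to 93% in steps of 2.
    for pct in range(71, 94, 2):
        row = "  ".join(f"{detection_fraction(pct / 100, k):.3f}" for k in ks)
        print(f"{pct}%  {row}")
    # Panel B: expected chance matches for each K.
    print("F    " + "  ".join(f"{chance_matches(k):.3g}" for k in ks))
```

Under these assumptions the computed values agree with the tabulated ones to the precision shown, which illustrates the trade-off the table summarizes: a larger K sharply reduces chance matches at the cost of sensitivity at lower identities.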