Table 8.

Sensitivity and Specificity of Multiple (2 and 3) Perfect Amino Acid K-mer Matches as a Search Criterion

A. N,K  2,3     2,4     2,5     2,6     2,7     3,3     3,4     3,5     3,6     3,7
   71%  0.945   0.643   0.297   0.126   0.044   0.945   0.643   0.297   0.126   0.044
   73%  0.965   0.712   0.363   0.167   0.063   0.965   0.712   0.363   0.167   0.063
   75%  0.978   0.776   0.436   0.218   0.089   0.978   0.776   0.436   0.218   0.089
   77%  0.987   0.833   0.514   0.280   0.123   0.987   0.833   0.514   0.280   0.123
   79%  0.993   0.882   0.596   0.353   0.169   0.993   0.882   0.596   0.353   0.169
   81%  0.997   0.922   0.678   0.435   0.226   0.997   0.922   0.678   0.435   0.226
   83%  0.999   0.952   0.757   0.526   0.298   0.999   0.952   0.757   0.526   0.298
   85%  0.999   0.973   0.829   0.622   0.385   0.999   0.973   0.829   0.622   0.385
   87%  1.000   0.987   0.889   0.719   0.485   1.000   0.987   0.889   0.719   0.485
   89%  1.000   0.995   0.936   0.809   0.596   1.000   0.995   0.936   0.809   0.596
   91%  1.000   0.998   0.969   0.886   0.712   1.000   0.998   0.969   0.886   0.712
   93%  1.000   1.000   0.988   0.944   0.823   1.000   1.000   0.988   0.944   0.823

B. N,K  2,3     2,4     2,5     2,6     2,7     3,3     3,4     3,5     3,6     3,7
   F    171875  245     0.4     0.0     0.0     708     0.0     0.0     0.0     0.0

(A) Columns are for N values of 2 and 3 and K sizes of 3–7. Rows represent various percentage identities between the homologous sequences. The table entries show the fraction of homologies detected. (B) N and K represent the number and size of the perfect matches, respectively. F shows how many perfect clustered matches are expected to occur by chance in a translated genome of 3 billion bases using a query of 167 amino acids.
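
To make the "fraction of homologies detected" concrete, the following is a minimal sketch of one way such a value could be computed. It is not stated in this table and rests on assumptions: the homologous region is taken to be 33 amino acids long, it is split into non-overlapping K-mers, each K-mer matches perfectly with probability (identity)^K independently of the others, and a homology counts as detected when at least N K-mers match. The function name `sensitivity` and the parameter `region_len` are hypothetical. Under these assumptions the sketch gives about 0.945 for N = 2, K = 3 at 71% identity, which agrees with that entry in part A.

```python
from math import comb


def sensitivity(identity, n_required, k, region_len=33):
    """Probability that at least n_required non-overlapping k-mers match perfectly.

    Assumptions (not taken from the table itself): the homologous region is
    region_len amino acids long, and each residue matches independently with
    probability `identity`.
    """
    p_hit = identity ** k        # chance that one k-mer matches perfectly
    trials = region_len // k     # number of non-overlapping k-mers in the region
    # Binomial probability of seeing fewer than n_required perfect k-mers.
    p_fewer = sum(
        comb(trials, n) * p_hit ** n * (1.0 - p_hit) ** (trials - n)
        for n in range(n_required)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Print the modelled sensitivity for N = 2 at 71% identity, K = 3..7.
    for k in range(3, 8):
        print(k, round(sensitivity(0.71, 2, k), 3))
```

The region length is the main free parameter here: a longer assumed region gives more K-mer trials and therefore a higher modelled detection rate.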