Table 7.

Sensitivity and Specificity of Multiple (2 and 3) Perfect Nucleotide K-mer Matches as a Search Criterion

A.  N,K   2,8    2,9    2,10   2,11   2,12   3,8    3,9    3,10   3,11   3,12
    81%   0.681  0.508  0.348  0.220  0.129  0.389  0.221  0.112  0.051  0.021
    83%   0.790  0.638  0.475  0.326  0.208  0.529  0.339  0.193  0.099  0.045
    85%   0.879  0.762  0.615  0.460  0.318  0.676  0.487  0.313  0.180  0.093
    87%   0.942  0.866  0.752  0.611  0.461  0.809  0.649  0.470  0.305  0.177
    89%   0.978  0.940  0.868  0.761  0.625  0.910  0.801  0.648  0.476  0.314
    91%   0.994  0.980  0.947  0.884  0.787  0.969  0.914  0.815  0.673  0.505
    93%   0.999  0.996  0.986  0.962  0.912  0.993  0.976  0.933  0.851  0.722
    95%   1.000  1.000  0.998  0.993  0.979  0.999  0.997  0.987  0.961  0.902
    97%   1.000  1.000  1.000  1.000  0.999  1.000  1.000  0.999  0.997  0.987
B.  N,K   2,8    2,9    2,10   2,11   2,12   3,8    3,9    3,10   3,11   3,12
    F     524    27     1.4    0.1    0.0    0.1    0.0    0.0    0.0    0.0

[i] (A) Columns are for N values of 2 and 3 and K sizes of 8–12. Rows represent various percentage identities between the homologous sequences. The table entries show the fraction of homologies detected, as calculated by equation 10. (B) N and K represent the number and size of the perfect matches, respectively. F shows how many perfect clustered matches are expected to occur by chance, according to equation 14, in a translated genome of 3 billion bases using a query of 167 amino acids.
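
Equation 10 is not reproduced alongside this table, so the following Python sketch is only an illustration of how the entries in part A might be computed. It assumes a binomial model: the homologous region is cut into non-overlapping K-mers, each K-mer matches perfectly with probability M^K (M being the fractional identity), and a homology is detected when at least N of them match. The region length of 100 bases and the names `detection_fraction` and `region_len` are assumptions introduced here, not taken from the table.

```python
from math import comb


def detection_fraction(identity, k, n, region_len=100):
    """Probability that at least n of the non-overlapping k-mers match perfectly.

    identity   -- fractional identity between the homologous sequences (e.g. 0.81)
    k          -- K-mer size
    n          -- number of perfect matches required
    region_len -- assumed length of the homologous region in bases (hypothetical default)
    """
    p_hit = identity ** k        # chance that one K-mer matches perfectly
    tiles = region_len // k      # number of non-overlapping K-mers in the region
    # Probability of seeing fewer than n perfect matches (binomial tail).
    p_fewer = sum(
        comb(tiles, i) * p_hit ** i * (1 - p_hit) ** (tiles - i)
        for i in range(n)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Print one row of part A for a chosen identity level.
    columns = [(n, k) for n in (2, 3) for k in range(8, 13)]
    row = [f"{detection_fraction(0.81, k, n):.3f}" for n, k in columns]
    print("81%", " ".join(row))
```

Under these assumptions, `detection_fraction(0.81, 8, 2)` evaluates to about 0.681 and `detection_fraction(0.97, 12, 3)` to about 0.987, which agree with the first and last entries of part A. The values of F in part B depend on equation 14, which this sketch does not attempt to reproduce.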