Table 5.

Sensitivity and Specificity of Single Near-Perfect (One Mismatch Allowed) Nucleotide K-mer Matches as a Search Criterion

A.
Identity K=12   K=13   K=14   K=15   K=16   K=17   K=18   K=19   K=20   K=21   K=22
81%      0.945  0.880  0.831  0.721  0.657  0.526  0.465  0.408  0.356  0.255  0.218
83%      0.975  0.936  0.904  0.820  0.770  0.649  0.591  0.535  0.480  0.361  0.318
85%      0.991  0.971  0.954  0.900  0.865  0.767  0.719  0.669  0.619  0.490  0.445
87%      0.997  0.990  0.983  0.954  0.935  0.867  0.833  0.796  0.757  0.634  0.591
89%      1.000  0.997  0.995  0.984  0.976  0.939  0.920  0.897  0.872  0.775  0.741
91%      1.000  1.000  0.999  0.996  0.994  0.979  0.971  0.962  0.950  0.890  0.869
93%      1.000  1.000  1.000  0.999  0.999  0.996  0.994  0.991  0.988  0.963  0.954
95%      1.000  1.000  1.000  1.000  1.000  1.000  0.999  0.999  0.999  0.994  0.992
97%      1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
B.
K        12      13      14      15      16      17      18      19      20      21      22
F        275671  68775   17163   4284    1070    267     67      17      4.2     1.0     0.3

(A) Columns are for K sizes of 12–22. Rows represent various percent identities between the homologous sequences. The table entries show the fraction of homologies detected, as calculated by equation 6, assuming a homologous region of 100 bases. (B) K represents the size of the near-perfect match. F shows how many near-perfect matches of this size are expected to occur by chance, according to equation 7, in a genome of 3 billion bases using a query of 500 bases.
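Equations 6 and 7 are defined elsewhere in the paper and are not reproduced here. The Python sketch below recomputes both parts of the table under assumed forms of those equations, so treat it as an illustration rather than the paper's exact method. It assumes that a homologous region of H bases is tiled by ⌊H/K⌋ non-overlapping K-mers, that each base matches independently with probability p (the percent identity), and that a K-mer counts as a hit when it has zero or one mismatch, so a single K-mer hits with probability p^K + K·p^(K−1)·(1 − p). For part B it assumes an alphabet of M = 4 bases, a genome indexed as G/K non-overlapping K-mers, and K(M − 1) + 1 sequences within one mismatch of any given K-mer. The function names are hypothetical.

```python
# Hypothetical helpers. The formulas are assumptions inferred from the caption
# and the tabulated values, not quoted from the paper's equations 6 and 7.


def near_perfect_hit_prob(p: float, k: int) -> float:
    """Probability that a K-mer with per-base identity p has at most one mismatch."""
    return p**k + k * p ** (k - 1) * (1 - p)


def fraction_detected(p: float, k: int, h: int = 100) -> float:
    """Part A: probability that at least one of the h // k non-overlapping
    K-mers in an h-base homologous region is a near-perfect match."""
    tiles = h // k
    return 1 - (1 - near_perfect_hit_prob(p, k)) ** tiles


def chance_matches(k: int, q: int = 500, g: float = 3e9, m: int = 4) -> float:
    """Part B: expected chance near-perfect hits of a q-base query against a
    g-base genome indexed as g / k non-overlapping K-mers over an m-letter
    alphabet. k * (m - 1) + 1 counts the exact K-mer plus its one-mismatch variants."""
    return q * (g / k) * (k * (m - 1) + 1) / m**k


if __name__ == "__main__":
    ks = range(12, 23)
    # Part A: one row per percent identity, one column per K.
    print("Identity " + "".join(f"K={k:<5}" for k in ks))
    for pct in range(81, 99, 2):
        row = "".join(f"{fraction_detected(pct / 100, k):<7.3f}" for k in ks)
        print(f"{pct}%      {row}")
    # Part B: expected chance matches for each K.
    print("F        " + "  ".join(f"{chance_matches(k):.1f}" for k in ks))
```

Under these assumptions the sketch gives 0.945 for 81% identity at K = 12 and an F of about 275,671 at K = 12, in line with the table. It multiplies by the full query length Q = 500 rather than by Q − K + 1 query positions; using Q − K + 1 would lower F by roughly 2 to 4 percent across K = 12–22. In this model, the steeper drops in sensitivity between some adjacent K values (for example from K = 14 to K = 15) coincide with ⌊100/K⌋ losing one tile.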