Table 3.

Sensitivity and Specificity of Single Perfect Nucleotide K-mer Matches as a Search Criterion

A.  K     7      8      9      10     11     12     13     14
    81%   0.974  0.915  0.833  0.726  0.607  0.486  0.373  0.314
    83%   0.988  0.953  0.897  0.815  0.711  0.595  0.478  0.415
    85%   0.996  0.978  0.945  0.888  0.808  0.707  0.594  0.532
    87%   0.999  0.992  0.975  0.942  0.888  0.811  0.714  0.659
    89%   1.000  0.998  0.991  0.976  0.946  0.897  0.824  0.782
    91%   1.000  1.000  0.998  0.993  0.981  0.956  0.912  0.886
    93%   1.000  1.000  1.000  0.999  0.995  0.987  0.968  0.957
    95%   1.000  1.000  1.000  1.000  0.999  0.998  0.994  0.991
    97%   1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.999

B.  K     7        8        9       10      11     12    13    14
    F     1.3e+07  2.9e+06  635783  143051  32512  7451  1719  399

[i] (A) Columns are for K sizes of 7–14. Rows represent various percentage identities between the homologous sequences. The table entries show the fraction of homologies detected as calculated from equation 3 assuming a homologous region of 100 bases. The larger the value of K, the fewer homologies are detected.

[ii] (B) K represents the size of the perfect match. F shows how many perfect matches of this size are expected to occur by chance, according to equation 4, in a genome of 3 billion bases using a query of 500 bases.
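
The entries in both parts of the table can be recomputed with a short script. Equations 3 and 4 are not reproduced alongside this table, so the forms used below are assumptions, chosen because they reproduce the tabulated values: for part A, P = 1 - (1 - M^K)^T, where M is the fractional identity and T = floor(H/K) is the number of non-overlapping K-mers in a homologous region of H bases; for part B, F = Q * (G/K) * (1/A)^K, where Q is the query size, G the genome size, and A = 4 the nucleotide alphabet size. The function and parameter names are illustrative only.

```python
def sensitivity(identity, k, homology_len=100):
    """Fraction of homologies detected by a single perfect K-mer match.

    Assumed form of equation 3: P = 1 - (1 - M**K)**T with T = H // K.
    """
    tiles = homology_len // k        # T: non-overlapping K-mers in the region
    p_tile = identity ** k           # chance that one K-mer matches perfectly
    return 1.0 - (1.0 - p_tile) ** tiles


def chance_matches(k, query_len=500, genome_len=3e9, alphabet=4):
    """Expected number of perfect K-mer matches occurring by chance.

    Assumed form of equation 4: F = Q * (G / K) * (1 / A)**K.
    """
    return query_len * (genome_len / k) * (1.0 / alphabet) ** k


if __name__ == "__main__":
    k_sizes = range(7, 15)
    # Part A: one row per percentage identity, 81% through 97%
    for percent in range(81, 98, 2):
        row = "  ".join(f"{sensitivity(percent / 100, k):.3f}" for k in k_sizes)
        print(f"{percent}%  {row}")
    # Part B: expected chance matches for each K
    print("F    " + "  ".join(f"{chance_matches(k):.3g}" for k in k_sizes))
```

Under these assumptions the script shows the trade-off the table summarizes: raising K from 7 to 14 cuts the expected number of chance matches by more than four orders of magnitude, while the fraction of 100-base homologies detected at 81% identity falls from about 0.97 to about 0.31.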