BLAT—The BLAST-Like Alignment Tool

Table 8.

Sensitivity and Specificity of Multiple (2 and 3) Perfect Amino Acid K-mer Matches as a Search Criterion

    N,K   2,3     2,4     2,5     2,6     2,7     3,3     3,4     3,5     3,6     3,7
A.  71%   0.945   0.643   0.297   0.126   0.044   0.945   0.643   0.297   0.126   0.044
    73%   0.965   0.712   0.363   0.167   0.063   0.965   0.712   0.363   0.167   0.063
    75%   0.978   0.776   0.436   0.218   0.089   0.978   0.776   0.436   0.218   0.089
    77%   0.987   0.833   0.514   0.280   0.123   0.987   0.833   0.514   0.280   0.123
    79%   0.993   0.882   0.596   0.353   0.169   0.993   0.882   0.596   0.353   0.169
    81%   0.997   0.922   0.678   0.435   0.226   0.997   0.922   0.678   0.435   0.226
    83%   0.999   0.952   0.757   0.526   0.298   0.999   0.952   0.757   0.526   0.298
    85%   0.999   0.973   0.829   0.622   0.385   0.999   0.973   0.829   0.622   0.385
    87%   1.000   0.987   0.889   0.719   0.485   1.000   0.987   0.889   0.719   0.485
    89%   1.000   0.995   0.936   0.809   0.596   1.000   0.995   0.936   0.809   0.596
    91%   1.000   0.998   0.969   0.886   0.712   1.000   0.998   0.969   0.886   0.712
    93%   1.000   1.000   0.988   0.944   0.823   1.000   1.000   0.988   0.944   0.823
B.  N,K   2,3     2,4     2,5     2,6     2,7     3,3     3,4     3,5     3,6     3,7
    F     171875  245     0.4     0.0     0.0     708     0.0     0.0     0.0     0.0
  • (A) Columns are labeled N,K, for N values of 2 and 3 and K sizes of 3–7. Rows represent various percentage identities between the homologous sequences. The table entries show the fraction of homologies detected. (B) N and K represent the number and size of the perfect matches, respectively. F shows how many perfect clustered matches are expected to occur by chance in a translated genome of 3 billion bases using a query of 167 amino acids.
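
As a rough illustration of how entries like those in part A can arise, the following minimal sketch computes the chance that at least N non-overlapping K-mers match perfectly within a homologous region. The binomial model, the function name `sensitivity`, and the region length of 33 amino acids are assumptions made for this example and are not stated in the table itself.

```python
from math import comb


def sensitivity(identity, k, n, region_len=33):
    """Probability that at least n non-overlapping k-mers match perfectly.

    Assumptions (hypothetical, for illustration only):
      - each residue matches independently with probability `identity`
      - the homologous region is `region_len` amino acids long
      - only non-overlapping k-mers are counted
    """
    p1 = identity ** k      # chance that a single k-mer matches perfectly
    t = region_len // k     # number of non-overlapping k-mers in the region
    # Binomial tail: probability of n or more perfect k-mer matches out of t.
    return sum(
        comb(t, i) * p1 ** i * (1 - p1) ** (t - i)
        for i in range(n, t + 1)
    )


if __name__ == "__main__":
    # Print a small grid for N = 2 across the K sizes used in the table.
    for identity in (0.71, 0.81, 0.93):
        row = [f"{sensitivity(identity, k, 2):.3f}" for k in range(3, 8)]
        print(f"{identity:.0%}", *row)
```

Under these assumptions, `sensitivity(0.71, 3, 2)` comes out near 0.945 and `sensitivity(0.71, 4, 2)` near 0.643, in line with the 2,3 and 2,4 entries of the 71% row.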

This Article

  1. Genome Res. 12: 656-664
