BLAT—The BLAST-Like Alignment Tool

Table 4.

Sensitivity and Specificity of Single Perfect Amino Acid K-mer Matches as a Search Criterion

A.  K      3        4        5        6        7
    71%    0.992    0.904    0.697    0.496    0.317
    73%    0.996    0.931    0.752    0.560    0.374
    75%    0.998    0.952    0.803    0.625    0.436
    77%    0.999    0.969    0.850    0.689    0.503
    79%    0.999    0.981    0.890    0.752    0.574
    81%    1.000    0.989    0.924    0.810    0.646
    83%    1.000    0.994    0.950    0.862    0.718
    85%    1.000    0.997    0.970    0.906    0.787
    87%    1.000    0.999    0.984    0.942    0.850
    89%    1.000    1.000    0.993    0.968    0.903
    91%    1.000    1.000    0.997    0.985    0.945
    93%    1.000    1.000    0.999    0.995    0.975

B.  K      3        4        5        6        7
    F      4.2e+07  1.6e+06  62625    2609     112
(A) Columns are for K sizes of 3–7. Rows represent various percentage identities between the homologous sequences. The table entries show the fraction of homologies detected, as calculated from equation 3 assuming a homologous region of 33 amino acids.
(B) K represents the size of the perfect match. F shows how many perfect matches of this size are expected to occur by chance, according to equation 4, in a translated genome of 3 billion bases using a query of 167 amino acids (corresponding to 500 bases).
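The legend refers to equations 3 and 4, which are not reproduced on this page. The following is a minimal Python sketch of how the table entries could be recomputed. The two formulas are assumptions chosen because they reproduce the tabulated values, not quotations of the article's equations: sensitivity is taken as 1 - (1 - M^K)^T with T = floor(33 / K) non-overlapping K-mers, and the chance-match count as Q * (G / K) * (1/20)^K. The translated genome size G is assumed to be 6 billion residues (3 billion bases read in six frames, three bases per codon). All function and constant names are hypothetical.

```python
"""Sketch: recompute Table 4 under ASSUMED forms of equations 3 and 4."""

AMINO_ACIDS = 20        # alphabet size for amino acid K-mers
HOMOLOGY_LENGTH = 33    # homologous region in amino acids (from the legend)
QUERY_LENGTH = 167      # query length in amino acids (from the legend)
GENOME_BASES = 3e9      # genome size in bases (from the legend)

# Assumption: six reading frames, three bases per codon, so the translated
# genome holds 2 residues per base. This factor reproduces part B.
TRANSLATED_RESIDUES = GENOME_BASES * 6 / 3


def sensitivity(identity, k, h=HOMOLOGY_LENGTH):
    """Chance that at least one of the floor(h / k) non-overlapping K-mers
    in a homologous region matches perfectly (assumed form of equation 3)."""
    tiles = h // k
    return 1.0 - (1.0 - identity ** k) ** tiles


def expected_chance_matches(k, q=QUERY_LENGTH, g=TRANSLATED_RESIDUES,
                            a=AMINO_ACIDS):
    """Expected number of perfect K-mer matches arising by chance
    (assumed form of equation 4)."""
    return q * (g / k) * (1.0 / a) ** k


if __name__ == "__main__":
    k_sizes = range(3, 8)
    # Part A: one row per percentage identity, 71% to 93% in steps of 2.
    for pct in range(71, 95, 2):
        row = "  ".join(f"{sensitivity(pct / 100, k):.3f}" for k in k_sizes)
        print(f"{pct}%  {row}")
    # Part B: expected chance matches for each K.
    print("F    " + "  ".join(f"{expected_chance_matches(k):.4g}"
                              for k in k_sizes))
```

Under these assumptions, `sensitivity(0.71, 3)` rounds to the tabulated 0.992 and `expected_chance_matches(5)` evaluates to about 62625, which suggests the assumed forms are at least consistent with the table.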

This Article

  1. Genome Res. 12: 656-664
