Genome Research

Table 4.

Sensitivity and Specificity of Single Perfect Amino Acid K-mer Matches as a Search Criterion

A.
Identity	K = 3	K = 4	K = 5	K = 6	K = 7
71%	0.992	0.904	0.697	0.496	0.317
73%	0.996	0.931	0.752	0.560	0.374
75%	0.998	0.952	0.803	0.625	0.436
77%	0.999	0.969	0.850	0.689	0.503
79%	0.999	0.981	0.890	0.752	0.574
81%	1.000	0.989	0.924	0.810	0.646
83%	1.000	0.994	0.950	0.862	0.718
85%	1.000	0.997	0.970	0.906	0.787
87%	1.000	0.999	0.984	0.942	0.850
89%	1.000	1.000	0.993	0.968	0.903
91%	1.000	1.000	0.997	0.985	0.945
93%	1.000	1.000	0.999	0.995	0.975

B.
K	3	4	5	6	7
F	4.2e+07	1.6e+06	62625	2609	112

(A) Columns are for K sizes of 3–7. Rows give the percentage identity between the homologous sequences. Each entry is the fraction of homologies detected, calculated from equation 3 assuming a homologous region of 33 amino acids. (B) K is the size of the perfect match. F is the number of perfect matches of this size expected to occur by chance, according to equation 4, in a translated genome of 3 billion bases using a query of 167 amino acids (corresponding to 500 bases).
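The two calculations described in the caption can be sketched in Python. Equations 3 and 4 are not reproduced in this section, so the formulas below are assumptions, chosen because they reproduce the tabulated values. For panel A, the sketch assumes each aligned position matches independently with probability M (the fractional identity) and that a homologous region of H = 33 amino acids contains floor(H/K) non-overlapping K-mers, giving P = 1 - (1 - M^K)^floor(H/K). For panel B, it assumes the 3-billion-base genome is translated in six frames (about 6 billion amino acids, G), indexed as non-overlapping K-mers, with the 20 amino acids equally frequent, giving F = Q · (G/K) · (1/20)^K for a query of Q amino acids. The function names and default arguments are hypothetical.

```python
# Sketch only: formulas are assumptions that reproduce Table 4, not the paper's code.

AMINO_ACIDS = 20  # alphabet size; assumes uniform residue frequencies


def fraction_detected(identity: float, k: int, homology_len: int = 33) -> float:
    """Chance that a homologous region holds at least one perfect K-mer match.

    Assumption: positions match independently with probability `identity`,
    and the region is tiled by floor(homology_len / k) non-overlapping K-mers.
    """
    p_kmer = identity ** k          # probability one K-mer matches exactly
    tiles = homology_len // k       # non-overlapping K-mers in the region
    return 1.0 - (1.0 - p_kmer) ** tiles


def chance_matches(k: int, query_aa: int = 167, genome_bases: float = 3e9,
                   alphabet: int = AMINO_ACIDS) -> float:
    """Expected number of random perfect K-mer matches (hypothetical helper).

    Assumption: six-frame translation, i.e. 6 * (genome_bases / 3) amino acids,
    indexed as non-overlapping K-mers of the translated genome.
    """
    translated_aa = 6 * genome_bases / 3   # six reading frames, 3 bases per codon
    indexed_kmers = translated_aa / k      # one index entry per non-overlapping K-mer
    return query_aa * indexed_kmers * (1.0 / alphabet) ** k


if __name__ == "__main__":
    ks = range(3, 8)
    # Panel A: identities 71% to 93% in steps of 2%
    for pct in range(71, 94, 2):
        row = [f"{fraction_detected(pct / 100, k):.3f}" for k in ks]
        print(f"{pct}%\t" + "\t".join(row))
    # Panel B: expected chance matches, rounded to whole hits
    print("F\t" + "\t".join(f"{chance_matches(k):.0f}" for k in ks))
```

Running the script prints panel A row by row followed by the F row; after rounding, the values should match the table. Independent positions and uniform amino acid frequencies are simplifying assumptions of this sketch.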