Table 1.

Selected Examples of Prediction Accuracy in Different Areas of Sequence Analysis[i]

| Prediction of | Acc × cov[ii] | Accuracy (%) | Coverage (% of reference set) | Reference[iii] |
|---|---|---|---|---|
| Human promoters | 0.35 | 50 | 70% of annotated test set | Prestidge (1995); P. Bucher (pers. comm.) |
| Human regulatory RNA elements | 0.34 | 85 | 40% of new DNA | Dandekar and Sharma (1998) |
| Human genes (only presence) | 0.49 | 70 | 70% of chromosome 22 | Dunham et al. (1999) and refs. therein |
| Human SNPs by EST comparison | 0.21 | 70 | 30% of all proteins with SNP | Buelow et al. (1999); Sunyaev et al. (2000) |
| Human alternative splicing | 0.45 | 90 | 50% of all splice sites | Hanke et al. (1999) |
| Transmembranes (only presence) | 0.85 | 85 | 99% of annotated test set | Tusnady and Simon (1998) and refs. therein |
| Signal peptides (only presence) | 0.90 | 90 | 100% of annotated test set | Nielsen et al. (1999) |
| GPI anchors (incl. cleavage site) | 0.72 | 72 | 100% of annotated test set | Eisenhaber et al. (1999) |
| Coiled coil (only presence) | 0.81 | 90 | 90% of annotated coiled coil | Lupas (1996) |
| Secondary structure (three states) | 0.77 | 77 | 100% of 3D test set | Jones (1999) and refs. therein |
| Buried or exposed residues | 0.74 | 74 | 100% of 3D test set | Rost (1996) |
| Residue hydration | 0.72 | 72 | 100% of 3D test set | Ehrlich et al. (1998) |
| Protein folds (in Mycoplasma) | 0.49 | 98 | 50% of Mycoplasma ORFs | Teichmann et al. (1999) and refs. therein |
| Homology (several methods) | 0.49 | 98 | 50% of 3D test set | Muller et al. (1999) and refs. therein |
| Functional features by homology | 0.63 | 90 | 70% unicellular genomes | Bork and Koonin (1998); Brenner (1999) |
| Function association by context | 0.25 | 50 | 10% high confidence in yeast | Marcotte et al. (1999b) |
| Cellular localization (two states) | 0.77 | 77 | 100% of annotated test set | Andrade et al. (1998) |

[i] The numbers are in many cases crude estimates taken, or sometimes even estimated, from the literature and have an expected accuracy of ∼70%. Direct comparison of the numbers might be misleading, as the context is not properly explained here. Furthermore, although most of the examples are two-state predictions, the percentages do not take into account random occurrences of the states. All test sets are most likely biased (e.g., current 3D test sets do not contain many compositionally biased regions, which probably contain up to 15% of all residues, and annotation test sets are far from perfect; see text), so the real accuracy is probably lower.

[ii] To make the numbers more comparable, accuracy has been multiplied by coverage; some methods report accuracy at different degrees of coverage, which roughly justifies this procedure. However, the product is often biased toward sensitivity, as specificity cannot be properly taken into account. Most features predicted with an accuracy × coverage >0.70 are structural in nature and at best only indirectly imply a certain functionality.

[iii] Only one recent reference is given; where indicated, the references therein should also be considered, as other reports do not always agree with the numbers given.
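
The accuracy × coverage column described in footnote [ii] can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `acc_times_cov` and the choice of three rows are illustrative assumptions, not part of the original table, and the values are copied from the table as given.

```python
# Illustrative sketch; the function name and the row selection are hypothetical.

def acc_times_cov(accuracy_pct: float, coverage_pct: float) -> float:
    """Return accuracy x coverage as a fraction, rounded to two decimals."""
    return round(accuracy_pct * coverage_pct / 10000, 2)

# (feature, accuracy in %, coverage in %) copied from three rows of Table 1
rows = [
    ("Human promoters", 50, 70),
    ("Coiled coil (only presence)", 90, 90),
    ("Protein folds (in Mycoplasma)", 98, 50),
]

scores = {name: acc_times_cov(acc, cov) for name, acc, cov in rows}

# Features whose combined score exceeds the 0.70 mark mentioned in footnote [ii]
above_hurdle = [name for name, score in scores.items() if score > 0.70]

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Above 0.70:", above_hurdle)
```

A high accuracy alone does not lift a feature above 0.70: the protein fold row has 98% accuracy but only 50% coverage, so its product stays at 0.49.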