Evaluation of Gene-Finding Programs on Mammalian Sequences

Table 4. Accuracy versus Exon Length

Programs      Length range of exons in bp (number of actual exons)
              0–24     25–49    50–74    75–99    100–199  200–299  300+
              (22)     (49)     (91)     (130)    (440)    (91)     (125)
FGENES        0.45     0.55     0.71     0.80     0.80     0.71     0.59
              (0.33)   (0.42)   (0.64)   (0.75)   (0.81)   (0.61)   (0.66)
GeneMark.hmm  0.05     0.39     0.60     0.77     0.75     0.67     0.46
              (0.12)   (0.51)   (0.58)   (0.72)   (0.73)   (0.62)   (0.45)
Genie         0.27     0.53     0.60     0.80     0.70     0.71     0.69
              (0.18)   (0.47)   (0.66)   (0.81)   (0.83)   (0.68)   (0.69)
Genscan       0.18     0.45     0.68     0.89     0.84     0.87     0.66
              (0.29)   (0.81)   (0.79)   (0.85)   (0.76)   (0.71)   (0.65)
HMMgene       0.23     0.59     0.64     0.79     0.80     0.78     0.77
              (0.42)   (0.76)   (0.75)   (0.77)   (0.85)   (0.72)   (0.74)
Morgan        0.30     0.37     0.38     0.61     0.51     0.51     0.42
              (0.14)   (0.14)   (0.31)   (0.57)   (0.57)   (0.41)   (0.35)
MZEF          0.00     0.16     0.32     0.40     0.49     0.45     0.12
              (0.00)   (0.44)   (0.45)   (0.58)   (0.73)   (0.53)   (0.26)
  • The HMR195 dataset was partitioned according to the length of the annotated exons. The number in parentheses in each column header is the number of actual exons in that partition. For each program, the upper number is CRa, the proportion of actual exons that are correctly predicted, and the number in parentheses below it is CRp, the proportion of predicted exons that are correct. Both measures are averaged over all sequences belonging to that partition.
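
The two measures in the note above can be illustrated with a minimal Python sketch for a single sequence. It rests on several assumptions that the table does not state: an exon counts as correctly predicted only when both of its boundaries match an annotated exon exactly, coordinates are inclusive, and predicted exons are binned by their own predicted length. The function names `length_bin` and `exon_accuracy_by_length` are hypothetical, and the averaging over sequences is left out.

```python
from bisect import bisect_right

# Lower bounds and labels of the length bins used in the table (in bp).
BIN_STARTS = [0, 25, 50, 75, 100, 200, 300]
BIN_LABELS = ["0-24", "25-49", "50-74", "75-99", "100-199", "200-299", "300+"]


def length_bin(start, end):
    """Return the bin label for an exon given inclusive start/end coordinates."""
    length = end - start + 1
    return BIN_LABELS[bisect_right(BIN_STARTS, length) - 1]


def exon_accuracy_by_length(actual, predicted):
    """Compute (CRa, CRp) per length bin for one sequence.

    actual, predicted: iterables of (start, end) tuples.
    Assumption: a predicted exon is correct only if it matches an
    annotated exon exactly at both boundaries.
    A measure is None when its denominator is zero for that bin.
    """
    actual, predicted = set(actual), set(predicted)
    correct = actual & predicted
    result = {}
    for label in BIN_LABELS:
        n_actual = sum(1 for e in actual if length_bin(*e) == label)
        n_predicted = sum(1 for e in predicted if length_bin(*e) == label)
        n_correct = sum(1 for e in correct if length_bin(*e) == label)
        cra = n_correct / n_actual if n_actual else None
        crp = n_correct / n_predicted if n_predicted else None
        result[label] = (cra, crp)
    return result
```

Calling `exon_accuracy_by_length(annotated_exons, predicted_exons)` on each sequence and then averaging the non-empty values per bin would approximate the layout of the table, under the assumptions listed above.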

This Article

  1. Genome Res. 11: 817-832
