Evaluation of Gene-Finding Programs on Mammalian Sequences

Table 3.

Accuracy versus G + C Content

G + C content   <40% (14)          40–50% (69)        50–60% (93)        >60% (19)
                AC    (ESn+ESp)/2  AC    (ESn+ESp)/2  AC    (ESn+ESp)/2  AC    (ESn+ESp)/2
FGENES          0.84  0.70         0.81  0.64         0.85  0.71         0.87  0.66
GeneMark.hmm    0.79  0.48         0.80  0.46         0.87  0.62         0.85  0.48
Genie           0.85  0.69         0.85  0.60         0.92  0.75         0.87  0.79
Genscan         0.94  0.80         0.91  0.66         0.91  0.74         0.88  0.70
HMMgene         0.91  0.76         0.90  0.73         0.92  0.79         0.91  0.77
Morgan          0.65  0.29         0.72  0.49         0.69  0.43         0.69  0.37
MZEF            0.66  0.71         0.65  0.50         0.70  0.62         0.58  0.53
  • The HMR195 dataset was partitioned according to the G + C content of the sequences. The number in parentheses in the header of each column is the number of sequences in that partition. For each program, AC and (ESn+ESp)/2 are averaged over those sequences in the partition for which the measure is defined.
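
The partition-and-average procedure described in the note above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the treatment of boundary values (40% and 50% go to the higher partition, 60% to the 50-60% partition), and the use of None to mark an undefined per-sequence measure are all assumptions made for the example.

```python
# Partition labels mirror the table header; how boundary values are
# assigned is an assumption, since the table does not say.
BINS = ["<40%", "40-50%", "50-60%", ">60%"]


def gc_percent(seq):
    """Return the G + C percentage of a DNA sequence.

    Only A, C, G and T are counted, so ambiguity codes such as N
    do not affect the result.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def gc_bin(pct):
    """Map a G + C percentage to one of the four partitions."""
    if pct < 40:
        return "<40%"
    if pct < 50:
        return "40-50%"
    if pct <= 60:
        return "50-60%"
    return ">60%"


def average_by_bin(records):
    """Average a per-sequence measure within each G + C partition.

    records: iterable of (sequence, score) pairs, where score is a float
    (for example AC for one program on that sequence) or None when the
    measure is undefined for that sequence.

    Returns {partition: (number_of_sequences, mean_or_None)}. The count
    includes every sequence in the partition; the mean uses only the
    sequences with a defined score.
    """
    counts = {b: 0 for b in BINS}
    scores = {b: [] for b in BINS}
    for seq, score in records:
        b = gc_bin(gc_percent(seq))
        counts[b] += 1
        if score is not None:
            scores[b].append(score)
    return {
        b: (counts[b], sum(scores[b]) / len(scores[b]) if scores[b] else None)
        for b in BINS
    }
```

Calling average_by_bin once per program and per measure, with that program's per-sequence scores, would yield one row of the table; the sequence counts per partition are the same for every program because they depend only on the sequences.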

This Article

  1. Genome Res. 11: 817-832
