EGPred: Prediction of Eukaryotic Genes Using Ab Initio Methods After Combining With Sequence Similarity Approaches

Table 1.

BLASTX Performance on HMR195 and Burset/Guigo Data Set on Adding Similarity Information

                                  Nucleotide level          Exon level
Program                 No. genes SEN   SPE   AC    CC      CR    PC    OL    ME   WE    ESEN  ESPE  EAVG
HMR195 data set
BLASTX (1st cycle)      1         0.88  0.64  0.69  0.66    36    277   643   98   1920  0.04  0.02  0.03
BLASTX (2nd cycle)      0         0.91  0.72  0.76  0.75    75    317   453   117  962   0.18  0.12  0.15
BLASTX+NNSPLICE         0         0.91  0.73  0.77  0.76    569   226   52    116  961   0.59  0.40  0.49
BLASTX+INTRON           0         0.90  0.84  0.84  0.84    75    316   446   124  380   0.18  0.13  0.16
BLASTX+INTRON+NNSPLICE  0         0.91  0.89  0.87  0.87    565   221   34    132  192   0.58  0.56  0.57
Burset/Guigo data set
BLASTX (1st cycle)      0         0.92  0.61  0.67  0.66    123   746   1737  246  5697  0.04  0.02  0.03
BLASTX (2nd cycle)      0         0.90  0.75  0.77  0.76    168   1050  1070  380  1867  0.06  0.04  0.05
BLASTX+NNSPLICE         0         0.91  0.77  0.79  0.78    1677  515   106   376  1858  0.62  0.48  0.55
BLASTX+INTRON           4         0.89  0.86  0.85  0.84    168   1041  1030  421  584   0.06  0.06  0.06
BLASTX+INTRON+NNSPLICE  7         0.89  0.93  0.89  0.88    1698  433   46    472  258   0.64  0.67  0.66

Only forward-strand (+) exons from the default output of each program tested were compared with the GenBank-annotated exons of each sequence. The standard measures of predictive accuracy were averaged over all sequences in the data set. No. genes, number of genes for which the program made no prediction; SEN, nucleotide-level sensitivity; SPE, nucleotide-level specificity; AC, approximate correlation; CC, correlation coefficient; CR, number of predicted exons that are correct at both ends; PC, number of predicted exons that are partially correct; OL, number of predicted exons that overlap actual exons; ME, number of missed real exons; WE, number of wrongly predicted exons; ESEN, exon-level sensitivity; ESPE, exon-level specificity; EAVG, (ESEN + ESPE)/2.
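The footnote names the accuracy measures without spelling out how they are computed. The following is a minimal sketch of how these measures are commonly computed for a single sequence; it is not the EGPred evaluation code. It assumes exons are 1-based, inclusive `(start, end)` tuples on the forward strand, and it assumes that PC means "overlaps a real exon and shares at least one boundary with it" while OL means "overlaps a real exon but shares no boundary". The function names are hypothetical, and the paper's own conventions (for example, how sequences without any prediction enter the averages) may differ.

```python
"""Hedged sketch of the accuracy measures named in the Table 1 footnote.

Assumptions (illustrative, not taken from EGPred): exons are 1-based,
inclusive (start, end) tuples on the forward strand; a partially correct
(PC) exon overlaps a real exon and shares at least one boundary with it;
an overlapping (OL) exon overlaps a real exon but shares no boundary.
"""
from math import sqrt


def coding_positions(exons):
    """Set of nucleotide positions covered by a list of exons."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def _ratio(num, den):
    """num/den, or None when the denominator is zero (measure undefined)."""
    return num / den if den else None


def nucleotide_measures(real, predicted, seq_len):
    """Return (SEN, SPE, AC, CC) for one sequence of length seq_len."""
    r, p = coding_positions(real), coding_positions(predicted)
    tp = len(r & p)              # coding nucleotides predicted as coding
    fp = len(p - r)              # non-coding nucleotides predicted as coding
    fn = len(r - p)              # coding nucleotides that were missed
    tn = seq_len - tp - fp - fn  # non-coding nucleotides left unpredicted

    conditionals = [_ratio(tp, tp + fn), _ratio(tp, tp + fp),
                    _ratio(tn, tn + fp), _ratio(tn, tn + fn)]
    sen = conditionals[0] if conditionals[0] is not None else 0.0
    spe = conditionals[1] if conditionals[1] is not None else 0.0
    # Approximate correlation: average the defined conditional probabilities,
    # then rescale that average from [0, 1] to [-1, 1].
    defined = [c for c in conditionals if c is not None]
    acp = sum(defined) / len(defined) if defined else 0.5
    ac = (acp - 0.5) * 2
    # Correlation coefficient between predicted and real coding status.
    den = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / den if den else 0.0
    return sen, spe, ac, cc


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_measures(real, predicted):
    """Classify predicted exons; return counts plus ESEN, ESPE and EAVG."""
    real_set = set(real)
    cr = pc = ol = we = 0
    for exon in predicted:
        hits = [r for r in real if overlaps(exon, r)]
        if exon in real_set:
            cr += 1  # both boundaries exactly match a real exon
        elif not hits:
            we += 1  # touches no real exon at all
        elif any(exon[0] == r[0] or exon[1] == r[1] for r in hits):
            pc += 1  # overlaps and shares one boundary (assumed PC rule)
        else:
            ol += 1  # overlaps but shares no boundary (assumed OL rule)
    # Missed exons: real exons not overlapped by any prediction.
    me = sum(1 for r in real if not any(overlaps(r, p) for p in predicted))
    esen = cr / len(real) if real else 0.0
    espe = cr / len(predicted) if predicted else 0.0
    return {"CR": cr, "PC": pc, "OL": ol, "ME": me, "WE": we,
            "ESEN": esen, "ESPE": espe, "EAVG": (esen + espe) / 2}


if __name__ == "__main__":
    # Toy example with made-up coordinates, not data from the paper.
    real = [(10, 20), (30, 40), (60, 70)]
    predicted = [(10, 20), (30, 45), (50, 55), (62, 68)]
    print(nucleotide_measures(real, predicted, seq_len=100))
    print(exon_measures(real, predicted))
```

In the toy call, the four predicted exons fall into one CR, one PC (shared start, wrong end), one WE (no overlap) and one OL (inside a real exon, neither end correct), so ESEN = 1/3, ESPE = 1/4 and EAVG is their mean. To mirror the footnote, each measure would then be averaged over all sequences in a data set.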


Genome Res. 14: 1756-1766
