BLASTX Performance on the HMR195 and Burset/Guigo Data Sets on Adding Similarity Information

Nucleotide level: SEN, SPE, AC, CC. Exon level: CR, PC, OL, ME, WE, ESEN, ESPE, EAVG.

| Program | No. genes | SEN | SPE | AC | CC | CR | PC | OL | ME | WE | ESEN | ESPE | EAVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HMR195 data set | | | | | | | | | | | | | |
| BLASTX (1st cycle) | 1 | 0.88 | 0.64 | 0.69 | 0.66 | 36 | 277 | 643 | 98 | 1920 | 0.04 | 0.02 | 0.03 |
| BLASTX (2nd cycle) | 0 | 0.91 | 0.72 | 0.76 | 0.75 | 75 | 317 | 453 | 117 | 962 | 0.18 | 0.12 | 0.15 |
| BLASTX+NNSPLICE | 0 | 0.91 | 0.73 | 0.77 | 0.76 | 569 | 226 | 52 | 116 | 961 | 0.59 | 0.40 | 0.49 |
| BLASTX+INTRON | 0 | 0.90 | 0.84 | 0.84 | 0.84 | 75 | 316 | 446 | 124 | 380 | 0.18 | 0.13 | 0.16 |
| BLASTX+INTRON+NNSPLICE | 0 | 0.91 | 0.89 | 0.87 | 0.87 | 565 | 221 | 34 | 132 | 192 | 0.58 | 0.56 | 0.57 |
| Burset/Guigo data set | | | | | | | | | | | | | |
| BLASTX (1st cycle) | 0 | 0.92 | 0.61 | 0.67 | 0.66 | 123 | 746 | 1737 | 246 | 5697 | 0.04 | 0.02 | 0.03 |
| BLASTX (2nd cycle) | 0 | 0.90 | 0.75 | 0.77 | 0.76 | 168 | 1050 | 1070 | 380 | 1867 | 0.06 | 0.04 | 0.05 |
| BLASTX+NNSPLICE | 0 | 0.91 | 0.77 | 0.79 | 0.78 | 1677 | 515 | 106 | 376 | 1858 | 0.62 | 0.48 | 0.55 |
| BLASTX+INTRON | 4 | 0.89 | 0.86 | 0.85 | 0.84 | 168 | 1041 | 1030 | 421 | 584 | 0.06 | 0.06 | 0.06 |
| BLASTX+INTRON+NNSPLICE | 7 | 0.89 | 0.93 | 0.89 | 0.88 | 1698 | 433 | 46 | 472 | 258 | 0.64 | 0.67 | 0.66 |


Only the forward (+) strand exons from the default output of the programs tested were compared to the GenBank-annotated exons for each sequence. The standard measures of predictive accuracy were averaged over all sequences in the data set: SEN, nucleotide level sensitivity; SPE, nucleotide level specificity; AC, approximate correlation; CC, correlation coefficient; ESEN, exon level sensitivity; ESPE, exon level specificity; EAVG, (ESEN + ESPE)/2; ME, number of missed real exons; WE, number of predicted wrong exons; CR, number of predicted exons that are correct at both ends; PC, number of predicted exons that are partially correct; OL, number of predicted exons overlapping actual exons; No. genes, number of genes for which the programs made no predictions.
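
The accuracy measures named in the legend can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code behind the table. It assumes the conventional gene-prediction definitions (sensitivity = TP/(TP+FN), specificity = TP/(TP+FP), and the usual approximate correlation and correlation coefficient formulas), which the legend names but does not spell out. Only EAVG = (ESEN + ESPE)/2 is taken directly from the legend. The function names and input counts are hypothetical.

```python
import math


def nucleotide_metrics(tp, fp, tn, fn):
    """Nucleotide-level SEN, SPE, AC and CC for one sequence.

    Assumes the conventional definitions; all four marginal sums
    must be non-zero, otherwise a ZeroDivisionError is raised.
    """
    sen = tp / (tp + fn)  # fraction of true coding bases that were predicted
    spe = tp / (tp + fp)  # fraction of predicted coding bases that are true
    # Average conditional probability, then rescaled to the range [-1, 1].
    acp = 0.25 * (tp / (tp + fn) + tp / (tp + fp)
                  + tn / (tn + fp) + tn / (tn + fn))
    ac = (acp - 0.5) * 2
    cc = (tp * tn - fn * fp) / math.sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {"SEN": sen, "SPE": spe, "AC": ac, "CC": cc}


def exon_metrics(correct, actual, predicted):
    """Exon-level ESEN, ESPE and EAVG for one sequence.

    `correct` counts predicted exons that match an annotated exon at
    both ends (CR in the table).
    """
    esen = correct / actual
    espe = correct / predicted
    return {"ESEN": esen, "ESPE": espe, "EAVG": (esen + espe) / 2}


def average_metrics(per_sequence):
    """Average each measure over all sequences, as the legend describes."""
    keys = per_sequence[0].keys()
    n = len(per_sequence)
    return {k: sum(m[k] for m in per_sequence) / n for k in keys}


if __name__ == "__main__":
    # Hypothetical counts for two sequences, for illustration only.
    seqs = [nucleotide_metrics(90, 10, 880, 20),
            nucleotide_metrics(100, 0, 900, 0)]
    print(average_metrics(seqs))
    print(exon_metrics(correct=5, actual=8, predicted=10))
```

Because the table reports per-sequence averages, the EAVG in a row need not equal exactly half the sum of that row's rounded ESEN and ESPE, although the two are usually close.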











