GeneID in Drosophila

Table 1.

Testing Different Models of Coding DNA in the Training Semiartificial Genomic Sequence

                Base level            Exon level
                Sn     Sp     CC      Sne    Spe    SnSp   ME     WE
Sites–PWM       0.23   0.65   0.37    0.17   0.13   0.15   0.72   0.79
CU              0.91   0.88   0.88    0.46   0.43   0.45   0.21   0.27
DIA + CP        0.91   0.88   0.89    0.46   0.46   0.46   0.23   0.25
MM-5            0.93   0.90   0.91    0.54   0.51   0.52   0.18   0.24
PWM and MM-5    0.92   0.92   0.92    0.75   0.71   0.73   0.12   0.18
  • (CU) Codon usage model; (DIA + CP) combination of a first-order Markov model of the translated amino acid sequence and a codon preference model; (MM-5) Markov model of order 5. Genes were predicted using GeneID, but in each of these cases exons were scored solely on the basis of the coding DNA model, ignoring the contribution of the exon-defining sites. Predicted genes were compared with the annotated ones, and the usual measures of accuracy were computed. Results obtained when exons are scored only as a function of the scores of their defining sites are also given (Sites–PWM). Finally, the last row (PWM and MM-5) reports the accuracy when exons are scored as the sum of the Markov model score and the scores of the exon-defining sites. This is the scoring scheme used by GeneID when predicting genes in the Adh region.
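The combined scoring described in the note (exon score = site scores + Markov model score) can be illustrated with a minimal Python sketch. It is not GeneID's implementation: the function names, the dictionary-based tables, and the toy log-odds values are all hypothetical, and the sketch ignores reading frame and any other detail not stated above. It only shows how a position weight matrix (PWM) score for each exon-defining site and a Markov model score for the coding sequence add up to one exon score.

```python
from typing import Dict, Sequence


def pwm_score(site: str, pwm: Sequence[Dict[str, float]]) -> float:
    """Sum the per-position log-odds of a site; pwm[i][base] is a log-odds value."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, site))


def markov_score(seq: str, log_odds: Dict[str, float], order: int = 5) -> float:
    """Sum log-odds over every (order + 1)-mer in seq.

    log_odds maps an (order + 1)-mer to a coding-vs-noncoding log-odds value
    (hypothetical table); k-mers missing from the table contribute 0.
    """
    k = order + 1
    return sum(log_odds.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def exon_score(start_site: str, end_site: str, coding_seq: str,
               start_pwm: Sequence[Dict[str, float]],
               end_pwm: Sequence[Dict[str, float]],
               log_odds: Dict[str, float], order: int = 5) -> float:
    """Exon score as the sum of both site scores and the coding (Markov) score."""
    return (pwm_score(start_site, start_pwm)
            + pwm_score(end_site, end_pwm)
            + markov_score(coding_seq, log_odds, order))


# Toy values, chosen only for illustration (order 1 keeps the table small).
start_pwm = [{"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
             {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0}]
end_pwm = [{"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
           {"A": -1.0, "C": -1.0, "G": -1.0, "T": 0.5}]
toy_log_odds = {"AC": 1.0, "CG": 0.5, "GT": -0.25}

# Sites score 3.0 and 2.0, the coding sequence scores 1.25, so the total is 6.25.
total = exon_score("AG", "GT", "ACGT", start_pwm, end_pwm, toy_log_odds, order=1)
print(total)
```

Dropping either term reproduces the other configurations in the table: sites only corresponds to the Sites–PWM row, and the Markov term only corresponds to the MM-5 row.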

This Article

  1. Genome Res. 10: 511-515
