Computational Inference of Homologous Gene Structures in the Human Genome

Table 2.

Gene Level Accuracy of GenomeScan as a Function of Protein Similarity in DraftGene and FinishGene Datasets

Variable                          Similarity category/dataset
                                  10^−5 > P > 10^−20      10^−40 > P > 10^−80     10^−120 > P > 10^−180
                                  Draft       Finish      Draft       Finish      Draft       Finish
No. of genes in dataset           174         174         151         151         93          93
% of fragmented genes             42          0           43          0           55          0
No. of predicted genes            186         172         205         159         152         104
Genes completely covered (%)      38          58          48          71          57          73
Genes partially covered (%)       49          32          51          28          42          27
Genes missed (%)                  13          10          1           1           1           0
No. of “extra” predicted genes    18          14          19          10          8           11
  • Sequences were grouped according to the level of similarity between the encoded protein and the available database proteins used in the predictions, as described in the legend to Fig. 3. All known genes in the FinishGene set are complete (all coding exons present in a single sequence). Some genes in the DraftGene set are represented by multiple “partial genes” in different draft contigs; these are listed as fragmented genes. Known genes were classified as completely covered if all exons were covered by GenomeScan-predicted exons; partially covered if some (but not all) exons were covered by GenomeScan-predicted exons; and missed if no exon was covered by a GenomeScan-predicted exon. GenomeScan-predicted genes that did not overlap any known gene are listed as “extra” predicted genes.

  • Includes predicted partial genes as well as complete genes.
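
The classification rules in the table notes can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that exons are inclusive (start, end) intervals on a single sequence, that an exon counts as "covered" when any predicted exon overlaps it by at least one base, and that a predicted gene is "extra" when none of its exons overlaps any known exon. The source does not state these criteria in this detail, and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: an exon is an inclusive (start, end) interval on one sequence.
Exon = Tuple[int, int]
Gene = List[Exon]


def overlaps(a: Exon, b: Exon) -> bool:
    """Return True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_known_gene(known_exons: Gene, predicted_exons: List[Exon]) -> str:
    """Classify a known gene as 'complete', 'partial', or 'missed'.

    Assumption: an exon is 'covered' if any predicted exon overlaps it.
    """
    if not known_exons:
        raise ValueError("a known gene must have at least one exon")
    covered = sum(
        any(overlaps(k, p) for p in predicted_exons) for k in known_exons
    )
    if covered == len(known_exons):
        return "complete"
    if covered > 0:
        return "partial"
    return "missed"


def count_extra_predictions(predicted_genes: List[Gene],
                            known_genes: List[Gene]) -> int:
    """Count predicted genes with no exon overlapping any known exon."""
    known_exons = [e for gene in known_genes for e in gene]
    return sum(
        1
        for pred in predicted_genes
        if not any(overlaps(p, k) for p in pred for k in known_exons)
    )


if __name__ == "__main__":
    known = [(100, 200), (300, 400)]
    print(classify_known_gene(known, [(150, 180), (350, 500)]))
    print(classify_known_gene(known, [(150, 180)]))
    print(classify_known_gene(known, [(500, 600)]))
    print(count_extra_predictions([[(150, 180)], [(500, 600)]], [known]))
```

A stricter coverage criterion (for example, requiring exact exon boundaries) would only require changing `overlaps`; the three-way classification itself stays the same.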

This Article

  1. Genome Res. 11: 803-816
