Computational Inference of Homologous Gene Structures in the Human Genome

Table 4.

Summary of GenomeScan-predicted Genes and Partial Genes in the Human Genome

| Similarity category | Complete genes (>2 exons): no. of genes | Complete genes: no. of exons/gene | Complete genes: no. of aa/gene | Partial genes: no. of genes | Partial genes: no. of exons/gene | All genes (partial + complete): no. of genes | All genes: % of all predicted genes |
|---|---|---|---|---|---|---|---|
| Known (cDNA) | 5698 | 9.6 | 496 | 8901 | 4.9 | 16040 | 41.5 |
| Protein + EST | 4502 | 8.8 | 510 | 6537 | 5.5 | 12546 | 32.5 |
| Proteins only | 2767 | 5.2 | 303 | 4600 | 3.1 | 10061 | 26.0 |
| All | 12967 | 8.4 | 460 | 20038 | 4.7 | 38647 | 100.0 |
  • Genes were predicted in the September 2000 GoldenPath human genome sequence as described in Methods. Predicted coding sequences (CDS) were first compared to cDNAs in the RefSeq cDNA database (September 2000) using BLASTN; those with a hit at least 100 bp long and at least 98% identical are listed as “Known (cDNA)”. The remaining predicted coding sequences were searched against dbEST (September 2000 release) using BLASTN; those with a hit at least 100 bp long and at least 97% identical are listed as “Protein + EST”. All other predicted genes are categorized as “Proteins only”, because every GenomeScan-predicted gene has at least modest similarity to a known protein. Statistics are listed separately for predicted partial genes and for predicted complete genes with at least three exons; the category “All genes” includes these two groups as well as predicted 1- and 2-exon genes.
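The categorization rule in the footnote can be sketched as a short function. This is only an illustration: the length and identity thresholds come from the footnote, but the `Hit` container, the function names, and the reading of "a hit" as the alignment length and percent identity of a single best BLASTN hit are assumptions, since the footnote does not say how the BLASTN output was parsed.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Best BLASTN hit for one predicted CDS (hypothetical container)."""
    length_bp: int            # length of the hit, in bp
    percent_identity: float   # identity over the hit, in percent


def passes(hit: Optional[Hit], min_identity: float, min_length_bp: int = 100) -> bool:
    """True if a hit exists and meets both the length and identity cutoffs."""
    return (
        hit is not None
        and hit.length_bp >= min_length_bp
        and hit.percent_identity >= min_identity
    )


def classify(refseq_hit: Optional[Hit], est_hit: Optional[Hit]) -> str:
    """Assign a similarity category in the order given in the footnote."""
    # Step 1: RefSeq cDNA hit of >= 100 bp at >= 98% identity.
    if passes(refseq_hit, 98.0):
        return "Known (cDNA)"
    # Step 2: only the remaining predictions are checked against dbEST
    # (>= 100 bp at >= 97% identity).
    if passes(est_hit, 97.0):
        return "Protein + EST"
    # Step 3: everything else; all predictions already have protein similarity.
    return "Proteins only"
```

Because the RefSeq test is applied first, a prediction that matches both a cDNA and an EST is counted once, as “Known (cDNA)”, which is why the three categories in the table sum to the total.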

This article: Genome Res. 11: 803-816