Summary of GenomeScan-predicted Genes and Partial Genes in the Human Genome
| Similarity category | Complete genes (>2 exons): no. of genes | Complete genes (>2 exons): no. of exons/gene | Complete genes (>2 exons): no. of aa/gene | Partial genes: no. of genes | Partial genes: no. of exons/gene | All genes (partial + complete): no. of genes | All genes (partial + complete): % of all predicted genes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Known (cDNA) | 5698 | 9.6 | 496 | 8901 | 4.9 | 16040 | 41.5 |
| Protein + EST | 4502 | 8.8 | 510 | 6537 | 5.5 | 12546 | 32.5 |
| Proteins only | 2767 | 5.2 | 303 | 4600 | 3.1 | 10061 | 26.0 |
| All | 12967 | 8.4 | 460 | 20038 | 4.7 | 38647 | 100.0 |
Genes were predicted in the September 2000 GoldenPath human genome sequence as described in Methods. Predicted coding sequences (CDS) were first compared to cDNAs in the RefSeq cDNA database (September 2000) using BLASTN; those with a hit at least 100 bp long and at least 98% identity are listed as “Known (cDNA)”. The remaining predicted coding sequences were searched against dbEST (September 2000 release) using BLASTN; those with a hit at least 100 bp long and at least 97% identity are listed as “Protein + EST”. All other predicted genes are categorized as “Proteins only”, because every GenomeScan-predicted gene has at least modest similarity to a known protein. Statistics are listed separately for predicted partial genes and for predicted complete genes with at least three exons; the “All genes” columns include these two groups as well as predicted 1- and 2-exon genes.
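The categorization rule in this legend can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the `Hit` container and the functions `passes`, `classify`, and `category_shares` are hypothetical names, and it assumes the best BLASTN hit for each predicted CDS has already been reduced to a length in bp and a percent identity. The thresholds and the counts in the usage note are the ones given in the legend and table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Hit:
    """Best BLASTN hit for one predicted CDS (hypothetical container)."""
    length_bp: int
    percent_identity: float


def passes(hit: Optional[Hit], min_length_bp: int, min_identity: float) -> bool:
    """True if a hit exists and meets both the length and identity cutoffs."""
    return (
        hit is not None
        and hit.length_bp >= min_length_bp
        and hit.percent_identity >= min_identity
    )


def classify(refseq_hit: Optional[Hit], est_hit: Optional[Hit]) -> str:
    """Assign a similarity category using the cutoffs stated in the legend."""
    # RefSeq cDNA is checked first: >= 100 bp at >= 98% identity.
    if passes(refseq_hit, 100, 98.0):
        return "Known (cDNA)"
    # Only the remaining predictions are checked against dbEST:
    # >= 100 bp at >= 97% identity.
    if passes(est_hit, 100, 97.0):
        return "Protein + EST"
    # Everything else has protein similarity only.
    return "Proteins only"


def category_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all predicted genes per category, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}
```

Applied to the “All genes” counts in the table, `category_shares({"Known (cDNA)": 16040, "Protein + EST": 12546, "Proteins only": 10061})` gives the percentages in the last column (41.5, 32.5, and 26.0).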











