Molecular Fossils in the Human Genome: Identification and Analysis of the Pseudogenes in Chromosomes 21 and 22

Table 1.

Numbers of Pseudogenes and Genes

| Type of (pseudo)genes | Chromosome 21 (prediction) | Chromosome 22 (prediction) | Chromosomes 21 + 22 (prediction) | Whole human genome (extrapolation or prediction) |
|---|---|---|---|---|
| GenomeScan genes | 279 | 648 (593) | 927 (872) | 38,647 (∼26,000–28,000) (∼20,000–25,000) |
| Processed pseudogenes | 77 [65] | 112 [83] | 189 [148] | ∼6100–6600 (based on chromosome 21 data); ∼8700–9400 (based on chromosome 22 data) |
| Nonprocessed pseudogenes | 72 [64] | 123 [83] | 195 [147] | ∼5700–6200 (based on chromosome 21 data); ∼9600–10,400 (based on chromosome 22 data) |
| Pseudogenic Ig segments | | 70 | 70 | |
| GenomeScan genes, removing those that overlap with our pseudogenes (above) | 226–257 | 437–551 | 663–808 | |
  • Supplied by C. Burge and R.-F. Yeh (personal communication). See Yeh et al. (2001) for details of the program GenomeScan.

  • The figures in brackets give the totals omitting the Sanger (chromosome 22) and Riken (chromosome 21) pseudogene annotations.

  • These are the ranges estimated for the total number of genes by Yeh et al. (2001): first, if overparsing of gene structures (i.e., splitting genes into smaller genes) is taken into account and, second, if the expected rates of false positives and pseudogenes are also taken into account [see Yeh et al. (2001) for details].

  • The Riken and Sanger Centre annotations are merged with our own annotations. The pseudogenic immunoglobulin gene segments are taken out of these totals.

  • Based on data for either chromosome 21 or chromosome 22, omitting immunoglobulin gene segments for the nonprocessed pseudogene estimate.

  • This does not include immunoglobulin gene segments on chromosome 22. The lower bound arises from discarding predicted genes that have any predicted pseudogenic exon. The upper bound arises from discarding only those predicted genes that are judged disabled in every exon, or that comprise only an isolated disabled fragment (such as a processed pseudogene or an isolated disabled exon).
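The lower and upper bounds described in the last footnote can be illustrated with a minimal sketch. It assumes each predicted gene has already been reduced to one flag per exon saying whether that exon is judged pseudogenic (disabled). The names `PredictedGene`, `exon_disabled`, and `gene_count_bounds`, as well as the toy data, are hypothetical and are not taken from the paper or from GenomeScan.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PredictedGene:
    # Hypothetical record: one boolean per predicted exon,
    # True if that exon is judged pseudogenic (disabled).
    name: str
    exon_disabled: List[bool]


def gene_count_bounds(genes: List[PredictedGene]) -> Tuple[int, int]:
    """Return (lower, upper) counts of genes left after pseudogene filtering.

    Lower bound: discard every gene that has at least one disabled exon.
    Upper bound: discard only genes whose exons are all disabled; an
    isolated disabled fragment is a one-exon gene and is covered by this.
    """
    lower = 0
    upper = 0
    for gene in genes:
        flags = gene.exon_disabled
        if not any(flags):
            lower += 1  # no disabled exon at all: kept under both rules
        fully_disabled = bool(flags) and all(flags)
        if not fully_disabled:
            upper += 1  # kept unless every exon is disabled
    return lower, upper


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        PredictedGene("intact", [False, False, False]),
        PredictedGene("partly_disabled", [False, True]),
        PredictedGene("fully_disabled", [True, True]),
        PredictedGene("isolated_fragment", [True]),
    ]
    print(gene_count_bounds(toy))
```

The stricter rule yields the smaller count and the more lenient rule the larger one, which is how a range such as 226–257 for chromosome 21 would arise.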

This Article

  1. Genome Res. 12: 272-280
