What Is the Meaningful Number for the Difference?
What does the oft-quoted “1% difference” between humans and chimpanzees really mean? At the level of genomic sequence, the first estimates of identity between human and chimpanzee genomic DNA, obtained from hybridization melting curves of nonrepetitive DNA, were ∼98.5% (Sibley and Ahlquist 1984, 1987; Caccone and Powell 1989). Remarkably, the corresponding identity between chimpanzees and bonobos is ∼99.2%, despite the fact that these very similar-appearing species were only recognized as distinct less than a century ago. These overall averaged numbers take into account both the noncoding and coding regions of genomic DNA. Because the latter represent only a minute fraction of the total, the numbers are strongly influenced by the noncoding regions. The melting-curve-derived number would be affected not only by single base pair differences, but also by specific deletions or insertions of genomic segments. Regardless, the overall figure of ∼98.5% genomic identity has held up in more recent analyses of available sequences (Ruvolo 1997; Goodman et al. 1998). Individual human genomes seem to vary by about 1 bp/1000 (Ruvolo 1997; Goodman et al. 1998; Venter et al. 1998; Collins and Jegalian 1999). Extrapolating from recent genomic comparisons that showed an approximately fourfold greater intraspecific variation among chimpanzees (Kaessmann et al. 1999), the corresponding number could be as high as 1 in 250 for chimpanzee genomes. Indeed, melting curve comparisons of individual Homo and Pan genomes gave numbers that span a relatively broad range (mean difference of 1.65% ± 0.26 SD, range 1.4–2.1%) (Sibley and Ahlquist 1987), with an even wider range seen upon reanalysis of the data (Sibley and Ahlquist 1990). Thus, many base pair differences noted between individual human and chimpanzee genomes simply represent polymorphisms within either species (Hacia et al. 1999).
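The arithmetic behind these figures can be made explicit. The following is a minimal sketch rather than the calculation used in any of the cited studies: the input values are the approximate numbers quoted above, the variable and function names are illustrative, and the correction applied is Nei's net divergence (raw between-species difference minus the mean of the two within-species diversities), which is a standard population-genetic approximation introduced here as an assumption.

```python
# Approximate figures quoted in the text; all names are illustrative.
human_diversity = 1 / 1000        # ~1 bp per 1000 between two human genomes
chimp_fold_increase = 4           # ~fourfold greater variation in chimpanzees
chimp_diversity = human_diversity * chimp_fold_increase  # ~1 in 250

raw_difference = 0.015            # ~1.5% difference, from ~98.5% identity


def net_divergence(d_xy, pi_x, pi_y):
    """Nei's net divergence: d_A = d_XY - (pi_X + pi_Y) / 2.

    Assumption: this correction is added for illustration and is not
    taken from the source text.
    """
    return d_xy - (pi_x + pi_y) / 2


d_a = net_divergence(raw_difference, human_diversity, chimp_diversity)
polymorphic_share = 1 - d_a / raw_difference

print(f"chimpanzee diversity: 1 in {1 / chimp_diversity:.0f}")
print(f"net divergence: {d_a:.2%}")
print(f"share attributable to polymorphism: {polymorphic_share:.1%}")
```

Under these rough inputs, a nontrivial fraction of the differences seen between one human and one chimpanzee genome would reflect variation still segregating within either species, which is the qualitative point made above.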
Furthermore, of all of the genomic sequence differences, only about one-half occurred along the way to becoming human (Saitou 2000). By combining the above facts, we can conclude that differences responsible for uniquely human features are contained within <1% of the total genome. However, most of these differences are random SNPs in noncoding regions that probably have no functional consequence, unless they happen to affect the function of important promoters (e.g., by altering transcription factor binding sites). To determine an updated number for genomic regions coding for mRNAs, we compared 20 randomly selected chimpanzee cDNAs from GenBank with their human orthologs (excluding highly duplicated genes, for which it is difficult to be certain of true orthology). In this analysis, we found that cDNA sequence identity was 99.31% ± 0.38 (mean ± s.d.). Predicted amino acid identity was 99.36% ± 0.66 (mean ± s.d.), very close to the original value determined by King and Wilson using the limited data sets available in the 1970s (King and Wilson 1975). Of the 20 protein sequences we examined, 7 were identical between the two species. Although a single change could determine a critical functional outcome, most of these amino acid differences are probably of no functional significance. In addition to amino acid changes resulting from single base pair differences, a few examples of wholesale insertions and deletions in regions of genomic sequence are known. These are mostly the result of chromosomal translocations, inversions, or transpositions, and of regional deletions or duplications, each of which could potentially affect the expression pattern of certain genes (Gagneux and Varki 2000). What, therefore, is the meaningful number? Of all of the above, the most functionally meaningful number is likely to be that for amino acid sequence differences, as most consequences of gene expression are mediated by proteins.
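To make the two calculations in this paragraph concrete, the sketch below shows one way such numbers could be computed. It is not the authors' actual procedure: it assumes the sequences are already aligned, that gapped columns are excluded from the identity count, and that the reported spread is a sample standard deviation. The function names and the toy sequences are hypothetical.

```python
from statistics import mean, stdev


def percent_identity(seq_a, seq_b):
    """Percent of aligned, ungapped columns with identical residues.

    Assumption: both sequences come from the same pairwise alignment
    (equal length, '-' marking gaps); gapped columns are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def summarize(identities):
    """Return (mean, sample standard deviation) of percent identities."""
    return mean(identities), stdev(identities)


def human_lineage_difference(total_difference_pct, human_share=0.5):
    """Portion of the total difference that arose on the human lineage.

    The default share of one-half follows the estimate cited in the text.
    """
    return total_difference_pct * human_share


# Toy aligned pairs (hypothetical data, not real orthologs).
pairs = [("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "ACGTACGTAT")]
identities = [percent_identity(a, b) for a, b in pairs]
avg, sd = summarize(identities)
print(f"identity: {avg:.2f}% +/- {sd:.2f} (mean +/- s.d.)")
print(f"human-lineage difference: {human_lineage_difference(1.5):.2f}%")
```

With a total difference of about 1.5%, halving gives about 0.75%, which is consistent with the "<1%" figure stated above.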
Of course, none of the above numerical data takes into account differences in the timing and level of expression of functional genes, as determined by other factors such as promoter action, chromatin organization, and silencing by DNA methylation. It also cannot be excluded that the many differences in number and location of repetitive DNA sequences may contribute to such expression differences. Furthermore, the functions of expressed proteins can be substantially modified by post-translational modifications such as glycosylation, phosphorylation, and acylation, as well as factors affecting the turnover and half-life of the proteins themselves. Overall, beyond their use to calculate the time since a common ancestor, the only practical significance of the numbers discussed here is to render the biological, physiological, and behavioral differences between humans and chimpanzees all the more remarkable.—Pascal Gagneux and Ajit Varki











