A Chimpanzee Genome Project Is a Biomedical Imperative
A near-complete sequence of the human genome is now available, and many efforts are currently focused on the next logical biomedically relevant target, the mouse genome. Given limited resources, which vertebrate genome(s) should be tackled after that? Reasonable candidates include other well-studied model organisms such as Rattus norvegicus (the rat), Xenopus laevis (the African clawed toad), and Danio rerio (the zebrafish). Over the last few years, some have advocated a genomic approach toward understanding our closest evolutionary relatives, the great apes (McConkey and Goodman 1997; Paabo 1999; McConkey et al. 2000). Pan troglodytes (the chimpanzee) and Pan paniscus (the bonobo) share nearly 99% of human genomic sequences (see Box for discussion) (King and Wilson 1975; Sibley and Ahlquist 1984; Caccone and Powell 1989; Ruvolo 1997; Goodman et al. 1998; Satta et al. 2000). Thus, it is cogently argued that knowing the complete genome of at least one of these species will give us a window into the genes that contribute to humanness (the chimpanzee is the first choice, because we know more about this species than about the bonobo). The emergence of humans can be regarded as one of the major transitions in evolution (Szathmary and Smith 1995), and a complete explanation of this phenomenon ranks as one of the greatest unsolved mysteries of science.
What Is the Meaningful Number for the Difference?
What does the oft-quoted “1% difference” between humans and chimpanzees really mean? At the level of genomic sequence, the first figures for identity between human and chimpanzee genomic DNA, obtained from hybridization melting curves of nonrepetitive DNA, were ∼98.5% (Sibley and Ahlquist 1984, 1987; Caccone and Powell 1989). (Remarkably, the corresponding identity between chimpanzees and bonobos is only ∼99.2%, even though these very similar-appearing species were recognized as distinct less than a century ago.) These overall averaged figures take into account both the noncoding and coding regions of genomic DNA. Because coding regions represent only a minute fraction of the total, the figures are dominated by the noncoding regions. The melting-curve-derived figure would be affected not only by single base pair differences, but also by deletions or insertions of genomic segments. Regardless, the overall figure of ∼98.5% genomic identity (a ∼1.5% difference) has held up in more recent analyses of available sequences (Ruvolo 1997; Goodman et al. 1998). Individual human genomes seem to differ by about 1 bp in 1000 (Ruvolo 1997; Goodman et al. 1998; Venter et al. 1998; Collins and Jegalian 1999). Extrapolating from recent genomic comparisons that showed approximately fourfold greater intraspecific variation among chimpanzees (Kaessmann et al. 1999), the corresponding figure could be as high as 1 in 250 for chimpanzee genomes. Indeed, melting-curve comparisons of individual Homo and Pan genomes gave values spanning a relatively broad range (mean difference 1.65% ± 0.26 SD; range 1.4–2.1%) (Sibley and Ahlquist 1987), with an even wider range seen upon reanalysis of the data (Sibley and Ahlquist 1990). Thus, many base pair differences noted between individual human and chimpanzee genomes simply represent polymorphisms within either species (Hacia et al. 1999).
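The extrapolation from human to chimpanzee diversity in the paragraph above is a single multiplication, sketched here in Python for concreteness. It assumes, as the text does, that the fourfold excess reported by Kaessmann et al. (1999) scales the human figure linearly; the function names are illustrative, not part of any published analysis.

```python
from fractions import Fraction

# Figures quoted in the Box (approximate); Fraction keeps the arithmetic exact.
HUMAN_DIVERSITY = Fraction(1, 1000)  # ~1 bp difference per 1000 bp between two human genomes
CHIMP_FOLD_EXCESS = 4                # ~fourfold greater variation among chimpanzees


def extrapolate_diversity(reference: Fraction, fold: int) -> Fraction:
    """Scale a per-base-pair diversity by a fold factor (simple linear extrapolation)."""
    return reference * fold


def as_one_in_n(diversity: Fraction) -> int:
    """Express a per-base diversity as 'one difference in N base pairs'."""
    return round(1 / diversity)


chimp_diversity = extrapolate_diversity(HUMAN_DIVERSITY, CHIMP_FOLD_EXCESS)

for label, d in [("human", HUMAN_DIVERSITY), ("chimpanzee", chimp_diversity)]:
    print(f"{label}: about 1 difference in {as_one_in_n(d)} bp ({float(d):.1%} of sites)")
```

Run as written, this reports about 1 in 1000 (0.1% of sites) for humans and about 1 in 250 (0.4%) for chimpanzees, the figure quoted above.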
Furthermore, of all the genomic sequence differences, only about one-half occurred along the way to becoming human (Saitou 2000). Combining these facts (half of a ∼1.5% genomic difference is roughly 0.75%), we can conclude that the differences responsible for uniquely human features are contained within <1% of the total genome. However, most of these differences are random SNPs in noncoding regions that probably have no functional consequence, unless they happen to affect the function of important promoters (e.g., by altering transcription factor binding sites). To obtain an updated figure for genomic regions coding for mRNAs, we compared 20 randomly selected chimpanzee cDNAs from GenBank with their human orthologs (excluding highly duplicated genes, for which true orthology is difficult to establish with certainty). In this analysis, cDNA sequence identity was 99.31% ± 0.38 (mean ± s.d.). Predicted amino acid identity was 99.36% ± 0.66 (mean ± s.d.), very close to the value originally determined by King and Wilson using the limited data sets available in the 1970s (King and Wilson 1975). Of the 20 protein sequences we examined, 7 were identical between the two species. Although a single change could determine a critical functional outcome, most of these amino acid differences are probably of no functional significance. In addition to amino acid changes resulting from single base pair differences, a few examples of wholesale insertions and deletions of genomic sequence are known, mostly resulting from chromosomal translocations, inversions, or transpositions, and regional deletions or duplications, each of which could potentially affect the expression pattern of certain genes (Gagneux and Varki 2000). What, then, is the meaningful number? Of all of the above, the most functionally meaningful is likely to be the figure for amino acid sequence differences, as most consequences of gene expression are mediated by proteins.
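The two calculations in this paragraph, per-gene percent identity summarized as mean ± s.d. and the halving of the genomic difference, can be sketched in Python. This is a minimal illustration under stated assumptions: the sequences are taken to be already aligned (equal length, with "-" marking gaps), gap columns are simply excluded, and the toy sequences are hypothetical placeholders rather than the 20 GenBank cDNAs analyzed here. The text does not say how alignment or gaps were handled.

```python
from statistics import mean, stdev


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are excluded (an assumption, not stated in the text)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def summarize(identities):
    """Return (mean, sample standard deviation) of per-gene identities."""
    return mean(identities), stdev(identities)


# Hypothetical toy alignments (human, chimpanzee); not real data.
pairs = [
    ("ATGGCCAAGT", "ATGGCCAAGT"),
    ("ATGGCTAAGT", "ATGGCCAAGT"),
    ("ATG-CCAAGTTC", "ATGGCCAAGTTA"),
]
ids = [percent_identity(h, c) for h, c in pairs]
m, sd = summarize(ids)
print(f"identity: {m:.2f}% +/- {sd:.2f} (mean +/- s.d.)")

# Lineage arithmetic from the text: ~98.5% genomic identity is a ~1.5% difference,
# and roughly half of that arose on the human lineage (Saitou 2000).
genomic_difference = 100.0 - 98.5
human_lineage_share = genomic_difference / 2
print(f"human-lineage share: ~{human_lineage_share:.2f}% of the genome")
```

The same `percent_identity` function applies unchanged to aligned protein sequences, which is how a predicted amino acid identity would be obtained once the cDNAs are translated.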
Of course, none of the above figures take into account differences in the timing and level of expression of functional genes, as determined by other factors such as promoter action, chromatin organization, and silencing by DNA methylation. Nor can it be excluded that the many differences in the number and location of repetitive DNA sequences contribute to such expression differences. Furthermore, the functions of expressed proteins can be substantially modified by post-translational modifications such as glycosylation, phosphorylation, and acylation, as well as by factors affecting the turnover and half-life of the proteins themselves. Overall, beyond their use in calculating the time since a common ancestor, the only practical significance of the numbers discussed here is to render the biological, physiological, and behavioral differences between humans and chimpanzees all the more remarkable.
Pascal Gagneux and Ajit Varki
Taxpaying citizens might argue that, given limited resources, this lofty and anthropocentric pursuit should not take precedence over the pragmatic value of sequencing the genomes of model organisms that have already been better studied by a variety of biomedical and genetic approaches. Moreover, it might be suggested that this is a matter for the National Science Foundation (NSF), not the National Institutes of Health (NIH). Programs within the NSF are currently considering a Human Origins Initiative (Weiss and Yellen 2000). However, I would like to suggest that there is clear and compelling biomedical value in giving high priority to the complete sequencing of the chimpanzee genome and that of at least one Old World monkey. The experience of primate centers and zoos over the last century indicates that there are many interesting differences in disease frequency and severity between humans and great apes such as the chimpanzee. Although the evidence is sometimes fragmentary or inconclusive, the nature and significance of these medical conditions (including AIDS, Alzheimer's disease [AD], cancer, malaria, and perimenopausal complications) are sufficient to draw attention to the issue. After all, extrapolating findings in physiology and pathology from mice, rats, toads, or fish to humans can be difficult because of our significant physiological and genetic differences from these species. In contrast, the >99% amino acid sequence identity between most chimpanzee and human proteins (see Box) predicts a stronger likelihood of finding genetic explanations for any disease differences. Studies of the chimpanzee genome could be considered a logical extension of the current emphasis on exploiting sequence differences between various human groups to identify important disease susceptibility genes. For this and other reasons, the cost of a chimpanzee genome project should also be much lower than that of the original Human Genome Project.
Also, as discussed below, the knowledge gained could be of much value in our efforts to conserve and care for the great apes themselves.
Some pathological states in humans seem to represent the normal situation in chimpanzees, including craniosynostosis (closure of the skull sutures in the perinatal period) (Cohen 1991), general leukocytosis (a high white blood cell count) (Hodson et al. 1967; McClure et al. 1972), and extensive hypertrichosis (hairiness). Several other human diseases or physiological states appear to be rare or markedly attenuated in the chimpanzee (Scott 1992). Some of these can be attributed to anatomic differences between the species, including protracted, painful, and dangerous childbirth (resulting from the larger head of the human fetus and the altered pelvis of the bipedal human female), neonatal cephalhematoma (the common subperiosteal blood clot of the newborn human skull bones), wisdom tooth impaction (resulting from the reduced jaw size in humans and the lack of a post-molar gap), and various diseases attributed to the effects of gravity on bipedal humans (vertebral osteoarthritis, intervertebral disc protrusion, varicose veins, and hemorrhoids). There are also a few anatomically unique diseases of great apes that do not occur in humans, such as infection of the pharyngeal air sacs (an organ that is absent in humans) (Strobert and Swenson 1979). The rarity in great apes of certain other human conditions, such as sexually transmitted diseases and severe hypercholesterolemia, may be explained on a behavioral or cultural basis, as these conditions can be induced experimentally in apes (Scott 1992). The higher frequency in humans of anatomical disorders of the central nervous system such as hydrocephalus is also intriguing (Scott 1992), but could be explained by increased perinatal trauma. However, many other differences cannot be explained on any obvious behavioral, dietary, anatomic, cellular, or biochemical basis. It is these differences that justify the biomedical imperative in the title of this article.
The best known of these differences is the failure of HIV infection to progress to AIDS in the chimpanzee. Before it was realized that HIV-1 was originally transferred from chimpanzees to humans (Gao et al. 1999), a large number of chimpanzees were experimentally infected with HIV derived from human patients (Alter et al. 1984). Many years passed before a single chimpanzee finally progressed to a true AIDS-like syndrome (Novembre et al. 1997). However, the HIV isolated from this individual is evidently an unusual mutant that evolved during the prolonged incubation period, as it rapidly produced an AIDS-like syndrome upon transfer to another chimpanzee. Despite many studies attempting to find the answer (for examples, see Arthur et al. 1989; Gendelman et al. 1991; Di Rienzo et al. 1994; Heeney et al. 1995; Ehret et al. 1996; Benton et al. 1998; Bogers et al. 1998; Pischinger et al. 1998), the mystery remains: this retrovirus seems to live in a symbiotic state within the chimpanzee immune system, whereas it almost routinely destroys the helper T cells of humans. Although the evolutionary reason for this is now reasonably clear (chimpanzees are probably a natural reservoir and humans are not), the mechanistic explanation remains obscure. The ability to compare the genomes of the two species could be highly instructive. In the best of all possible worlds, the knowledge gained could enable humans who are unfortunate enough to acquire HIV to live in symbiosis with the infection, and the biomedical issue would then become primarily one of preventing transmission to other individuals. Another virological mystery of less certain significance is the high frequency of endogenous foamy spumaviruses in great apes; in contrast, humans rarely become infected, and then only upon exposure to great apes (Schweizer et al. 1995; Goepfert et al. 1996).
AD is a common and devastating cause of dementia in elderly humans. Its brain pathology is characterized by the accumulation of amyloid plaques (consisting of fragments of the amyloid precursor protein) together with neurofibrillary tangles (paired helical filaments containing a hyperphosphorylated form of the microtubule-associated protein tau). Although the clinical diagnosis of AD might be difficult in a great ape, the complete pathological lesion, including the neurofibrillary tangles, has never been observed in the brains of elderly chimpanzees (Gearing et al. 1994). In contrast, age-matched human brain specimens show a significant rate of these classic lesions, often well before symptoms have become evident (Braak and Braak 1997; Duyckaerts and Hauw 1997; Silverman et al. 1997; Price and Morris 1999). Neurofibrillary tangles can even exist in human brains independently of plaques, starting virtually at birth and reaching 50% prevalence by age 48 (Duyckaerts and Hauw 1997; Silverman et al. 1997). This difference is all the more remarkable given that chimpanzees express the ancestral apoE4 allele of apolipoprotein E (Hanlon and Rubinsztein 1995; Hacia et al. 1999), which is associated with the highest risk of AD in humans. The fact that the full-blown lesion of AD has also not been observed in other long-lived animals (such as elderly elephants; Cole and Neal 1990) reinforces the significance of this finding and makes a comparison of the relevant human and chimpanzee genes of great potential benefit.
Of all the different forms of malaria, that caused by Plasmodium falciparum is the most aggressive and acutely life threatening; it is a major cause of mortality worldwide. Chimpanzees seem immune to infection with this parasite and are instead infected by its close relative Plasmodium reichenowi, which apparently does not make them very ill (Escalante and Ayala 1994; Escalante et al. 1995; Qari et al. 1996). This resistance to falciparum malaria was clearly demonstrated when captive chimpanzees in Gabon did not become infected, even in the face of a high attack rate among their keepers, who were exposed to the same mosquito-containing environment (Ollomo et al. 1997). Even with some other forms of malaria, the parasite burden in experimentally infected chimpanzees becomes substantial only after splenectomy (Morris et al. 1996; Sullivan et al. 1996). Although one cannot predict which factors are most important (e.g., do different mosquito strains prefer human vs. chimpanzee skin?), the bulk of the evidence suggests that genetic differences determine at least a portion of the observed differences in susceptibility. Knowledge gleaned from comparative studies of the relevant parasite genomes, as well as of the human and chimpanzee genomes, could be quite informative.
Another surprising difference appears to be in the frequency of the most common human cancers, which are epithelial neoplasms such as carcinomas of the breast, ovary, lung, stomach, colon, pancreas, and prostate. Whereas these cancers cause >20% of deaths in modern human populations (Parker et al. 1997), an extensive literature suggests that the cancer incidence rate for nonhuman primates is only ∼2%–4% and seems to be even lower in the great apes (McClure 1973; Seibold and Wolf 1973; Schmidt 1978; Beniashvili 1989; Scott 1992). Although the number of well-documented autopsies on great apes is relatively small (in the hundreds), several factors suggest that this apparent difference is not due to ascertainment bias. First, there are several reports of apes having leukemias and lymphomas (Manning and Griesemer 1974; Gardner et al. 1978), which make up only a minority of malignancies in humans. Second, although age is certainly a factor affecting carcinoma incidence, great apes often live into their forties and fifties (and even sixties) in captivity. Furthermore, carcinomas certainly occur at younger ages in other animals, including monkeys (DePaoli and McClure 1982; Uno et al. 1998). Third, many asymptomatic benign tumors of various organs have been accurately identified and characterized during autopsies of great apes (McClure 1973; Seibold and Wolf 1973; Graham and McClure 1977; Beniashvili 1989; Scott 1992), indicating that the autopsies were well performed. Finally, the diet and environmental exposures of great apes living in captivity are certainly not free of the factors thought to be involved in carcinogenesis in humans. Further epidemiological studies (ideally a worldwide survey of the autopsy records of all major primate centers and zoos) should be done to confirm this tantalizing suggestion from the existing literature.
Meanwhile, because cancer is clearly a disease of the genome (Hanahan and Weinberg 2000), comparative genomics should proceed with the objective of identifying which genes might be involved in this apparent difference. In this regard, it is of interest that a cell surface sugar modification that was lost in the human lineage because of a genomic mutation (Chou et al. 1998; Muchmore et al. 1998b) has been reported to reappear in human cancers.
Another interesting difference appears to be in the incidence of the late complications of viral hepatitis B and hepatitis C. Although great apes can be infected with these human viruses, experimentally induced hepatitis in chimpanzees does not seem to progress as frequently to the complications often seen in humans, such as chronic active hepatitis, cirrhosis of the liver, and hepatocellular carcinoma (Muchmore et al. 1988a). Interestingly, as in the case of HIV, there is evidence suggesting that hepatitis B may actually have originated in chimpanzees (MacDonald et al. 2000).
Several aspects of female reproductive biology appear to differ between great apes and humans. Menopause is a natural state in human females that has not been observed in long-lived captive female chimpanzees (Graham 1979). Human females are also unusual in typically having obviously visible breasts in the absence of pregnancy or lactation, and in having a high frequency of breast disease (fibrocystic disease and cancer, in particular). Also, the absence of external signs of ovulation in human females may result in fertilization taking place at suboptimal times with regard to the condition of the ovum. The question thus arises whether fertilization of deteriorating eggs may explain, at least in part, the high rate of early fetal wastage in humans, which is typically associated with gross chromosomal and other genetic abnormalities. Regarding menstruation, anecdotal evidence suggests that the volume of blood lost per normal cycle might be significantly larger in humans, and that menometrorrhagia (excessive and frequent bleeding, seen particularly in perimenopausal humans) is not common in great apes. These issues clearly have significant effects on the health and lifestyle of human females. Because the other general features of human and chimpanzee female reproductive biology (e.g., the overall ovarian cycle) are quite similar, comparative genomics could help reveal the basis for these unusual human features, each of which has biomedical implications.
In addition to the above examples, anecdotal evidence suggests that some other common human conditions are rare in great apes in captivity (E. Strobert and B. Swenson, pers. comm.). Despite a high frequency of atopic rhinitis and polyps, bronchial asthma is rarely diagnosed in chimpanzees. Acne vulgaris, a common skin affliction of human teenagers, also appears to be uncommon in the adolescent chimpanzee. Another common human disorder that apparently has not been detected in chimpanzees is rheumatoid arthritis. The external physical manifestations of each of these diseases are so obvious that they are very unlikely to have been missed by the experienced veterinarians involved in the long-term care of captive chimpanzees. Of course, one cannot rule out that caregivers are generally less likely to detect mild forms of these illnesses in chimpanzees.
Simply sequencing the genome of a chimpanzee and that of an Old World monkey does not provide a panacea. To emphasize the limitations of understanding derived from completing a genome sequence, it is hard to improve upon the statement of Alberts and Klug: “Determining the sequence of the genome is similar to completing the list of the chemical elements: it tells us about the basic components, but not about how they behave in combination. In other words, it gets us to the starting line for a massive increase in understanding, but does nothing by itself to provide us with that understanding.” (Alberts and Klug 2000). In this regard, we are still sadly lacking in a basic understanding of much of the biology, biochemistry, cell biology, and developmental biology of great apes that has been well studied in humans and in some other vertebrate model systems. Hence, to optimize the value of a chimpanzee genome project, there needs to be a parallel great ape phenome project (Varki et al. 1998) that would systematically obtain such basic information about the great apes. The current excess of chimpanzees in NIH-sponsored facilities provides an obvious opportunity for well-planned, ethically justified, and humane research that will benefit both humans and great apes. Regarding the hypothesis that the recent epidemic of breast and ovarian cancer is caused by evolutionary changes in the reproductive life-styles of westernized human females (Eaton et al. 1994), the current moratorium on chimpanzee breeding in NIH-funded facilities represents a comparative experiment that is already underway.
Assuming that the NIH (perhaps in a consortium with other interested federal agencies) will soon carry out a chimpanzee/primate genome project, how can we obtain and analyze the data most effectively? A recent catalog of the known genetic differences between humans and great apes (Gagneux and Varki 2000) indicates that some of the differences might be quite obvious, for example, new junctions arising from chromosomal inversions and fusions, gene duplications, nonsense mutations, exon deletions, and repetitive element insertions. However, some of the critical genetic differences may well be single base pair changes that alter the action of a promoter or create a functionally critical amino acid coding difference. Human genomes seem to differ from each other by about 1 bp in 1000 (Ruvolo 1997; Goodman et al. 1998; Venter et al. 1998; Collins and Jegalian 1999), and the figure is probably about 1 in 250 among chimpanzees (see Box). Moreover, the original comparisons of individual human and chimpanzee genomes showed differences ranging from 1.4% to 2.1%. Thus, obtaining the complete sequence of a single chimpanzee genome will not be sufficient to provide all the answers. However, such single nucleotide polymorphisms (SNPs) are currently being cataloged within human populations for other purposes (Collins and Jegalian 1999). This, together with the power of PCR- and gene array chip-based approaches (Hacia et al. 1998, 1999), should make it possible to identify quickly which SNPs are unique to the chimpanzee or the human. Given how close chimpanzees are to humans, one could perhaps piggyback the sequencing of the chimpanzee genome on the future sequencing of multiple individual humans to identify human SNPs. Another possible approach would be to carry out the chimpanzee sequencing using a pool of chimpanzee genomes (presumably including representatives from the full range of known chimpanzee subspecies) (Gagneux et al. 1999) as the template for PCR-based sequencing. Complete knowledge of the human genome will make it easy to design the primers for either approach, and if a particular primer set does not work, this will point to an obvious difference between the two genomes. Other approaches might take advantage of large-scale gene chip-based microarrays (Hacia et al. 1998, 1999).
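Separating candidate species differences from within-species polymorphism, the task described in the paragraph above, can be sketched in Python. This is a minimal illustration, not a pipeline the text specifies: it assumes alleles have already been called at aligned positions for a sample of humans and a sample of chimpanzees (the positions and genotypes below are hypothetical), and it treats a site as a candidate fixed difference only when the two samples share no allele. With small samples, an apparent fixed difference may still turn out to be polymorphic.

```python
from enum import Enum
from typing import Iterable


class SiteClass(Enum):
    INVARIANT = "same single allele in both samples"
    FIXED_DIFFERENCE = "no allele shared between the species samples"
    SHARED_POLYMORPHISM = "both species carry at least two of the same alleles"
    POLYMORPHIC = "variable within a species, but one allele is shared"


def classify_site(human_alleles: Iterable[str], chimp_alleles: Iterable[str]) -> SiteClass:
    """Classify one aligned site from the alleles observed in each species' sample."""
    human, chimp = set(human_alleles), set(chimp_alleles)
    if not human or not chimp:
        raise ValueError("each species needs at least one observed allele")
    if human == chimp and len(human) == 1:
        return SiteClass.INVARIANT
    if human.isdisjoint(chimp):
        # Every sampled human differs from every sampled chimpanzee here:
        # a candidate species difference, only as reliable as the sample size.
        return SiteClass.FIXED_DIFFERENCE
    if len(human & chimp) >= 2:
        return SiteClass.SHARED_POLYMORPHISM
    return SiteClass.POLYMORPHIC


# Hypothetical alleles at four aligned positions (one allele per sampled chromosome).
sites = {
    101: (["A", "A", "A"], ["A", "A"]),
    202: (["G", "G", "G"], ["T", "T"]),
    303: (["C", "T", "C"], ["C", "T"]),
    404: (["A", "G", "A"], ["A", "A"]),
}

for position, (human, chimp) in sites.items():
    print(position, classify_site(human, chimp).name)
```

Only sites classified as `FIXED_DIFFERENCE` would be carried forward as candidates; the other classes reflect variation that a single-genome comparison could mistake for a species difference.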
Of course, only one-half of the differences between the human and chimpanzee genomes occurred on the way to becoming human; the other half represents changes that occurred in the chimpanzee lineage (Saitou 2000). Thus, to narrow down the differences of interest, all differences found to be universal and unique to either the human or the chimpanzee should eventually be checked against the corresponding bonobo and gorilla sequences to determine the likely ancestral state. To maximize the value of obtaining the chimpanzee genome, it would also be important to place the genome of at least one Old World monkey high on the priority list. This information will help to further narrow the range of differences that are of interest. Logical choices would include Macaca mulatta (the rhesus macaque) and Papio hamadryas (the baboon), which have been the subjects of much biomedical research over the years. The corollary benefits to monkeys and to the existing research programs involving them are obvious.
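The outgroup check proposed in this paragraph can be sketched as a simple parsimony rule in Python. It is an illustrative assumption, not a method specified in the text: for a site already known to differ between humans and chimpanzees, the gorilla (and optionally an Old World monkey) is taken to show the likely ancestral state, and the bonobo shows whether a change on the chimpanzee side predates the chimpanzee-bonobo split. The function name and return labels are hypothetical, and the rule ignores parallel mutations and incomplete lineage sorting, either of which can mislead it.

```python
from typing import Optional


def assign_lineage(human: str, chimp: str, bonobo: str, gorilla: str,
                   monkey: Optional[str] = None) -> str:
    """Assign a human-chimpanzee difference to a lineage by simple parsimony.

    Gorilla (and optionally an Old World monkey) serve as outgroups for the
    human/Pan split; the bonobo distinguishes chimpanzee-only from Pan-wide changes.
    Homoplasy and incomplete lineage sorting are not modeled.
    """
    if human == chimp:
        raise ValueError("human and chimpanzee alleles are identical at this site")
    outgroups = [gorilla] + ([monkey] if monkey is not None else [])

    if all(allele == chimp for allele in outgroups):
        # Outgroups share the chimpanzee state, so it is likely ancestral.
        if bonobo == chimp:
            return "human lineage"
        return "unresolved"  # bonobo departs from the inferred ancestral state
    if all(allele == human for allele in outgroups):
        # Outgroups share the human state, so the change occurred on the Pan side.
        if bonobo == chimp:
            return "Pan lineage (before the chimpanzee-bonobo split)"
        if bonobo == human:
            return "chimpanzee lineage (after the chimpanzee-bonobo split)"
    return "unresolved"


print(assign_lineage(human="T", chimp="C", bonobo="C", gorilla="C", monkey="C"))
print(assign_lineage(human="A", chimp="G", bonobo="G", gorilla="A"))
print(assign_lineage(human="A", chimp="G", bonobo="A", gorilla="A", monkey="A"))
```

Adding the monkey as a second outgroup makes the rule stricter: when the gorilla and monkey disagree, the site is left unresolved rather than assigned to a lineage, which reflects the text's point that an Old World monkey genome would help narrow the differences of interest.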
Last, but not least, the sequencing of the chimpanzee genome can also be considered a moral imperative. Primarily as a consequence of human activities, our closest evolutionary cousins are rapidly dwindling in numbers in the wild, to the point where complete extinction of these populations is a real danger. Meanwhile, the many great apes in captivity are being cared for in a less-than-ideal manner, because the medical approach taken largely assumes that their genes and biology are identical to ours. Better knowledge of the genomes and phenomes of these sentient species would be extremely valuable in enhancing their care and would further highlight the urgent need for their conservation.
Acknowledgments
I thank Arno Motulsky, Pascal Gagneux, David Ginsburg, Kurt Benirschke, and Mark Weiss for helpful comments and criticisms, and Elizabeth Strobert, Brent Swenson, and Harold McClure for sharing their long-term experiences at the Yerkes Primate Center.
Footnotes
1 E-MAIL ; FAX (858) 534-5611.
Cold Spring Harbor Laboratory Press