Comparing the human and chimpanzee genomes: Searching for needles in a haystack

Ajit Varki and Tasha K. Altheide

Glycobiology Research and Training Center, Departments of Medicine and Cellular & Molecular Medicine, University of California at San Diego, La Jolla, California 92093, USA

Abstract

The chimpanzee genome sequence is a long-awaited milestone, providing opportunities to explore primate evolution and genetic contributions to human physiology and disease. Humans and chimpanzees shared a common ancestor ∼5-7 million years ago (Mya). The difference between the two genomes is actually not ∼1%, but ∼4%—comprising ∼35 million single nucleotide differences and ∼90 Mb of insertions and deletions. The challenge is to identify the many evolutionarily, physiologically, and biomedically important differences scattered throughout these genomes while integrating these data with emerging knowledge about the corresponding “phenomes” and the relevant environmental influences. It is logical to tackle the genetic aspects via both genome-wide analyses and candidate gene studies. Genome-wide surveys could eliminate the majority of genomic sequence differences from consideration, while simultaneously identifying potential targets of opportunity. Meanwhile, candidate gene approaches can be based on such genomic surveys, on genes that may contribute to known differences in phenotypes or disease incidence/severity, or on mutations in the human population that impact unique aspects of the human condition. These two approaches will intersect at many levels and should be considered complementary. We also cite some known genetic differences between humans and great apes, realizing that these likely represent only the tip of the iceberg.

Humans (Homo sapiens) and chimpanzees (Pan troglodytes) last shared a common ancestor ∼5-7 million years ago (Mya) (Chen and Li 2001; Brunet et al. 2002). What makes humans different from their closest evolutionary relatives, and how, why, and when did these changes occur? These are fascinating questions, and a major challenge is to explain how genomic differences contributed to this process (Goodman 1999; Gagneux and Varki 2001; Klein and Takahata 2002; Carroll 2003; Olson and Varki 2003; Enard and Pääbo 2004; Gagneux 2004; Ruvolo 2004; Goodman et al. 2005; Li and Saunders 2005; McConkey and Varki 2005). Most genome projects focus on elucidating the sequence and structure of a species' genome and then identifying conserved functionally important genes and genomic elements. The finished human genome (International Human Genome Sequencing Consortium 2004) provides such a catalog of genomic features that ultimately interact with the environment to determine our biology, physiology, and disease susceptibility. Completion of the draft chimpanzee genome sequence (The Chimpanzee Sequencing and Analysis Consortium 2005) provides a genome-wide comparative catalog that can be used to identify genes or genomic regions underlying the many features that distinguish humans and chimpanzees.

As humans, we have an inherent interest in understanding and improving the human condition. We also believe that we have many characteristics that are uniquely human. Table 1 lists some of the definite and possible phenotypic traits that appear to differentiate us from chimpanzees and other “great apes.” For the most part, we do not know which genetic features interact with the environment to generate these differences between the “phenomes” of our two species. The chimpanzee has also long been seen as a model for human diseases because of its close evolutionary relationship. This is indeed the case for a few disorders. Nevertheless, it is a striking paradox that chimpanzees are in fact not good models for many major human diseases/conditions (see Table 2) (Varki 2000; Olson and Varki 2003). In retrospect, this should not be too surprising. After all, at least some major diseases of a species are likely related to (mal)adaptations during the recent evolutionary past of that species (Nesse and Williams 1995). Thus, comparisons with the chimpanzee genome could shed important light on the uniquely human pathogenic mechanisms of serious diseases. This, in turn, could point to novel approaches toward prevention or treatment. Opportunities to address broader questions in evolutionary biology also arise, i.e., to examine the evolutionary forces that underlie recent speciation and phenotypic divergence between two closely related mammalian taxa, as well as the mechanisms by which evolutionary novelties are generated. It is this intertwining of anthropogeny (the study of human origins), biomedical interests, and general evolutionary principles that makes the chimpanzee genome such an invaluable resource.

Table 1.

Some phenotypic traits of humans for comparison with those of great apes

Table 2.

Differences between humans and apes in incidence or severity of medical conditions

Here, we briefly mention some of the initial findings from sequencing the chimpanzee genome, and describe how these data might be used to address some of the above questions. This pursuit requires the involvement and cooperation of scientists from a wide variety of fields, far beyond the scope of genomics. Thus, our comments are focused not so much on genomics per se, but are rather addressed to the broader scientific community of “genome users.”

Sequencing of the chimpanzee genome

Less than a decade ago, sequencing the chimpanzee genome was not even on the “radar screen” of the major sequencing centers. Repeated public statements of interest from many other sectors of the scientific community (McConkey and Goodman 1997; McConkey and Varki 2000; Varki 2000) and increasing interest within the genome community eventually led to the writing of “white papers” and the assignment of high priority to this effort (http://www.genome.gov/10002851). The recent analysis of the draft chimpanzee genome sequence (The Chimpanzee Sequencing and Analysis Consortium 2005), and the many “companion” papers (Cheng et al. 2005; Hughes et al. 2005; Linardopoulou et al. 2005) now provide researchers with a wealth of comparative genetic information.

The published sequence was from a single captive-born male of the Pan troglodytes verus subspecies. Sequence data obtained via a whole-genome shotgun approach to a BAC library generated an ∼3.6× coverage, i.e., ∼3.6-fold redundancy in sequencing reads from the autosomes (sex chromosomes have half that redundancy in a male). The assembled sequence covers ∼94% of the genome, with 98% of the sequence having an estimated error rate of ≤10⁻⁴. Several additional chimpanzees (including other subspecies) were sequenced at lower coverage. In addition to identifying polymorphisms within chimpanzees, these data confirmed the high quality and completeness of the human genome sequence and established ancestral states of human single-nucleotide polymorphisms (SNPs).
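As a rough illustration of what ∼3.6-fold redundancy implies, the sketch below applies the idealized Lander-Waterman (Poisson) model, in which the expected fraction of bases sampled by at least one read is 1 − e^(−c) for coverage c. This model is an assumption introduced here for illustration and is not the consortium's method; it ignores cloning bias, repeats, and the assembly step, so it should not be expected to reproduce the ∼94% figure quoted for the assembled sequence. The function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


# Autosomes at ~3.6x; sex chromosomes of a male at half that redundancy.
autosomal = expected_fraction_covered(3.6)
sex_chrom = expected_fraction_covered(3.6 / 2)

print(f"autosomes:       {autosomal:.3f}")
print(f"sex chromosomes: {sex_chrom:.3f}")
```

Under this simplified model, halving the redundancy has a disproportionate effect on the fraction of bases left unsampled, which is one reason the sex chromosomes of a single male are expected to be less completely represented.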

Defining the important differences: Searching for needles in a haystack

With the two genome sequences in hand, one can begin a systematic identification of genes, regulatory elements, and other functionally relevant genomic regions that differentiate humans and chimpanzees. Of course, what we are really exploring is a complex interplay between multiple genetic differences, interacting with diverse physiological, environmental, and cultural factors, eventually resulting in the observed phenotypic differences. Single-nucleotide divergence was estimated at ∼1.23%, with ∼1% corresponding to fixed species divergence and the remainder representing species-specific polymorphisms (The Chimpanzee Sequencing and Analysis Consortium 2005). While insertion-deletion (indel) events were fewer, they represented 40-45 Mb in each species, i.e., ∼90 Mb difference between the two, giving an ∼3% divergence in this category. Thus, the overall divergence between the genomes is closer to 4%, in keeping with two recent studies (Britten 2002; Watanabe et al. 2004), but far greater than most previous estimates, which were made using shorter alignable sequence fragments. Fortunately, orthologous proteins are still extremely similar, with almost a third being identical, and the typical protein differing only by two amino acids between human and chimpanzees. Thus, the oft-repeated “<1% difference” still applies to amino acid sequences (The Chimpanzee Sequencing and Analysis Consortium 2005). However, a substantial proportion of the differences will likely be neutral with respect to understanding the human condition. The search for functionally important differences is further complicated because many of the important ones may not be within known coding sequences. In addition to protein evolution (Li and Saunders 2005), two other major hypotheses have been put forth to explain human-specific changes, i.e., changes in gene regulation (King and Wilson 1975) and loss-of-function changes (Olson and Varki 2003). We suggest simultaneous genomic and candidate gene approaches toward narrowing the field to the functionally relevant changes captured under these hypotheses.
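The arithmetic behind the ∼4% figure can be written out explicitly. The sketch below is only a worked example: the single-nucleotide divergence (∼1.23%) and the indel total (∼90 Mb) are taken from the text, whereas the genome size of 3.0 Gb is a round number assumed here so that 90 Mb corresponds to ∼3%. The function name is hypothetical.

```python
def overall_divergence(snd_fraction: float, indel_bp: float, genome_bp: float) -> float:
    """Sum single-nucleotide divergence and the indel fraction of the genome."""
    indel_fraction = indel_bp / genome_bp
    return snd_fraction + indel_fraction


SND_FRACTION = 0.0123   # ~1.23% single-nucleotide divergence (from the text)
INDEL_BP = 90e6         # ~90 Mb of insertions and deletions (from the text)
GENOME_BP = 3.0e9       # ASSUMPTION: round genome size used for illustration

indel_fraction = INDEL_BP / GENOME_BP
total = overall_divergence(SND_FRACTION, INDEL_BP, GENOME_BP)

print(f"indel fraction:     {indel_fraction:.2%}")
print(f"overall divergence: {total:.2%}")
```

The example makes the point that most of the ∼4% comes from indels, which affect many bases per event, rather than from single-nucleotide changes.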

Genomic approach 1: Narrowing the search to the important differences

Using outgroups to define human-specific changes

Noting a difference between the two genome sequences does not indicate which lineage experienced the change. We can also assume that roughly half of all the differences occurred on the lineage leading to chimpanzees. One or more “outgroups” are needed to determine the ancestral state (nucleotide or otherwise) at any given locus, and thus establish the subset of human-specific changes. The large divergence time between primates and rodents (>60 Myr) means that some mutational events and/or orthology between loci will be obscured when using rodents as outgroups. Sequencing additional primate genomes is thus important, as is the careful choice of an appropriate outgroup species. A primate close enough to humans and chimpanzees is needed to reliably determine substitutional polarity; however, some evolutionary distance is also useful to define regions of functional sequence conservation. The orangutan (Pongo) provides the appropriate level of sequence divergence, unlike Gorilla, which may be too closely related to humans and chimpanzees to provide a consistent signal of polarity (Satta et al. 2000; Klein and Takahata 2002). Sequencing of Pongo is already underway at the Genome Sequencing Centers (GSC) at Washington University, St. Louis and at the Baylor College of Medicine. Rhesus macaque (Macaca mulatta) genome sequencing is also underway, led by the Baylor GSC, in collaboration with the J. Craig Venter Institute Joint Technology Center, and Washington University GSC. The latter genome, representing an Old World monkey, is useful not only for its greater phylogenetic distance from humans (∼25 Myr divergence) (Goodman et al. 2005), but also because the long tradition of rhesus macaque use in biomedical research provides comparative biological information and easier access to tissues and biological materials. Eventually, we will need multiple, additional primate genomes to fully understand the changes that have contributed to traits distinguishing humans (Goodman et al. 2005).
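The logic of using an outgroup to assign polarity can be sketched for a single aligned nucleotide site. This is a minimal parsimony illustration under stated assumptions: one outgroup base per site, a correct alignment, and no multiple hits. It is not the procedure used by any of the cited studies, and the function name and example sites are hypothetical.

```python
from typing import Optional


def assign_lineage(human: str, chimp: str, outgroup: Optional[str]) -> str:
    """Classify one aligned site by simple parsimony against one outgroup base."""
    if human == chimp:
        return "no difference"
    if outgroup is None:
        return "unresolved"            # no outgroup data at this site
    if outgroup == chimp:
        return "human-specific"        # chimpanzee retains the ancestral base
    if outgroup == human:
        return "chimpanzee-specific"   # human retains the ancestral base
    return "unresolved"                # outgroup matches neither species


# Hypothetical sites: (human, chimpanzee, outgroup, e.g. orangutan)
sites = [
    ("A", "G", "G"),
    ("C", "T", "C"),
    ("A", "G", "T"),
    ("A", "A", "A"),
    ("T", "C", None),
]
labels = [assign_lineage(h, c, o) for h, c, o in sites]
for site, label in zip(sites, labels):
    print(site, "->", label)
```

The third and fifth sites show why a single outgroup is not always sufficient: when the outgroup base matches neither species, or is missing, additional primate sequences are needed to resolve the ancestral state.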

More speculatively, studying the gorilla genome could help define some genetic features related to cognitive function. Until recent human encroachment, the gorilla probably faced fewer social and cognitive challenges, with a single male overseeing a harem of females, in a relatively predator-free and food-rich environment. This contrasts with the more complex nature of chimpanzee and orangutan environments and behaviors. Also, unlike chimpanzees and orangutans, gorillas fail to pass the “mirror self-recognition test” (Shillito et al. 1999) and have only rarely been observed to use tools (Breuer et al. 2005). It is possible that the gorilla lost some genetic endowments related to cognitive abilities that the great ape common ancestor already had, and comparison to the human, chimpanzee, and orangutan genomes may help identify such changes.

A confounding factor with respect to the ∼3.6× chimpanzee genome sequence is the variable quality of the genome sequence and assembly process. Individual low-quality nucleotide positions can result in sites that appear falsely divergent between humans and chimpanzees. Sequence data from multiple individuals allow some such sites to be identified and eliminated. In addition, some higher-order discrepancies may be introduced by difficulties in assembling certain regions, resulting in false differences between chimpanzees and humans. All of these issues should be resolved as the “polishing” phase of the chimpanzee genome sequencing proceeds (E. Mardis, pers. comm.).

Excluding intra-species polymorphisms

One must also ensure that an apparent genomic difference between humans and chimpanzees is not simply due to a polymorphism in one of the species (Ruvolo 2004). While such polymorphisms are interesting in their own right (e.g., informing us about regions that have undergone recent selective sweeps), they are by definition not contributors to species-specific differences. Sequences from multiple individuals of both species are needed to set aside intra-specific polymorphism and focus only on fixed differences. Some genome-wide polymorphism data for chimpanzees already exist, and human polymorphism can be initially assessed using the numerous SNPs defined in current databases (http://www.ncbi.nlm.nih.gov/projects/SNP/). Surveying sequence variation in a minimum of 10 globally distributed humans has also been suggested to ensure a high probability that a given sequence is fixed (Enard and Pääbo 2004). This minimum number will be higher for chimpanzees because of their greater intra-specific diversity (Gagneux et al. 1999; Kaessmann et al. 1999; Stone et al. 2002; Yu et al. 2003).
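Two small calculations make the sampling argument concrete. The sketch below first tests whether a site looks like a fixed difference in a sample, and then estimates how often an allele segregating at a given frequency would be missed entirely. The second function assumes independent random sampling of 2n chromosomes from n diploid individuals, which is a simplification introduced here and not the calculation of Enard and Pääbo (2004). All names and example values are hypothetical.

```python
from typing import Sequence


def is_fixed_difference(human_alleles: Sequence[str], chimp_alleles: Sequence[str]) -> bool:
    """True if each sample is monomorphic and the two species carry different alleles."""
    h, c = set(human_alleles), set(chimp_alleles)
    return len(h) == 1 and len(c) == 1 and h != c


def prob_variant_missed(allele_freq: float, n_individuals: int) -> float:
    """Chance that an allele at this frequency is absent from 2n sampled chromosomes."""
    return (1.0 - allele_freq) ** (2 * n_individuals)


# A site where one of 20 human chromosomes still carries the chimpanzee base
# is polymorphic in humans, so it is not a fixed difference.
fixed = is_fixed_difference(["A"] * 20, ["G"] * 20)
polymorphic = is_fixed_difference(["A"] * 19 + ["G"], ["G"] * 20)

# Probability of overlooking a segregating allele when sampling 10 individuals.
missed_rare = prob_variant_missed(0.05, 10)
missed_common = prob_variant_missed(0.20, 10)

print(fixed, polymorphic)
print(f"{missed_rare:.3f} {missed_common:.3f}")
```

Under these assumptions, a sample of 10 individuals reliably exposes common variants but can still miss rare ones, so a site that appears fixed in a small sample may remain polymorphic at low frequency in the wider population.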

Genomic approach 2: Focusing the search

Examine sites of human-specific chromosomal changes

In addition to the few previously known chromosomal inversions and rearrangements between humans and chimpanzees (Yunis et al. 1980; Yunis and Prakash 1982; Nickerson and Nelson 1998; Fan et al. 2002a,b; Dennehey et al. 2004), several new smaller chromosomal regions containing likely inversions and rearrangements were detected (The Chimpanzee Sequencing and Analysis Consortium 2005; Newman et al. 2005). Targeted comparisons with other great ape genomes will help to define which of these events are human specific. One can then examine each breakpoint region in detail, searching for potential changes in local genes or regulatory elements, as has already been done in a few instances (Nickerson and Nelson 1998; Fan et al. 2002a,b; Dennehey et al. 2004).

Examine sites of human-specific insertions and deletions

Insertions or deletions (indels) can range from a few nucleotides to tens of kilobases, and can have major impacts on gene structure, expression, and/or function. While the number of indel differences between human and chimpanzee genomes is lower than the number of single-nucleotide divergences (SNDs), fixation of such events could suggest that these losses/gains have been adaptive. There are already a few known examples of indels with potential functional consequences differentiating humans and chimpanzees (Table 3), including the loss of CMAH gene function in humans due to a 92-bp exon deletion (Chou et al. 1998); the loss of two coding exons in the human ELN gene, which contributes to extracellular matrix structure (Szabo et al. 1999); and the complete deletion of SIGLEC13 in humans (Angata et al. 2004).

Table 3.

Some candidate genes and gene families that may contribute to phenotypic differences between humans and apes

The idea that gene loss was a major contributor to human evolution remains an intriguing one (Olson 1999; Olson and Varki 2003). Interestingly, ∼50 known or predicted human genes were found to be missing partially or entirely in the chimpanzee genome, and some of these differences were confirmed by PCR or Southern blotting (The Chimpanzee Sequencing and Analysis Consortium 2005). Confirmation of the ancestral state of these loci and reciprocal analysis of genes disrupted exclusively in humans requires additional primate outgroup data and further “polishing” of the chimpanzee genome sequence.

Examine gene duplications and retroposed genes

Gene duplication via segmental duplication or retrotransposition of mRNA sequences is an evolutionary mechanism for creating new genes with new biological functions. Duplicated genes can become nonfunctional (pseudogenes), neofunctional (acquire a new function), or subfunctional (adopt a portion of the previous function) (Ohno 1999; Hurles 2004). Such species-specific changes in copy number of gene families may allow for the evolution of new functions unique to the species—and are thus pertinent loci for investigation. A recent study reported that 33% of human duplications are human specific (Cheng et al. 2005); and with an estimated 200-300 species-specific retroposed gene copies in humans and chimpanzees (The Chimpanzee Sequencing and Analysis Consortium 2005), there is an ample landscape to explore. Of note, previous work suggests that humans have experienced more copy-number changes than the great apes (Fortna et al. 2004), such as appears to be the case for the PRAME cluster (Birtle et al. 2005) and the SPANX-B genes (Kouprina et al. 2004a). Also, some neofunctional retroposed loci such as GLUD2 are thought to be involved in hominid brain function (Burki and Kaessmann 2004).

Identify genes and gene families showing evidence of human-specific rapid evolution

Genes that have the signature of accelerated evolution (Clark et al. 2003; Nielsen et al. 2005), i.e., a high ratio of nonsynonymous to synonymous substitutions (Ka/Ks ratios), are good candidates for further study. In particular, genes that show Ka/Ks >1 are possible targets of positive selection (Messier and Stewart 1997; Yang and Bielawski 2000). Several loci with relatively high Ka/Ks ratios between the human and chimpanzee genomes were reported (The Chimpanzee Sequencing and Analysis Consortium 2005). Since a majority of nonsynonymous substitutions are considered deleterious (Enard and Pääbo 2004), a high rate of nonsynonymous substitution between taxa can suggest either adaptive evolution or relaxation of functional constraint. However, this approach is generally conservative (Yang and Bielawski 2000). For example, a Ka/Ks value of <1 does not rule out that a gene has undergone positive selection (Dorus et al. 2004). Also, a protein could have only one or a few important amino acid changes, perhaps confined to a critical domain, motif, or site (Andres et al. 2004; Sonnenburg et al. 2004), and thus not have an elevated Ka relative to Ks. Careful examination of the specific types or positions of amino acid changes such as radical amino acid substitutions (hydrophobic vs. hydrophilic, acidic vs. basic, etc.) in conserved regions is another potential way to identify important changes in protein sequence.
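For readers unfamiliar with the statistic, the sketch below shows one common way a Ka/Ks ratio can be computed from pre-counted substitutions: proportions of differences per site are corrected for multiple hits with the Jukes-Cantor formula, as in the Nei-Gojobori style of estimate. This is an illustrative simplification and not necessarily the method used in the studies cited above; it assumes that the numbers of synonymous and nonsynonymous sites have already been counted, and the gene counts in the example are invented. The function returns None when there are no synonymous changes, in which case the ratio is undefined.

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> Optional[float]:
    """Ratio of nonsynonymous to synonymous substitution rates, or None if Ks is 0."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        return None  # undefined: no synonymous changes observed
    return ka / ks


# Hypothetical genes (counts invented for illustration only).
constrained = ka_ks(nonsyn_diffs=2, nonsyn_sites=1000, syn_diffs=3, syn_sites=300)
accelerated = ka_ks(nonsyn_diffs=5, nonsyn_sites=1000, syn_diffs=1, syn_sites=300)
undefined = ka_ks(nonsyn_diffs=2, nonsyn_sites=1000, syn_diffs=0, syn_sites=300)

print(constrained, accelerated, undefined)
```

With so few substitutions per gene, a single additional synonymous or nonsynonymous change shifts the ratio substantially, which illustrates why gene-by-gene Ka/Ks values between such closely related species must be interpreted cautiously.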

For genes that have zero synonymous changes between humans and chimpanzees, one has to use the adjacent genome sequence to estimate a local intergenic/intronic substitution rate, Ki. Of ∼13,000 human-chimpanzee orthologs studied, ∼4% had an observed Ka/Ki >1 (The Chimpanzee Sequencing and Analysis Consortium 2005). However, given the low divergence between humans and chimpanzees, about half of these are predicted to occur simply by chance if purifying selection is allowed to act nonuniformly across genes.

Examine sites of human-specific repetitive element insertion

Repetitive elements such as LINEs (long interspersed elements) and SINEs (short interspersed elements) can duplicate and spread throughout the genome by reverse transcription, causing potentially important functional changes in coding and flanking sequences (Smit 1999; Carroll et al. 2001). Alu elements are the most abundant class of SINEs in humans, making up ∼10% of the genome (Lander et al. 2001), where they apparently expanded up to three times more than in the chimpanzee genome (The Chimpanzee Sequencing and Analysis Consortium 2005). In addition, most human-specific Alu elements belong to two subfamilies (Ya5 and Yb8) not found in great apes (Carroll et al. 2001). Identification of these human-specific loci makes them candidates for further inquiry. It is possible that some of these elements inserted into functional genes or flanking regions became alternatively spliced introns or promoter regulators, or either deleted or shuffled genomic regions via Alu-Alu recombination.

Look for human-specific gene conversions

Another potential source of differences arises from species-specific gene conversion events that become fixed. Gene conversion homogenizes coding or noncoding sequences between adjacent paralogous gene copies within a species. Conversion may also introduce harmful mutations from a pseudogenized gene copy into a functional copy, or conversely, restore function to a former pseudogene. For example, the 5′ end of human Siglec-11 was converted by an adjacent pseudogene after the common ancestor with chimpanzees (Hayakawa et al. 2005). This resulted in a change in sialic acid-binding properties, as well as new expression in human brain microglia. The gene-converted Siglec-11 can thus be considered the first example of a human-specific protein. More such examples might be found by systematically screening genomic regions in which genes and paralogous pseudogenes lie near one another.

Look for changes in noncoding regions

A majority of comparative genomic studies have focused on coding regions at the expense of examining regulatory sequences (Carroll 2005). However, given the relatively few protein-sequence differences between human and chimpanzees, differential regulation of gene and protein expression is a likely mechanism for explaining human:chimpanzee differences (King and Wilson 1975; Enard et al. 2002a; Caceres et al. 2003; Carroll 2003; Preuss et al. 2004; Uddin et al. 2004). Functional noncoding regions such as promoters, enhancers, flanking sequences, and introns can regulate the expression of genes (Wray et al. 2003), and thus play a role in human evolution. The wealth of new information being generated about noncoding RNA sequences also makes them intriguing candidates for potential differences (Eddy 2001; Dykxhoorn et al. 2003; Mello and Conte 2004; Kim 2005; Tang 2005).

Genomic approach 3: Looking for human-specific gene expression differences

As mentioned above, species-specific changes in genomic sequence can be manifested in regulatory processes such as timing and location of expression of genes or of functional noncoding sequences, such as siRNAs. However, it is difficult to predict changes in expression simply by comparing genomic sequences (Carroll 2005). Differences in expression pattern between humans and chimpanzees are being investigated using microarray analyses, which allow for a rapid screen of multiple loci expressed in a single tissue at a given time point. Several such analyses and reanalyses have been carried out (Enard et al. 2002a, 2004; Caceres et al. 2003; Gu and Gu 2003; Hsieh et al. 2003; Khaitovich et al. 2004a; Preuss et al. 2004; Uddin et al. 2004). While the rate of brain gene expression changes appears increased in the human lineage, gene expression in the brain is overall more conserved than in other tissues, perhaps because of functional constraints in this complex organ (Enard et al. 2002a; Caceres et al. 2003; Gu and Gu 2003; Preuss et al. 2004). However, there are several caveats. For example, microarrays based on human oligonucleotide sequences may not accurately detect levels of expression in nonhuman primates nor detect significant alternative splicing of mRNAs (Modrek and Lee 2002; Hsieh et al. 2003; Preuss et al. 2004; Steinmetz and Davis 2004). Additionally, mRNA levels are not always good predictors of the actual levels of the gene product found in a cell (Gygi et al. 1999). Moreover, a recent study suggests that most expression differences have little or no significance, and are likely due to neutral evolution (Khaitovich et al. 2004b). Finally, many of the ultimate “gene products” are not the proteins themselves, but result from their enzymatic activity (e.g., lipids, glycans, and bioactive small molecules). Thus, gene expression studies must be complemented by a variety of other “omic” approaches, e.g., proteomics, lipomics, glycomics, etc. Any differences found need to be confirmed by focused biochemical studies on the molecules in question.

Candidate gene approaches

In parallel with the above genomic studies, it is important to continue the more traditional candidate gene approach—as the genomic approach can miss many biologically significant differences. The candidate approach focuses on specific genes, based on some a priori knowledge about which loci or system(s) might be expected to show functionally significant differences between humans and chimpanzees.

Candidate gene approach 1: Making choices on the basis of comparative phenomics

Humans and chimpanzees differ in many morphological, cognitive, and physiological arenas. When attempting to identify the genetic mechanisms responsible, it is logical to focus on genes known or predicted to contribute in some way to the phenotypic differences, i.e., differences in the “phenome.” There are many morphological and physiological traits for which we have some knowledge of the responsible genetic pathways. This can, in turn, allow us to identify appropriate candidate loci underlying the traits. We can then test them for their contribution to uniquely human traits affecting organs such as the skin, brain, and female reproductive system (Table 1). Additionally, many diseases and pathological conditions appear to be unique to humans, and genes involved in some of these disease pathways are known or can be predicted (Table 2). It makes sense to focus first on phenotypes or diseases that appear most directly relevant to explaining the human condition. For example, recent work has suggested that two genes involved in the regulation of brain size appear to have undergone human-specific adaptive evolution (Evans et al. 2005; Mekel-Bobrov et al. 2005). However, we would recommend against a purely “brain-centric” approach that assumes that the only major differences of interest are in the nervous system. A single genetic change may have had an impact on multiple organs, and such a change may be easier to study in organs other than the brain. For example, there are organs such as the skin and its derivatives (e.g., the female breast and the sweat glands) that show at least as many morphological and functional differences as the brain and are easier to study. Genetic differences found in such systems may then help predict which molecules, pathways, or mechanisms have also undergone the most drastic changes during the evolution of the human brain.

Candidate gene approach 2: Choices based on naturally occurring human mutations

A population size of 6 billion humans suggests that many postnatally viable genetic diseases affecting “uniquely human” traits are likely to exist somewhere on the planet. Identifying such defects in the human population, particularly in families, provides an approach for directly linking genotype to phenotype and for choosing genes for human and chimpanzee comparisons. The medical community in particular should be educated and vigilant about such opportunities. A striking outcome of this type of approach is FOXP2, a transcription factor shown to be associated with an inherited human disorder of speech production (Enard et al. 2002b; Zhang et al. 2002). Intriguingly, this putative transcription factor was found to have two human-specific amino acid changes, and the genomic region in question appears to have been positively selected and fixed in humans <200,000 years ago (Enard et al. 2002b). The next step is to look at the consequences of such abnormal genotypes in vitro and by developing transgenic mice that manifest symptoms of the condition. Indeed, mice with a disruption in a single copy of the murine Foxp2 gene manifest a modest developmental delay and a significant alteration in ultrasonic vocalizations that are normally elicited when pups are removed from their mothers (Shu et al. 2005).

Another intriguing finding is that some amino acid sequence variants that cause disease in humans turn out to be a reversion to the conserved ancestral state, still present in the normal chimpanzee (The Chimpanzee Sequencing and Analysis Consortium 2005). This phenomenon has been explained as being due to a high rate of compensatory mutations at other sites in the same protein. Assuming that such mutations are more likely to be fixed by positive selection than by neutral drift, these genes are candidates for adaptive differences between humans and chimpanzees.

Candidate gene approach 3: Making choices based on sequence data

Both “top-down” and “bottom-up” approaches have been successfully used to identify genes potentially involved in uniquely human phenotypes. As discussed above, the “phenome-down” candidate approach involves selecting a candidate gene based on phenotypic information and doing a first-pass genomic workup before proceeding to functional analyses of the gene product. Conversely, a “genome-up” approach (see genomic approach 2 above) can identify genes involved in a particular pathway or system that may be diverged enough (i.e., high Ka/Ks values) or expressed in the organs of interest, or harbor amino acid mutations (i.e., in a conserved domain) to be of interest in a functional screen. Following a search for such candidate genes, a narrowed-down list of loci can then be prioritized according to putative function and position in the pathway of interest (as opposed to a complete list of loci generated from a purely genome-based search, with little functional knowledge linked to them). For example, the human sequence could be “chimpanized” and the gene product compared with that of the native human and chimpanzee gene product via in vitro or transgenic mouse studies in order to investigate the effect of particular sequence changes on a given phenotype.

One example of a functional genetic difference discovered through candidate genomic sequence analysis is MYH16, initially identified as a putative member of the myosin heavy-chain family (Stedman et al. 2004). The MYH16 gene product is most prominently expressed in the jaw muscles of vertebrates. Humans are homozygous for a defective frame-shifted allele. Stedman et al. (2004) dated the mutation to ∼2.4 Mya, approximately the time of origin of the genus Homo, and hypothesized that the human-specific loss of MYH16 function may have affected craniofacial morphology and/or selected for the evolution of larger brain size (Currie 2004; Stedman et al. 2004). Alternatively, since human ancestors switched to a less herbivorous diet at about that time, the loss of MYH16 and jaw muscle strength might simply have been inconsequential and thus drifted to fixation (Currie 2004).

A subsequent analysis of a much larger region of exonic and intronic sequence data flanking the deletion (30,000 vs. 1000 bp) estimated the age of the mutation as ∼5 Mya (Perry et al. 2005), consistent with the timing of human-chimpanzee divergence rather than with the origin of Homo. While this may cast doubt on MYH16's role in the evolution of Homo, the fact remains that the frameshift is human specific and belongs in the repertoire of human-chimpanzee genetic differences. Whether it contributed in some meaningful way to species-specific character differences is a question for continued investigation. Regardless, the availability of the human and chimpanzee genome sequences facilitated the expanded analysis, and underscores the important role that genome sequences can play in our understanding of evolutionary history.

Candidate gene approach 4: A “systems approach” to promising groups of genes

The traditional and powerful approach to genome-wide analysis has been to consider either homologous gene families or genes that are grouped together by similar functions, as in the Gene Ontology (GO) System (Ashburner et al. 2000; The Chimpanzee Sequencing and Analysis Consortium 2005). Of course, since most genes are highly interrelated and function in multiple pathways and systems, no single classification system can do justice to all of the possibilities. One complementary approach is to select a biological process or system not defined under a traditional GO category and focus attention on groups of genes that are thought to be involved. Taking such an approach, Dorus et al. (2004) recently found that a group of genes involved in nervous-system development and function showed evidence of accelerated evolution when compared with “housekeeping” genes. A related approach is to assume that a major change in a single gene is likely to affect the evolution of other genes that are functionally connected. For example, following up on the discovery of the CMAH mutation affecting synthesis of one type of sialic acid (Chou et al. 1998), multiple functional genetic differences in the biology of sialic acids have been identified between humans and great apes (Angata et al. 2001; Gagneux et al. 2003; Sonnenburg et al. 2004). Since <60 genes are directly involved in all of the major processes of sialic acid biology, it is reasonable to suggest that this system underwent multiple related changes at some point(s) in human evolution. A systematic comparative analysis of all of these genes between humans and chimpanzees is underway. However, the genes in question are not identified in the GO system as belonging to a single category. By exercising both caution and creativity in how they identify loci united in a biological process, researchers will likely come up with novel insights into human and chimpanzee evolution.
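One simple way to ask whether a hand-picked group of genes shows higher Ka/Ks than a comparison set is a one-sided permutation test on the difference in means. The sketch below is offered only as an illustration of the general idea; it is not the analysis performed by Dorus et al. (2004), and the function name, the default number of permutations, and the Ka/Ks values in the usage example are all hypothetical.

```python
import random
from statistics import fmean
from typing import Sequence


def permutation_p_value(
    group: Sequence[float],
    background: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided permutation test: is mean(group) larger than mean(background)?

    Gene labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose mean difference is at least as large as the observed one,
    with a +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = fmean(group) - fmean(background)
    pooled = list(group) + list(background)
    k = len(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fmean(pooled[:k]) - fmean(pooled[k:])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Usage with made-up Ka/Ks values (for illustration only).
system_genes = [0.50, 0.60, 0.70, 0.80, 0.55, 0.65]
comparison_genes = [0.10, 0.15, 0.20, 0.12, 0.18, 0.22, 0.14, 0.16, 0.11, 0.19]
p_shifted = permutation_p_value(system_genes, comparison_genes)

# When both sets hold identical values, every shuffle ties the observed difference.
p_identical = permutation_p_value([0.25, 0.25], [0.25, 0.25, 0.25])
```

A test of this kind says nothing about why a group differs; differences in gene length, expression breadth, or mutation rate between the two sets would need to be considered before invoking selection.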

Conclusions

Sequencing of the chimpanzee genome signals not an end, but rather a beginning for researchers across diverse fields. The impressive array of data and analyses that have come from this sequencing has provided researchers with new insights into rates and results of molecular processes such as nucleotide substitutions, gene duplications, insertions and deletions, retrotranspositions, and potential karyotypic changes. These data will provide the springboard for understanding the potential consequences of changes in these attributes between humans and chimpanzees. Over the years, scientists have proposed many theories about what makes humans different from the great apes, ranging from subtle changes in regulatory regions (King and Wilson 1975) all the way to the differential loss of gene activity in humans (Olson 1999; Olson and Varki 2003). In fact, given the rather complex series of events evident in the hominid fossil record (Wood and Collard 1999; Cela-Conde and Ayala 2003), every one of these hypothesized genetic mechanisms likely contributed to some degree to human-chimpanzee differences. Understanding what makes us evolutionarily, biomedically, and cognitively different from chimpanzees will require extensive comparative phenomics to complement the comparative genomics now possible using the chimpanzee genome. However, despite decades of research on wild and captive chimpanzees, our overall knowledge about the chimpanzee phenome is very incomplete (Gagneux 2004; Olson and Varki 2004; McConkey and Varki 2005). Studies of intra-specific variation among great apes are in their infancy, and biomedical and physiological data are few. This lack of comparative phenotypic data represents a serious knowledge imbalance. Better phenomic data would enhance our ability to make additional, focused choices for candidate gene studies, and also increase our understanding of the biochemical consequences of any genomic changes we do find.
One step to extend the utility of the genome project is to have the phenome much better defined, not only through morphological and anatomical studies, but also via systematic collection of existing data in all fields relevant to understanding the human condition (practically speaking, most of the biological and social sciences). A recently initiated “Great Ape Phenome Project” will begin this process (Varki et al. 1998; Gagneux 2004; Olson and Varki 2004). Of course, the critically endangered status of great apes in the wild, and the fiscal, logistical, and ethical issues of studying great apes in captivity (Gagneux et al. 2005; McConkey and Varki 2005) create a situation wherein new data and resources will not be easy to come by. Regardless, for the purposes of comparison, there is no point in doing any study on a captive great ape that one would not also do on a human subject (Gagneux et al. 2005). Also, all studies on captive apes should try to financially contribute toward their conservation in the wild, e.g., via a proposed Great Apes Conservation Trust, which would receive a 10% overage on all grant funds awarded by various agencies for research projects on ape genomes, phenomes, or behavior (McConkey and Varki 2005).

In the absence of adequate comparative phenomic data between humans and chimpanzees, genomic data provide only part of the blueprint for the phenotype. It is crucial, after identifying differences in the genomic data, to ascertain which ones are important by studying their biological consequences in the laboratory. For example, the relatively limited genomic differences between humans and chimpanzees mean that identifying statistically meaningful differences in rates of evolution is difficult. This limitation will hamper our ability to identify genes or regions of biological interest and importance. Additional primate outgroups will be important for detecting selection over longer time periods and for eliminating false positives. Also, genomic data alone cannot predict epistatic interactions between various loci, nor can they reveal the pleiotropic effects of changes that have occurred in a single gene. Comparative functional studies are necessary to reap the full potential of the genomic data, to translate the observed genetic changes into tangible quantitative differences. However, even such systematic functional studies may not capture the full magnitude of a difference's importance if they examine only a single player in a multiplayer interaction. It is likely that, while there may be single-gene changes of large consequence, there will also be synergistic effects of many minor changes at multiple loci. That is, the human condition is likely to be the result of many small-effect changes, not just a few large-effect mutations. These smaller, subtle changes will be difficult to detect by genomic methods. On the other hand, even clearly identifiable genomic and phenomic differences between humans and chimpanzees may not be directly related to speciation or to the question of “what makes us human.” Such differences may be a simple byproduct of neutral divergence or genetic drift.
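A back-of-the-envelope calculation illustrates why so few differences per locus limit statistical power. The sketch below rests on simplifying assumptions that are not taken from the text: a uniform single-nucleotide divergence of about 1.2% (roughly the ∼35 million single-nucleotide differences divided by an assumed genome size of about 3 billion bp), a hypothetical 1500-bp coding sequence, and differences treated as Poisson-distributed counts. Real coding regions typically diverge less than the genome average, which would make the counts smaller still.

```python
import math


def expected_differences(length_bp: int, divergence: float) -> float:
    """Expected number of single-nucleotide differences in a sequence,
    assuming a uniform per-site divergence (a simplification)."""
    return length_bp * divergence


def poisson_relative_sd(expected: float) -> float:
    """Relative standard deviation (sd / mean) of a Poisson count.

    For a Poisson variable the variance equals the mean, so the relative
    sd is 1 / sqrt(mean): small expected counts are proportionally noisy.
    """
    return 1.0 / math.sqrt(expected)


# Illustrative values only; both numbers are assumptions, not measurements.
ASSUMED_DIVERGENCE = 0.012   # ~35 million differences / ~3 billion bp
HYPOTHETICAL_CDS_BP = 1500   # length of a made-up coding sequence

n_expected = expected_differences(HYPOTHETICAL_CDS_BP, ASSUMED_DIVERGENCE)
relative_noise = poisson_relative_sd(n_expected)
```

Under these assumptions a typical gene carries only a handful of differences, so sampling noise alone is a sizable fraction of the count, and ratios built from even smaller subsets (such as nonsynonymous versus synonymous changes) are noisier still.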

Also, what might seem an important phenotypic difference between humans and great apes might not actually be the most critical factor in determining unique features of the human condition. For example, despite the frequent attention given to big brain size (Wood and Collard 1999; Preuss 2005), there is little evidence for causative connections between brain size and human cognitive abilities (Preuss 2005). Additionally, maximum brain size was achieved long before the emergence of modern human behaviors (Klein 1999; Wood and Collard 1999). Thus, while increased brain size is an impressively human-specific phenotypic difference from great apes, it may well have been just one step (like bipedalism) that occurred earlier, along the way to the emergence of uniquely human cognitive features. Conversely, apparently small phenotypic differences could turn out to play major roles. For example, a small (approximately twofold) difference in the level of a thyroid hormone-binding protein and associated differences in thyroid hormone metabolism between humans and apes (Gagneux et al. 2001) could turn out to be as important as brain-expressed genes in altering the trajectory and mechanisms of human brain development.

Explaining “humanness” is a vague and broadly philosophical question, not easily approached using the genome alone. We prefer to use the term “the human condition” to refer to the entire suite of characters that makes humans different from the great apes. What it means to be human involves quantitative aspects of biochemistry, physiology, and morphology, as well as more qualitative arenas such as cognition, behavior, symbolic communication, and culture. However, unlike typical biological questions, the great majority of experiments one might propose for studying the consequences of species-specific genetic changes are unethical and/or impractical to do, either in humans or in great apes (Gagneux et al. 2005; McConkey and Varki 2005). Meanwhile, studies in mice may not provide sufficient answers. Thus, we suggest that many answers must come from a logical inductive approach that synthesizes many “clues” to arrive at the best possible “diagnosis.” Also, apparently minor differences between humans and great apes could turn out to be critical. For all of these reasons, we must keep an open mind and leave no clue unattended, even if it may appear trivial at first glance. It may well be that findings made from systems that are more ethically accessible and practical to study (such as the blood and the skin) will reveal clues that will eventually allow generation of testable hypotheses about organs like the brain. The other reason to take this type of broad approach to the “human condition” is that there are major biomedical lessons to be learned, which will benefit both humans and great apes, even though they may not be useful in explaining “humanness” in its philosophical sense.

Because of the many limitations mentioned above, we will have to arrive at many of our conclusions by considering all of the facts in aggregate, including some circumstantial evidence. In the final analysis, the best long-term approach to understanding human-chimpanzee differences is to ensure that the next generation of biologists interested in the evolution of the human phenotype is a cross-trained and collaborative one, with an interdisciplinary focus. Interactions among a great many disciplines, such as genomics, biochemistry, physiology, neurobiology, cognitive science, medicine, pathology, anthropology, ecology, primatology, and evolutionary biology, will be essential in dissecting out the key genetic features that contribute to making us human.

Acknowledgments

We thank three anonymous reviewers, Anders Aannestad, Sandra Diaz, Pascal Gagneux, Hopi Hoekstra, Elaine Mardis, Tarjei Mikkelsen, Jennifer Stevenson, and Nissi Varki for valuable comments and suggestions. We also thank Jim Else, Liz Strobert, and Dan Anderson at the Yerkes Primate Center, Atlanta, GA for helpful discussions about great ape diseases. A.V. was supported by grants from the NIGMS, NHLBI, and NCI and by the Harold G. and Leila Y. Mathers Charitable Foundation, and T.K.A. was supported by a postdoctoral fellowship from the American Cancer Society.

Footnotes

  • 1 Corresponding author. E-mail a1varki{at}ucsd.edu; fax (858) 534-5611.

  • 2 The term “great apes” is used here in the now colloquial sense, as genomic information no longer supports this species grouping (Goodman 1999). Under the currently more common classification, these species are now grouped together with humans in the family Hominidae.

  • 3 The term “phenome” has been used in multiple publications (e.g., Mahner and Kary 1997; Varki et al. 1998; Paigen and Eppig 2000; Nevo 2001; Walhout et al. 2002; Freimer and Sabatti 2003), but still lacks an accepted definition. Discussions with researchers who have used the term suggest the following definition: “The body of information describing an organism's phenotypes, under the influences of genetic and environmental factors.”

  • 4 Olson, M.V., Eichler, E.E., Varki, A., Myers, R.M., Erwin, J.M., and McConkey, E.H.A. 2002. White paper advocating complete sequencing of the genome of the common chimpanzee, Pan troglodytes (white paper submitted to NHGRI, February 2002). Reich, D.E., Lander, E.S., Waterston, R., Pääbo, S., Ruvolo, M., and Varki, A. 2002. Sequencing the chimpanzee genome (white paper submitted to NHGRI, February 2002).

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3737405.
