Confidence in comparative genomics
Genome Informatics Section, Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
Comparative sequence analysis has become a widespread approach for identifying and characterizing functional elements encoded within genomic sequences. Following early successes (for review, see Hardison 2000), a tremendous amount of sequencing capacity has been, and continues to be, devoted to sequencing the genomes of related species. Indeed, the choice of genomes for sequencing now has less to do with the biology of a particular species, or its utility as an experimental model organism, than with its placement on the evolutionary tree of life. Optimal species are now characterized by an evolutionary distance (typically measured in neutral substitutions per site) that maximizes both sequence alignability and the ability to distinguish neutral DNA from sequences under evolutionary selection. This concept is exemplified by the mammalian species selected for low-redundancy whole-genome shotgun sequencing (Margulies et al. 2005; Green 2007), as well as by the 12 fly genomes selected for comparative analyses (Drosophila 12 Genomes Consortium 2007; Stark et al. 2007).
With the increased availability of all these species’ genomes, various algorithms have been developed to aid in the identification of sequences under purifying selection (Blanchette and Tompa 2002; Boffelli et al. 2003; Margulies et al. 2003; Cooper et al. 2005; Siepel et …











