Multiple whole-genome alignments without a reference organism

  Inna Dubchak (1,2), Alexander Poliakov (1), Andrey Kislyuk (3), and Michael Brudno (4,5)

  (1) Genome Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  (2) DOE Joint Genome Institutes, Walnut Creek, California 94598, USA
  (3) Department of Computer Science, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
  (4) Department of Computer Science, Banting and Best Department of Medical Research, and Centre for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5R 3G4, Canada

    Abstract

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, which prevents the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate higher sensitivity and specificity than the pairwise alignments previously available through the VGP, and higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.
