Automated Whole-Genome Multiple Alignment of Rat, Mouse, and Human

Michael Brudno [1], Alexander Poliakov [2], Asaf Salamov [3,4], Gregory M. Cooper [5], Arend Sidow [5,6], Edward M. Rubin [2,3], Victor Solovyev [3,4], Serafim Batzoglou [1,7], and Inna Dubchak [2,3,7]

  1. Department of Computer Science, Stanford University, Stanford, California 94305, USA
  2. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  3. U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
  4. Softberry Inc., Mount Kisco, New York 10549, USA
  5. Department of Genetics, Stanford University, Stanford, California 94305-5324, USA
  6. Department of Pathology, Stanford University, Stanford, California 94305-5324, USA

Abstract

We have built a whole-genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline that combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment and consists of two main steps: (1) alignment of the mouse and rat genomes, and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human, and 97% of all alignments with human sequence >100 kb agree with a three-way synteny map built independently, using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment, and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.
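The nucleotide-level specificity figure quoted in the abstract (the share of rat nucleotides mapped to multiple places in human) can be illustrated with a small sketch. This is not the authors' evaluation code: the input format (half-open rat intervals, each tagged with a human placement), the function name `multiply_mapped_fraction`, and the choice to normalize by aligned rat positions are all assumptions made for illustration.

```python
def multiply_mapped_fraction(alignments):
    """Fraction of aligned rat positions covered by two or more alignments.

    `alignments` is an iterable of (rat_start, rat_end, human_place) tuples,
    with half-open rat coordinates [rat_start, rat_end). Each record is
    treated as one distinct placement in human (an assumption of this sketch).
    """
    events = []
    for rat_start, rat_end, _human_place in alignments:
        if rat_end <= rat_start:
            raise ValueError("rat intervals must be non-empty and half-open")
        events.append((rat_start, +1))  # an alignment begins covering rat here
        events.append((rat_end, -1))    # and stops covering rat here
    events.sort()

    aligned = 0    # rat positions covered by at least one alignment
    multiple = 0   # rat positions covered by at least two alignments
    depth = 0      # number of alignments covering the current stretch
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev:
            span = pos - prev
            if depth >= 1:
                aligned += span
            if depth >= 2:
                multiple += span
        depth += delta
        prev = pos
    return multiple / aligned if aligned else 0.0


# Hypothetical toy data: positions 50-100 on the rat contig map to two places.
toy = [
    (0, 100, "human_region_A"),
    (50, 150, "human_region_B"),
    (200, 300, "human_region_C"),
]
fraction = multiply_mapped_fraction(toy)
```

In the toy input, 250 rat positions are aligned and 50 of them are covered twice, so the fraction is 0.2. A sweep over sorted interval endpoints keeps the computation at O(n log n) in the number of alignments, which matters at whole-genome scale.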

Footnotes

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2067704.

  • 7 Corresponding authors. E-MAIL ildubchak{at}lbl.gov; FAX (510) 486-5717. E-MAIL serafim{at}cs.stanford.edu; FAX (650) 725-1449.

  • Received October 13, 2003; accepted December 28, 2003.