Automated Whole-Genome Multiple Alignment of Rat, Mouse, and Human
- Michael Brudno1,
- Alexander Poliakov2,
- Asaf Salamov3,4,
- Gregory M. Cooper5,
- Arend Sidow5,6,
- Edward M. Rubin2,3,
- Victor Solovyev3,4,
- Serafim Batzoglou1,7, and
- Inna Dubchak2,3,7
- 1 Department of Computer Science, Stanford University, Stanford, California 94305, USA
- 2 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- 3 U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- 4 Softberry Inc., Mount Kisco, New York 10549, USA
- 5 Department of Genetics, Stanford University, Stanford, California 94305-5324, USA
- 6 Department of Pathology, Stanford University, Stanford, California 94305-5324, USA
Abstract
We have built a whole-genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline that combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment and consists of two main steps: (1) alignment of the mouse and rat genomes, and (2) alignment of human either to the mouse-rat alignments from step 1 or to the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human, and 97% of all alignments spanning >100 kb of human sequence agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level, <1% of the rat nucleotides are mapped to multiple places in the human sequence, and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through Multi-VISTA, a novel browser that we also present.
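The contig-level specificity figure (the fraction of rat contigs aligned to multiple places in human) can be made concrete with a short Python sketch. The `Hit` record, the function names, and the rule that only overlapping human intervals on the same chromosome count as a single "place" are hypothetical assumptions for illustration; the paper's exact definition of a place may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical alignment record: one rat contig placed on a human interval."""
    rat_contig: str
    human_chrom: str
    human_start: int  # 0-based, inclusive
    human_end: int    # 0-based, exclusive


def count_places(intervals: List[Tuple[str, int, int]]) -> int:
    """Count distinct human places after merging overlapping intervals per chromosome.

    Assumption: intervals that merely touch (end == next start) count as separate places.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    places = 0
    for spans in by_chrom.values():
        spans.sort()
        current_end = None
        for start, end in spans:
            if current_end is None or start >= current_end:
                places += 1  # a new, non-overlapping place begins
                current_end = end
            else:
                current_end = max(current_end, end)  # extend the current place
    return places


def multiply_aligned_fraction(hits: Iterable[Hit]) -> float:
    """Fraction of aligned rat contigs that land in more than one human place."""
    per_contig: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for h in hits:
        per_contig[h.rat_contig].append((h.human_chrom, h.human_start, h.human_end))
    if not per_contig:
        return 0.0
    multi = sum(1 for ivs in per_contig.values() if count_places(ivs) > 1)
    return multi / len(per_contig)


hits = [
    Hit("c1", "chr1", 0, 100),
    Hit("c1", "chr1", 50, 200),  # overlaps the first hit: same place
    Hit("c2", "chr1", 0, 100),
    Hit("c2", "chr7", 0, 100),   # different chromosome: a second place
]
print(multiply_aligned_fraction(hits))  # 0.5
```

Here contig `c1` maps to one merged region while `c2` maps to two chromosomes, so half of the contigs are multiply aligned. Merging overlaps before counting keeps a contig whose alignment is split into overlapping local pieces from being counted as a duplicate placement.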
Footnotes
- Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2067704.
- 7 Corresponding authors. E-MAIL ildubchak{at}lbl.gov; FAX (510) 486-5717. E-MAIL serafim{at}cs.stanford.edu; FAX (650) 725-1449.
- Accepted December 28, 2003.
- Received October 13, 2003.
- Cold Spring Harbor Laboratory Press