Genome-wide characterization of centromeric satellites from multiple mammalian genomes
- Can Alkan¹,⁶,
- Maria Francesca Cardone²,⁶,
- Claudia Rita Catacchio²,
- Francesca Antonacci¹,
- Stephen J. O'Brien³,
- Oliver A. Ryder⁴,
- Stefania Purgato⁵,
- Monica Zoli⁵,
- Giuliano Della Valle⁵,
- Evan E. Eichler¹ and
- Mario Ventura¹,²,⁷
- ¹Department of Genome Sciences, Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, Washington 98195, USA;
- ²Department of Genetics and Microbiology, University of Bari, 70126 Bari, Italy;
- ³Laboratory of Genomic Diversity, NCI-Frederick, Frederick, Maryland 21702-1201, USA;
- ⁴Conservation and Research for Endangered Species (CRES), Zoological Society of San Diego, San Diego, California 92112, USA;
- ⁵Dipartimento di Biologia Evoluzionistica Sperimentale, University of Bologna, 40126 Bologna, Italy
- ⁶These authors contributed equally to this work.
Abstract
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog, in contrast to elephant and armadillo, which showed high centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Footnotes
- ⁷Corresponding author. E-mail mventura{at}uw.edu; m.ventura{at}biologia.uniba.it.
- [Supplemental material is available online at http://www.genome.org. The RepeatNet algorithm is freely available at http://eichlerlab.gs.washington.edu/software/repeatnet/.]
- Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.111278.110.
- Received June 3, 2010.
- Accepted October 12, 2010.
- Copyright © 2011 by Cold Spring Harbor Laboratory Press











