Whole-genome analysis of Alu repeat elements reveals complex evolutionary history
Abstract
Alu repeats are the most abundant family of repeats in the human genome, with over 1 million copies comprising 10% of the genome. They have been implicated in human genetic disease and in the enrichment of gene-rich segmental duplications in the human genome, and they form a rich fossil record of primate and human history. Alu repeat elements are believed to have arisen from the replication of a small number of source elements, whose evolution over time gives rise to the 31 Alu subfamilies currently reported in Repbase Update. We apply a novel method to identify and statistically validate 213 Alu subfamilies. We build an evolutionary tree of these subfamilies and conclude that the history of Alu evolution is more complex than previous studies had indicated.
Footnotes
[Supplemental material is available online at www.genome.org.]
1. Corresponding author. E-mail aprice{at}cs.ucsd.edu; fax (858) 534-7029.
2. The Alu section of Repbase Update also contains 3 additional subfamilies, each roughly 140 bp long, representing monomeric ancestors that pre-date modern dimeric Alu repeats and are thus outside the scope of this study.
3. Our precise methodology was to search for the two clusters of repeat elements maximizing the likelihood of the data, using the EM algorithm (Dempster et al. 1977) with many random seeds.
4. To adhere to the existing nomenclature (Batzer et al. 1996), we name our subfamilies by assigning them to existing Repbase Update subfamilies, e.g., AluSx, AluSx_2, AluSx_3, etc.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2693004.
An implementation of our algorithm is available online at http://www.cs.ucsd.edu/~aprice/alu.html.
Received April 18, 2004.
Accepted August 14, 2004.
Cold Spring Harbor Laboratory Press
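
Footnote 3 describes searching for the two clusters of repeat elements that maximize the likelihood of the data, using the EM algorithm with many random seeds. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes pre-aligned, equal-length sequences over A/C/G/T and models each cluster as an independent per-position nucleotide profile. That model, the pseudocount, and the names `em_two_clusters` and `best_two_clusters` are all assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


def em_two_clusters(seqs, n_iter=50, rng=None, pseudo=0.1):
    """Fit a two-component mixture of per-position nucleotide profiles.

    Returns (log_likelihood, resp), where resp[i] is the posterior
    probability that sequence i belongs to cluster 0.
    """
    rng = rng or random.Random(0)
    n, length = len(seqs), len(seqs[0])
    # Random soft initial assignment; different seeds give different starts.
    resp = [rng.random() for _ in seqs]
    loglik = -math.inf
    for _ in range(n_iter):
        # M-step: re-estimate mixing weights and profiles from soft counts.
        # The small pseudocount keeps every probability strictly positive.
        weights, profiles = [], []
        for k in (0, 1):
            r = [ri if k == 0 else 1.0 - ri for ri in resp]
            weights.append((sum(r) + pseudo) / (n + 2 * pseudo))
            profile = []
            for pos in range(length):
                counts = {b: pseudo for b in ALPHABET}
                for s, ri in zip(seqs, r):
                    counts[s[pos]] += ri
                z = sum(counts.values())
                profile.append({b: c / z for b, c in counts.items()})
            profiles.append(profile)
        # E-step: recompute responsibilities and the data log-likelihood.
        loglik, new_resp = 0.0, []
        for s in seqs:
            logp = [
                math.log(weights[k])
                + sum(math.log(profiles[k][pos][b]) for pos, b in enumerate(s))
                for k in (0, 1)
            ]
            m = max(logp)
            norm = m + math.log(sum(math.exp(lp - m) for lp in logp))
            loglik += norm
            new_resp.append(math.exp(logp[0] - norm))
        resp = new_resp
    return loglik, resp


def best_two_clusters(seqs, n_seeds=20):
    """Run EM from several random seeds and keep the highest-likelihood fit."""
    best = None
    for seed in range(n_seeds):
        result = em_two_clusters(seqs, rng=random.Random(seed))
        if best is None or result[0] > best[0]:
            best = result
    return best


if __name__ == "__main__":
    # Toy data: two groups that differ at two alignment positions.
    toy = ["ACGTACGT"] * 10 + ["ACGAACGA"] * 10
    loglik, resp = best_two_clusters(toy, n_seeds=5)
    print(loglik, [round(r, 3) for r in resp])
```

Restarting from many seeds matters because EM only climbs to a local maximum of the likelihood; keeping the best of several runs reduces the chance of reporting a poor split.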











