Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history

Abstract

In addition to variation in terms of single nucleotide polymorphisms, whole genomic regions differ in copy number among individuals. These differences are referred to as Copy Number Variants (CNVs), which recent mapping studies have shown to be prevalent in mammalian genomes. CNVs that reach fixation in the population give rise to Segmental Duplications (SDs). SDs, in turn, are operationally defined as long (>1 kb) stretches of duplicated DNA with high sequence identity. Here, we investigate formation signatures for both phenomena. Non-allelic homologous recombination (NAHR) employs existing repeats to generate new duplications. Therefore, we examine in detail the co-occurrence patterns of different genomic repeat features with both CNVs and SDs. First, we analyze the co-localization of SDs with other SDs and find that they are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution. This observation suggests a preferential attachment mechanism, i.e. existing SDs are likely to be involved in creating new ones nearby. Furthermore, we observe a significant association of CNVs with SDs, but show that an SD-mediated mechanism could account for only a fraction (at most 28%) of CNVs. Alu elements, a type of repeat, had previously been identified as another major contributor to SD formation by virtue of their strong association with SDs. While we also observe this association, we find that it decreases sharply for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alu elements. In the same vein, we report an association of SDs with processed pseudogenes, which decreases for younger SDs and is absent for CNVs. Finally, we find a number of other repeat elements, namely LINEs and microsatellites, to be significantly more associated with CNVs than with SDs, which may help explain how CNVs form. Overall, we find that a shift in the predominant formation mechanism occurred in recent evolutionary history.
About 40 million years ago, during a burst in retrotransposition activity (the "Alu burst"), non-allelic homologous recombination (NAHR), mediated by Alus, was the main driver of such genome rearrangement; however, its relative importance has decreased markedly since then, with proportionally more events now being associated with other repeats and with non-homologous end-joining. In contrast to the precisely known SD boundaries, most current data on CNVs are of rather low resolution, which makes it difficult to draw exact conclusions about their surrounding sequences. Therefore, in addition to the coarse-grained analysis above, we performed targeted sequencing of 67 CNV breakpoints and complemented these with previously sequenced ones. We then analyzed the sequence signatures of this combined set of over 600 breakpoints to verify the conclusions drawn from the coarse-grained analysis. The breakpoint analysis supports these conclusions: only a few breakpoints show associations with Alu elements, whereas more show formation signatures of NAHR mediated by SDs or LINEs.
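
As an illustration of the kind of coarse-grained association analysis described above, the following is a minimal sketch of one way to test whether a set of intervals (for example CNVs) overlaps a set of repeat features (for example SDs) more often than expected by chance. It is not the authors' pipeline: the permutation scheme, the function names, the single-chromosome simplification and the toy coordinates are all assumptions made for illustration only.

```python
import random
from bisect import bisect_left


def overlaps_any(start, end, feats, feat_starts):
    """Return True if the half-open interval [start, end) overlaps a feature.

    `feats` must be sorted, non-overlapping (start, end) tuples, and
    `feat_starts` the list of their start coordinates.
    """
    # Features whose start lies before `end` sit at indices < i.
    i = bisect_left(feat_starts, end)
    # Features are sorted and disjoint, so only the last of those
    # can extend past `start`.
    return i > 0 and feats[i - 1][1] > start


def overlap_fraction(intervals, feats):
    """Fraction of intervals that overlap at least one feature."""
    feat_starts = [s for s, _ in feats]
    hits = sum(overlaps_any(s, e, feats, feat_starts) for s, e in intervals)
    return hits / len(intervals)


def permutation_pvalue(intervals, feats, chrom_len, n_perm=1000, seed=0):
    """Empirical one-sided p-value for enrichment of overlaps.

    Each interval is re-placed uniformly at random on a single
    hypothetical chromosome of length `chrom_len`, keeping its length.
    """
    rng = random.Random(seed)
    observed = overlap_fraction(intervals, feats)
    at_least = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in intervals:
            length = e - s
            new_start = rng.randint(0, chrom_len - length)
            shuffled.append((new_start, new_start + length))
        if overlap_fraction(shuffled, feats) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy coordinates, invented for illustration (not data from the study).
CHROM_LEN = 1_000_000
SDS = [(100_000, 110_000), (500_000, 520_000)]
CNVS = [(95_000, 105_000), (505_000, 515_000),
        (108_000, 112_000), (700_000, 710_000)]

if __name__ == "__main__":
    frac, p = permutation_pvalue(CNVS, SDS, CHROM_LEN)
    print(f"observed overlap fraction: {frac:.2f}, empirical p: {p:.4f}")
```

A real analysis would need to handle multiple chromosomes, assembly gaps and the limited resolution of CNV boundaries noted above, none of which this sketch attempts.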

Footnotes

    • Received May 27, 2008.
    • Accepted September 30, 2008.
    • This manuscript is Open Access.
