DupMasker: A tool for annotating primate segmental duplications

Figure 2.

The size and sequence identity distribution of “novel” duplications. (A) The length distribution of DupMasker duplications not detected by WGAC (termed “novel” SDs) shows that the majority of these intervals (99% by number of intervals, 91% by base pair) are small fragments (size <1 kb). (B) We found that 52.3% (21.9 Mb) of these small intervals are common repeats, a consequence of imprecise boundary definition within repeat-rich regions. We performed a modified WGAC analysis on these “novel” SDs using a relaxed threshold (requiring a nonrepeat alignment ≥100 bp and sequence identity ≥75%). The analysis revealed alignments for 31% (13.1/41.96 Mb) of these “novel” SDs. Of the 13.1 Mb of alignments, 97.7% (12.8/13.1 Mb) represent either small (size <1 kb) or relatively ancient (sequence identity <90%) duplications.
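The relaxed threshold and the proportions quoted in the caption can be illustrated with a minimal sketch. It is not the WGAC or DupMasker implementation: the `Alignment` record, its field names, and both helper functions are hypothetical, and only the cutoffs (≥100 bp nonrepeat alignment, ≥75% identity) and the megabase totals come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment (not a WGAC data type)."""
    nonrepeat_bp: int   # aligned bases lying outside common repeats
    identity: float     # fraction of identical aligned bases, in [0, 1]


def passes_relaxed_threshold(aln, min_nonrepeat_bp=100, min_identity=0.75):
    """Apply the relaxed cutoffs stated in the caption to one alignment."""
    return aln.nonrepeat_bp >= min_nonrepeat_bp and aln.identity >= min_identity


def percent(part_mb, total_mb):
    """Percentage of total_mb covered by part_mb, rounded to one decimal."""
    return round(100.0 * part_mb / total_mb, 1)


if __name__ == "__main__":
    # Share of "novel" SD sequence with an alignment under the relaxed threshold.
    print(percent(13.1, 41.96))
    # Share of aligned sequence that is small or relatively ancient.
    print(percent(12.8, 13.1))
```

Both cutoffs are treated as inclusive, matching the "≥" signs in the caption.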

This Article

  1. Genome Res. 18: 1362-1368
