Genome Fraction of Duplicated Sequence
| Duplication size | WGAC (≥90%): Human (build 31) | WGAC (≥90%): Mouse MGSCv3, w/ unplaced | WGAC (≥90%): Mouse MGSCv3, w/o unplaced | WGAC (≥90%): Mouse build 29 | WSSD (≥95%): BACs | WSSD (≥95%): Mouse MGSCv3, w/ unplaced | WSSD (≥95%): Mouse MGSCv3, w/o unplaced |
|---|---|---|---|---|---|---|---|
| >1 kb | 5.25% | N.D. | N.D. | 2.56% | N.D. | N.D. | N.D. |
| >5 kb | 4.78% | 1.95% | 1.01% | 2.00% | N.D. | N.D. | N.D. |
| >10 kb | 4.52% | 0.70% | 0.48% | 1.74% | 1.51% | 2.09% | 0.27% |
| >20 kb | 4.06% | 0.11% | 0.10% | 1.14% | 1.46% | 2.01% | 0.23% |
[i] Whole-genome assembly comparison (WGAC) identified duplications with ≥90% sequence identity, whereas whole-genome shotgun sequence detection (WSSD) reliably identified only duplications with ≥95% identity and ≥10 kb in length. The genome assemblies analyzed were human build 31 (2,860,784,610 bp); mouse MGSCv3 including unplaced contigs (2,475,067,632 bp) and excluding unplaced contigs (2,374,117,067 bp); and mouse NCBI build 29 (439,076,820 bp). Mouse build 29 was hand-curated to remove high-copy repeats (poorly characterized LINEs and LTRs); missed allelic overlaps were likewise removed (Methods). WSSD detection was two-tiered. All finished clones (4298 BACs totaling 706,309,797 bp) and MGSCv3 assembly segments (400 kb) were scanned for regions encompassing ≥10 kb with divergence ratios of ≥0.80 based on Megablast alignments. Positive sequences were reanalyzed by realigning, and rescoring with quality, all reads between 98% and 100% identity using Needleman-Wunsch global alignment. Regions of high divergence were then reanalyzed. Regions encompassing ≥10 kb were further resolved in 1-kb windows to determine the duplication boundaries more precisely. The amount of duplication within the MGSCv3 WSSD was corrected for intervening gaps and for 7,882,708 bases of major and minor centromeric satellite.
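
To give a sense of scale, the percentages in the table can be converted into approximate amounts of duplicated sequence using the assembly sizes quoted in the footnote. The sketch below does this for the WGAC >10 kb row. It assumes each percentage is taken relative to the full assembly length, which may not match the denominators actually used (the WSSD values, for instance, were corrected for gaps and satellite sequence). The dictionary keys and the `duplicated_bp` helper are illustrative names, not part of the original analysis.

```python
# Assembly sizes in bp, as quoted in the table footnote.
ASSEMBLY_BP = {
    "human_build31": 2_860_784_610,
    "mouse_MGSCv3_with_unplaced": 2_475_067_632,
    "mouse_MGSCv3_without_unplaced": 2_374_117_067,
}

# WGAC (>=90%) genome fractions for duplications >10 kb, from the table.
WGAC_GT10KB_PCT = {
    "human_build31": 4.52,
    "mouse_MGSCv3_with_unplaced": 0.70,
    "mouse_MGSCv3_without_unplaced": 0.48,
}


def duplicated_bp(assembly_bp: int, percent: float) -> float:
    """Approximate duplicated bases implied by a genome-fraction percentage.

    Assumption: the percentage is relative to the full assembly length.
    """
    return assembly_bp * percent / 100.0


if __name__ == "__main__":
    for name, pct in WGAC_GT10KB_PCT.items():
        megabases = duplicated_bp(ASSEMBLY_BP[name], pct) / 1e6
        print(f"{name}: ~{megabases:.1f} Mb duplicated (>10 kb, WGAC)")
```

Because the table values are rounded to two decimal places, the results are order-of-magnitude estimates rather than exact base counts.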