Computational Comparison of Human Genomic Sequence Assemblies for a Region of Chromosome 4

Table 2.

Comparison of Draft Sequence Assemblies Across D4S394-D4S403 Region

Assembly                    NCBI1      NCBI2      UCSC1      UCSC2      CELERA     HYBRID
Version                     9/2/00     16/4/01    9/1/01     5/4/01     public     NA
Length (bp)                 4,220,059  7,982,790  6,597,859  5,725,683  3,359,224  3,510,128
Contigs                     9          10         3          4          81         37
Gaps in contigs             373        466        420        325        197        0
Gaps (bp)                   37,300     46,600     590,800    631,400    472,294    0
Framework comparison
  Duplications              3          3          1          3          0          NA
  Deletions                 5          8          10         10         6          23
  Rearrangements            12         13         11         5          1          NA
  Misassemblies/Mb          4.74       3.01       3.33       3.14       2.08       NA
NR coverage
  NR fragments              3961       4427       4383       3516       1768       2322
  NR (bp)                   1,446,441  1,597,804  1,588,701  1,292,726  619,743    834,492
  Coverage                  0.54       0.59       0.59       0.48       0.23       0.31
Annotation
  Repetitive sequence (bp)  2,395,019  3,438,941  2,689,302  1,989,867  2,418,685  1,386,794
  PRS447 (bp)               0          0          25,707     25,707     15,679     0
  • Modified from original version—see Methods.

  • Length: total length of all contigs including gaps within contigs.

  • Framework: a set of 107 sequences accurately ordered across the region.

  • Duplications: observations of the additional appearance of a marker relative to the framework set.

  • Deletions: observations of the absence of a marker or a series of contiguous markers relative to the framework set.

  • Rearrangements: observations of marker orders differing from the framework set that are not the result of duplications or deletions.

  • Misassemblies/Mb: total number of duplications, deletions and rearrangements per Mb.

  • NR: a nonredundant set of genomic sequence data from the region.

  • Coverage: the proportion of sequence from the nonredundant genomic sequence data set (NR) present in assembly.
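
The Misassemblies/Mb definition can be checked against the table with a short calculation. The sketch below is illustrative only and is not the authors' code. It assumes the denominator is the "Length (bp)" row (total contig length including gaps), which is not stated explicitly in the footnotes, and the function and variable names are hypothetical. HYBRID is left out because its duplication and rearrangement counts are NA.

```python
# Values transcribed from Table 2.
# name: (length_bp, duplications, deletions, rearrangements)
ASSEMBLIES = {
    "NCBI1": (4_220_059, 3, 5, 12),
    "NCBI2": (7_982_790, 3, 8, 13),
    "UCSC1": (6_597_859, 1, 10, 11),
    "UCSC2": (5_725_683, 3, 10, 5),
    "CELERA": (3_359_224, 0, 6, 1),
}


def misassemblies_per_mb(length_bp, duplications, deletions, rearrangements):
    """Sum the three discrepancy counts and divide by assembly length in Mb.

    Assumption: length_bp is the 'Length (bp)' row of the table.
    """
    total = duplications + deletions + rearrangements
    return total / (length_bp / 1_000_000)


# Round to two decimals, the precision used in the table.
rates = {
    name: round(misassemblies_per_mb(*values), 2)
    for name, values in ASSEMBLIES.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

Under that assumption the computed values agree with the Misassemblies/Mb row for all five assemblies, which suggests the table's rates were normalized by total assembly length rather than by the length of the region.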


  1. Genome Res. 12: 424-429
