The Atlas Genome Assembly System

Table 2.

Building eBACs


Number of                       Average (reads)   Std. Dev. (reads)
Distinct WGS in overlaps (a)    2342              2011
WGS passing rarity (b)          1431              352
WGS passing overlap qual (b)    2276              1947
WGS passing both (b)            1417              351
WGS binned with BAC (c)         1314              310
WGS binned + mates              1757              390
WGS in Phrap contigs            1675              368
  • a Distinct WGS in all overlaps produced by the overlapper with 95% identity and 100 k-mer copies allowed

  • b Filtering done in Binner based only on overlapper information. The repeat heuristic limits k-mer copies to 12 (three times the coverage). The overlap quality heuristic requires 3 × span/(3 + span − score) ≥ 35, where score is the banded alignment score; 2 × span/(2 + span − score) would approximate the average distance between discrepancies if there were only substitutions (indels incur added penalties)

  • c Beyond the k-mer repeat and quality heuristics, only the top six (i.e., coverage × 1.5) WGS overlaps from each end of a BAC read are examined, and they are kept only if strictly better by the quality heuristic than the top discarded overlap
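
The filters described in footnotes b and c can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Binner implementation: the `Overlap` record, its field names, the idea that a k-mer copy count is attached to each overlap, and the function names are all hypothetical. Only the thresholds (12 k-mer copies, quality ≥ 35, top six overlaps) and the quality formula come from the footnotes.

```python
from dataclasses import dataclass

COVERAGE = 4                          # implied by "12 = three times the coverage"
MAX_KMER_COPIES = 3 * COVERAGE        # repeat (rarity) heuristic: 12
MIN_QUALITY = 35.0                    # overlap quality threshold
TOP_OVERLAPS = int(1.5 * COVERAGE)    # overlaps examined per BAC read end: 6


@dataclass(frozen=True)
class Overlap:
    """Hypothetical record for one WGS overlap at one end of a BAC read."""
    wgs_read: str
    span: int          # length of the aligned region
    score: int         # banded alignment score (assumed not to exceed span)
    kmer_copies: int   # copy count of the seeding k-mer (assumed field)


def overlap_quality(span: int, score: int) -> float:
    """Quality heuristic from footnote b: 3 * span / (3 + span - score)."""
    return 3 * span / (3 + span - score)


def passes_filters(ov: Overlap) -> bool:
    """Apply both footnote-b filters: k-mer rarity and overlap quality."""
    rare_enough = ov.kmer_copies <= MAX_KMER_COPIES
    good_enough = overlap_quality(ov.span, ov.score) >= MIN_QUALITY
    return rare_enough and good_enough


def top_overlaps(overlaps: list[Overlap], keep: int = TOP_OVERLAPS) -> list[Overlap]:
    """Footnote c: examine the top `keep` overlaps by quality and retain only
    those strictly better than the best discarded overlap."""
    ranked = sorted(overlaps,
                    key=lambda ov: overlap_quality(ov.span, ov.score),
                    reverse=True)
    if len(ranked) <= keep:
        return ranked                 # nothing is discarded, so nothing to beat
    best_discarded = overlap_quality(ranked[keep].span, ranked[keep].score)
    return [ov for ov in ranked[:keep]
            if overlap_quality(ov.span, ov.score) > best_discarded]
```

For a discrepancy-free overlap (score equal to span) the quality reduces to the span itself, so under this reading a perfect overlap must cover at least 35 bases to pass. The strict comparison in `top_overlaps` means that an overlap tied with the best discarded one is dropped as well.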

This Article

  1. Genome Res. 14: 721-732
