The Atlas Genome Assembly System

Table 1. Steps in the Atlas Assembly System

| Stage | Program | Action | Comment |
|---|---|---|---|
| 1. Data preparation | | Data quality checks | Contamination (reads from other organisms) and mislabeled reads (i.e., reads from another BAC) are identified and corrected if possible. |
| | trim-reads | Remove low-quality bases, so that only the highest-quality sequence is used for finding overlaps between reads. | Trimmed reads are used only for finding overlaps; full sequences are used to assemble the consensus sequence. |
| 2. Analyze sequence redundancy | k-mer-counter | Build a table of the frequency of oligonucleotides (k-mers). | Only WGS reads are used, to give the most complete and random sampling of the genome. |
| 3. Compute read overlap graph | overlapper | Identify candidate overlaps based on shared rare k-mers. | The end-to-end criterion scores the alignment over the entire overlapping portion of the reads. |
| | | Stringently evaluate overlaps by banded alignment. | |
| | | Save the overlap graph with stringency annotations. | |
| 4. eBAC assembly | | Coassemble WGS and skim reads that have been assigned to the same BAC to produce eBACs. | |
| | binner | Choose WGS reads with the best overlaps to skim reads in a BAC; add read-pair mates. | |
| | Phrap | Assemble WGS and skim reads. | |
| | split-scaffold | Split misjoined contigs. | |
| | split-scaffold | Build scaffolds with read pairs. | |
| 5. Build bactigs | | Find overlapping eBACs based on shared reads and more. | |
| | BLASTZ | Confirm overlaps by aligning eBACs. | |
| | | Compare bactigs to other maps for verification. | |
| 6. Assembly of bactigs | rolling-phrap | Assemble reads in bactigs. | |
| | Phrap | Assemble contigs in bactigs. | |
| | split-scaffold | Split misjoins; build scaffolds. | |
| 7. Build superbactigs | | Link bactigs by read pairs and BAC skim read distribution. | |
| 8. Build ultrabactigs and map to chromosomes | | Link superbactigs by map and synteny data. | |
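The table says only that trim-reads removes low-quality bases. One common way to do this, used here as an assumption rather than the documented trim-reads method, is to subtract a cutoff from each Phred quality value and keep the contiguous window with the largest total (a maximum-subarray search). The function name `trim_window`, the cutoff of 20, and the quality values are hypothetical.

```python
from typing import List, Tuple


def trim_window(quals: List[int], cutoff: int = 20) -> Tuple[int, int]:
    """Return the half-open window (start, end) that maximizes sum(q - cutoff).

    Hypothetical stand-in for trim-reads: bases inside the window would be used
    for overlap detection, while the full read is still kept for consensus.
    Returns (0, 0) if no base beats the cutoff.
    """
    best_score, best = 0, (0, 0)
    run_score, run_start = 0, 0
    for i, q in enumerate(quals):
        if run_score <= 0:
            # The running window no longer helps; restart it at this base.
            run_score, run_start = 0, i
        run_score += q - cutoff
        if run_score > best_score:
            best_score, best = run_score, (run_start, i + 1)
    return best


read = "ACGTACG"
quals = [5, 5, 30, 35, 40, 10, 5]  # made-up Phred values for illustration
start, end = trim_window(quals)
print(read[start:end])  # GTA
```

Because the window is a single contiguous stretch, a lone good base surrounded by poor ones is not kept unless it raises the window total.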
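The k-mer-counter step builds a frequency table of k-mers from the WGS reads. A minimal sketch, assuming reads are A/C/G/T strings and that each k-mer is counted in canonical form (the lexicographically smaller of the k-mer and its reverse complement) so both strands pool together. Canonicalization, skipping ambiguous bases, and the names `canonical` and `count_kmers` are assumptions, not documented Atlas behavior.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers over all reads, skipping k-mers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _BASES:
                continue  # e.g. the k-mer contains an N
            counts[canonical(kmer)] += 1
    return counts


wgs_reads = ["ACGTT", "AACGT"]
counts = count_kmers(wgs_reads, 3)
print(counts)  # Counter({'ACG': 4, 'AAC': 2})
```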
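Candidate overlaps in stage 3 come from shared rare k-mers. This sketch assumes a k-mer is "rare" when its count across all reads is at most a hypothetical `max_count`, so k-mers from repeats (like the poly-G read in the example) are ignored, and it reports read pairs sharing at least `min_shared` rare k-mers. Strand handling is omitted, and the function and parameter names are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Tuple


def candidate_overlaps(reads: Dict[str, str], k: int, max_count: int = 2,
                       min_shared: int = 1) -> Dict[Tuple[str, str], int]:
    """Map (read_a, read_b) -> number of shared rare k-mers, keeping pairs >= min_shared."""
    # Global k-mer frequency table (forward strand only, for brevity).
    freq: Counter = Counter()
    for seq in reads.values():
        freq.update(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Index each rare k-mer to the set of reads that contain it.
    index = defaultdict(set)
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if freq[kmer] <= max_count:
                index[kmer].add(name)

    # Every pair of reads sharing a rare k-mer gets one vote per such k-mer.
    shared: Counter = Counter()
    for names in index.values():
        for pair in combinations(sorted(names), 2):
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


reads = {"r1": "AAAACGTAC", "r2": "CGTACTTTT", "r3": "GGGGGGGGG"}
print(candidate_overlaps(reads, k=4))  # {('r1', 'r2'): 2}
```

Restricting the index to rare k-mers keeps high-copy repeat k-mers from generating a quadratic number of spurious read pairs.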
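Candidates are then evaluated stringently by banded alignment, with the end-to-end criterion scoring the entire overlapping portion. A minimal sketch, assuming a candidate implies that read b starts at position `offset` of read a and extends past a's end: it computes a unit-cost edit distance that must consume all of a's overlapping suffix but may stop anywhere in b, filling only cells within `band` of the main diagonal. Unit costs, the offset convention, the identity formula, and all names are assumptions.

```python
INF = float("inf")


def overlap_distance(a_suffix: str, b: str, band: int) -> float:
    """Banded edit distance aligning all of a_suffix against a prefix of b.

    Only cells (i, j) with |i - j| <= band are filled. Every base of a_suffix
    must be aligned (end-to-end over the overlap), but the alignment may stop
    anywhere in b, because b continues past the end of read a.
    """
    n, m = len(a_suffix), len(b)
    prev = {j: j for j in range(min(m, band) + 1)}  # row 0: b[:j] opposite gaps
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[0] = i  # a_suffix[:i] opposite gaps
                continue
            cur[j] = min(
                prev.get(j - 1, INF) + (a_suffix[i - 1] != b[j - 1]),  # match / mismatch
                prev.get(j, INF) + 1,     # a_suffix base opposite a gap
                cur.get(j - 1, INF) + 1,  # b base opposite a gap
            )
        prev = cur
    return min(prev.values(), default=INF)


def overlap_identity(a: str, b: str, offset: int, band: int = 3) -> float:
    """Identity over the overlap, assuming b starts at position `offset` of a."""
    a_suffix = a[offset:]
    if not a_suffix:
        return 0.0
    dist = overlap_distance(a_suffix, b, band)
    return 0.0 if dist == INF else 1.0 - dist / len(a_suffix)


a = "TTTTACGTACGT"
print(overlap_identity(a, "ACGTACGTGGGG", offset=4))  # 1.0
print(overlap_identity(a, "ACGACGTGGGG", offset=4))   # 0.875 (one deleted base)
```

The band limits work to roughly `len(a_suffix) * (2 * band + 1)` cells, which is what makes stringent rescoring of every candidate affordable.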
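For eBAC assembly, binner chooses WGS reads with the best overlaps to a BAC's skim reads and adds their read-pair mates. A minimal sketch, assuming overlaps arrive as `(wgs_read, skim_read, score)` tuples and that a read is kept for a BAC when its best score against that BAC's skim reads reaches a hypothetical `min_score`. Under this assumption a read can land in more than one eBAC, which fits the later step that finds overlapping eBACs through shared reads. All names and data are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def bin_wgs_reads(overlaps: Iterable[Tuple[str, str, float]],
                  skim_to_bac: Dict[str, str],
                  mates: Dict[str, str],
                  min_score: float) -> Dict[str, Set[str]]:
    """Return BAC name -> WGS reads chosen for that BAC's eBAC.

    overlaps:    (wgs_read, skim_read, score) tuples
    skim_to_bac: skim read name -> BAC it was sequenced from
    mates:       WGS read name -> name of its read-pair mate
    min_score:   hypothetical cutoff on a read's best score against the BAC
    """
    # Best overlap score of each WGS read against each BAC's skim reads.
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for wgs, skim, score in overlaps:
        bac = skim_to_bac[skim]
        if score > best[bac].get(wgs, float("-inf")):
            best[bac][wgs] = score

    bins = {}
    for bac, scores in best.items():
        chosen = {w for w, s in scores.items() if s >= min_score}
        # Pull in mates; a mate need not overlap any skim read itself.
        chosen |= {mates[w] for w in chosen if w in mates}
        bins[bac] = chosen
    return bins


overlaps = [("w1", "s1", 90), ("w1", "s2", 40), ("w2", "s2", 85), ("w3", "s1", 30)]
skim_to_bac = {"s1": "BAC_A", "s2": "BAC_B"}
mates = {"w1": "w1m", "w2": "w2m"}
result = bin_wgs_reads(overlaps, skim_to_bac, mates, min_score=50)
print(result)
```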
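Several steps use read pairs to link sequences: split-scaffold builds scaffolds with them, and stage 7 links bactigs with them. A simplified sketch, assuming each read has already been placed on one contig: count mate pairs whose two reads fall on different contigs, join contigs supported by at least a hypothetical `min_links` pairs, and report the connected groups using union-find. Real scaffolding also orders and orients contigs and estimates gap sizes; that is left out here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def link_contigs(read_to_contig: Dict[str, str],
                 mate_pairs: Iterable[Tuple[str, str]],
                 min_links: int = 2) -> List[List[str]]:
    """Group contigs joined by >= min_links spanning read pairs (union-find).

    Ordering, orientation, and gap-size estimation are deliberately omitted.
    """
    links: Counter = Counter()
    for r1, r2 in mate_pairs:
        c1, c2 = read_to_contig.get(r1), read_to_contig.get(r2)
        if c1 is not None and c2 is not None and c1 != c2:
            links[tuple(sorted((c1, c2)))] += 1  # this pair spans two contigs

    parent = {c: c for c in set(read_to_contig.values())}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (c1, c2), n in links.items():
        if n >= min_links:
            parent[find(c1)] = find(c2)

    groups: Dict[str, List[str]] = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


read_to_contig = {"a1": "ctg1", "a2": "ctg2", "b1": "ctg1", "b2": "ctg2",
                  "c1": "ctg2", "c2": "ctg3", "d1": "ctg4"}
mate_pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
print(link_contigs(read_to_contig, mate_pairs))  # [['ctg1', 'ctg2'], ['ctg3'], ['ctg4']]
```

Requiring more than one supporting pair is a simple guard against a single chimeric or mislabeled clone forcing a join.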
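Stage 5 finds overlapping eBACs from shared reads before BLASTZ confirms each overlap by alignment. The sketch below, with hypothetical names and data, covers only the first part: it inverts eBAC membership into a read index and counts the reads shared by each eBAC pair, keeping pairs at or above an assumed `min_shared` threshold as candidates for confirmation.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def overlapping_ebacs(ebac_reads: Dict[str, Set[str]],
                      min_shared: int = 3) -> Dict[Tuple[str, str], int]:
    """Return {(ebac1, ebac2): n_shared} for eBAC pairs sharing >= min_shared reads."""
    # Invert membership: read -> eBACs that contain it.
    read_to_ebacs = defaultdict(set)
    for ebac, reads in ebac_reads.items():
        for r in reads:
            read_to_ebacs[r].add(ebac)
    # Each read adds one to every eBAC pair it belongs to.
    shared: Counter = Counter()
    for ebacs in read_to_ebacs.values():
        for pair in combinations(sorted(ebacs), 2):
            shared[pair] += 1
    return {p: n for p, n in shared.items() if n >= min_shared}


ebac_reads = {"E1": {"r1", "r2", "r3", "r4"},
              "E2": {"r3", "r4", "r5", "r6"},
              "E3": {"r6", "r7"}}
print(overlapping_ebacs(ebac_reads, min_shared=2))  # {('E1', 'E2'): 2}
```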

Genome Res. 14: 721-732
