Efficient de novo assembly of large genomes using compressed data structures



Figure 1.

High-level diagram of the SGA assembly pipeline. The assembly has three main stages: error correction, contig assembly, and scaffolding. The error correction stage builds an FM-index for the reads (sga index) and then corrects them (sga correct). The contig assembly stage takes the corrected reads as input, re-indexes them, removes duplicate and low-quality reads, and then constructs contigs. The scaffolding stage realigns the original reads to the contigs using BWA, constructs a scaffold graph from the alignments, and outputs a final set of scaffolds in FASTA format. For clarity, minor steps of the pipeline have been omitted from the diagram.
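
To make the FM-index step in the caption more concrete, the following is a minimal toy sketch in Python of what such an index does: it stores the Burrows-Wheeler transform of a text plus two small count tables, and answers "how many times does this substring occur?" by backward search. This is not SGA's implementation. The function names `build_fm_index` and `count_occurrences` are hypothetical, the suffix array is built naively, the occurrence table is stored uncompressed, and the sketch indexes a single string rather than a collection of reads.

```python
def build_fm_index(text):
    """Build a toy FM-index for `text`: returns (C, occ, n).

    Illustrative only: assumes `text` contains no '$' and that '$'
    sorts before every other character (true for A, C, G, T).
    """
    text += "$"  # sentinel marking the end of the text
    # Naive suffix array; acceptable for a toy, far too slow for real read sets.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i] (uncompressed here).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of `pattern` in the indexed text by backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of suffix array rows matching the suffix so far
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("GATTACAGATTACA")
print(count_occurrences(index, "GATTACA"))  # 2
print(count_occurrences(index, "TTT"))      # 0
```

Each query takes time proportional to the pattern length rather than the text length, which is the property that makes substring counting over a large read set practical once the index is built.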

This Article

  1. Genome Res. 22: 549-556
