Sergey Koren; Brian P. Walenz; Konstantin Berlin; Jason R. Miller; Nicholas H. Bergman; Adam M. Phillippy

Figure 1.

A full Canu run includes three stages: correction (green), trimming (red), and assembly (purple). Canu stages share an interface for binary on-disk stores (databases), as well as parallel store construction. In all stages, the first step constructs an indexed store of input sequences, generates a k-mer histogram, constructs an indexed store of all-versus-all overlaps, and collates summary statistics. The correction stage (green) selects the best overlaps to use for correction, estimates corrected read lengths, and generates corrected reads. The trimming stage (red) identifies unsupported regions in the input and trims or splits reads to their longest supported range. The assembly stage (purple) makes a final pass to identify sequencing errors; constructs the best overlap graph (BOG); and outputs contigs, an assembly graph, and summary statistics.

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

This Article

Preprint Server

Current Issue

In This Issue