Efficient storage of high throughput DNA sequencing data using reference-based compression

Figure 3.

Storage components for three parameterizations of simulated data: 0.1% error and 1× coverage (left panel), 1% error and 1× coverage (middle panel), and 1% error and 25× coverage (right panel). readpos and readflags denote the storage of the read positions and the read flags (strand, exact match), respectively. Variation storage for substitutions (subst), insertions (insert), and deletions (del) is split into positional information (pos), flags (flags), and either bases (bases, for substitutions and insertions) or length (len, for deletions). The pie charts show overall storage requirements, where readinfo is the sum over read positions and read flags, and variation is the sum over all variation storage components.
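
The grouping described in the caption can be sketched in a few lines of Python. This is only an illustration of how the per-component storage might roll up into the two pie-chart slices; the key names (e.g. subst_pos) are hypothetical labels built from the caption's abbreviations, the bit counts are made-up placeholders rather than values from the paper, and the sketch assumes readinfo and variation are the only two slices.

```python
# Component groups named after the abbreviations in the caption.
# The key spellings (e.g. "subst_pos") are hypothetical labels.
READINFO = ("readpos", "readflags")
VARIATION = (
    "subst_pos", "subst_flags", "subst_bases",
    "insert_pos", "insert_flags", "insert_bases",
    "del_pos", "del_flags", "del_len",
)


def pie_shares(bits):
    """Return the fraction of total storage taken by readinfo and variation."""
    readinfo = sum(bits[name] for name in READINFO)
    variation = sum(bits[name] for name in VARIATION)
    total = readinfo + variation
    return {"readinfo": readinfo / total, "variation": variation / total}


# Placeholder bit counts, NOT measurements from the paper.
example_bits = {name: 10 for name in READINFO + VARIATION}
example_bits["readpos"] = 70

shares = pie_shares(example_bits)
print(shares)
```

With real per-component counts substituted for the placeholders, the same function would give the two fractions for each of the three panels.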

This Article

  1. Genome Res. 21: 734-740
