Lossless indexing with counting de Bruijn graphs

Figure 3.

Extraction of k-mer coordinates from sequence ACTAGCTAGCTAG for k = 3 (panel A) and subsequent compression with a diff-transform (panel B), where the coordinates at a node's successor are expected to be the same but incremented by +1, as these nodes are likely to be consecutive in the input sequence(s). The symmetric set difference Formula is used as a diff-operation. Thus, for example, Formula. The inverse transform is performed losslessly by Formula, which follows from the following property: Formula.
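
The two steps described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: coordinates are taken to be 1-based start positions, the diff at a node is taken against its successor's coordinates shifted by -1, the successor map is written out by hand, and GCT is chosen arbitrarily as the node whose coordinates are stored in full, so that the cycle CTA -> TAG -> AGC -> GCT -> CTA can be inverted. The function names are hypothetical.

```python
def kmer_coordinates(seq, k):
    """Map each k-mer to the set of its start positions (1-based: an assumption)."""
    coords = {}
    for i in range(len(seq) - k + 1):
        coords.setdefault(seq[i:i + k], set()).add(i + 1)
    return coords


def diff_transform(coords, successor):
    """Store, for each node, the symmetric difference between its own coordinates
    and the coordinates predicted from its successor (successor's values minus 1).
    Nodes without an entry in `successor` are stored in full."""
    diffs = {}
    for kmer, own in coords.items():
        succ = successor.get(kmer)
        if succ is None:
            diffs[kmer] = set(own)
        else:
            predicted = {c - 1 for c in coords[succ]}
            diffs[kmer] = own ^ predicted  # ^ is symmetric set difference
    return diffs


def inverse_transform(diffs, successor):
    """Recover the original coordinates, relying on (A ^ B) ^ B == A."""
    coords = {}

    def restore(kmer):
        if kmer not in coords:
            succ = successor.get(kmer)
            if succ is None:
                coords[kmer] = set(diffs[kmer])
            else:
                predicted = {c - 1 for c in restore(succ)}
                coords[kmer] = diffs[kmer] ^ predicted
        return coords[kmer]

    for kmer in diffs:
        restore(kmer)
    return coords


# Hand-written successor map for this example (hypothetical choice);
# GCT has no entry, so its coordinates are kept as-is.
successor = {"ACT": "CTA", "CTA": "TAG", "TAG": "AGC", "AGC": "GCT"}

coords = kmer_coordinates("ACTAGCTAGCTAG", 3)
diffs = diff_transform(coords, successor)
restored = inverse_transform(diffs, successor)

stored_before = sum(len(s) for s in coords.values())
stored_after = sum(len(s) for s in diffs.values())
```

Under these assumptions, the 11 coordinates extracted from the sequence shrink to 5 stored values after the transform, because CTA and AGC match the prediction from their successors exactly and their diffs are empty. The round trip through `inverse_transform` returns the original coordinate sets.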

This Article

  1. Genome Res. 32: 1754-1764
