Efficient mapping of accurate long reads in minimizer space with mapquik



Figure 2.

Overview of the long-read mapping pipeline using mapquik and comparison with state-of-the-art methods that use minimizers as seeds. State-of-the-art read mappers such as minimap2 and Winnowmap2 (top; pink-shaded) build an index for a reference sequence by computing window minimizers (k = 3, w = 5; A) and storing the positions of the minimizers in the index (B). To map a query sequence using the reference index (top right; nucleotide C in blue denotes a sequencing error), mappers compute the minimizers of the query sequence (C) and find matches between the query minimizers and those in the reference index. Once minimizer matches are found, minimap2 and Winnowmap2 perform a colinear chaining step that uses dynamic programming to output a high-scoring set of matches (D). In contrast, mapquik (bottom; green-shaded) indexes reference sequences by generating k-min-mers, i.e., tuples of k consecutive, randomly selected minimizers of length ℓ (k = 3, ℓ = 2; E), and storing only the k-min-mers that appear exactly once in the reference (F). mapquik stores the start and end positions of each k-min-mer, along with the order in which the k-min-mers appear. To map a query sequence using the k-min-mer index, mapquik first obtains matches between the query and the reference by looking up each query k-min-mer in the index (G). A k-min-mer match is extended if the next pair of k-min-mers (one from the query, one from the reference) also match (H). Instead of a colinear chaining step, mapquik performs a linear-time pseudochaining step to locate matches that are colinear with the match containing the highest number of k-min-mers (I).
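The two seeding schemes in the caption can be contrasted on toy data with the Python sketch below. It is an illustrative reconstruction under stated assumptions, not code from mapquik, minimap2, or Winnowmap2. Assumptions: the blake2b-based hash `h`; the hypothetical helper `universe_minimizers` and its `density` parameter, which stand in for the "randomly selected" minimizers by keeping every ℓ-mer whose hash falls below a threshold; forward-strand mapping only; and a simplified colinearity test for pseudochaining (a group is kept if its query and reference positions lie on the same side of the anchor group). The parameter values (ℓ = 12, k = 5, density = 0.05) and the random reference are arbitrary toy choices.

```python
import hashlib
import random
from collections import Counter


def h(s: str) -> int:
    """Deterministic 64-bit hash (illustrative; not the hash used by either tool)."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


def window_minimizers(seq: str, k: int, w: int) -> list:
    """Panel A: start of the lowest-hash k-mer in each window of w consecutive k-mers."""
    hashes = [h(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for s in range(len(hashes) - w + 1):
        # Ties go to the leftmost k-mer in the window.
        picked.add(min(range(s, s + w), key=lambda i: (hashes[i], i)))
    return sorted(picked)


def universe_minimizers(seq: str, l: int, density: float) -> list:
    """Hypothetical rule: an l-mer is a minimizer if its hash is below density * 2**64."""
    cutoff = int(density * 2**64)
    return [i for i in range(len(seq) - l + 1) if h(seq[i:i + l]) < cutoff]


def kminmers(seq: str, l: int, k: int, density: float) -> list:
    """Panel E: (key, start, end) for every run of k consecutive minimizers."""
    pos = universe_minimizers(seq, l, density)
    out = []
    for i in range(len(pos) - k + 1):
        run = pos[i:i + k]
        key = tuple(seq[p:p + l] for p in run)  # a k-min-mer is its tuple of minimizers
        out.append((key, run[0], run[-1] + l))
    return out


def build_index(ref: str, l: int, k: int, density: float) -> dict:
    """Panel F: keep only k-min-mers occurring once; store (start, end, order)."""
    kms = kminmers(ref, l, k, density)
    counts = Counter(key for key, _, _ in kms)
    return {key: (s, e, rank) for rank, (key, s, e) in enumerate(kms) if counts[key] == 1}


def map_query(query: str, index: dict, l: int, k: int, density: float):
    """Panels G-I on the forward strand; returns (estimated ref start, chained groups)."""
    hits = []  # (query order, reference order, query start, reference start)
    for q_rank, (key, q_start, _) in enumerate(kminmers(query, l, k, density)):
        if key in index:  # Panel G: exact k-min-mer lookup
            r_start, _, r_rank = index[key]
            hits.append((q_rank, r_rank, q_start, r_start))
    if not hits:
        return None
    groups = [[hits[0]]]
    for prev, cur in zip(hits, hits[1:]):
        # Panel H: extend while the next query k-min-mer hits the next reference one.
        if cur[0] == prev[0] + 1 and cur[1] == prev[1] + 1:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    # Panel I (simplified): anchor on the longest group, then keep groups whose
    # query and reference positions lie on the same side of the anchor.
    anchor = max(groups, key=len)
    aq, ar = anchor[0][2], anchor[0][3]
    chain = [g for g in groups if g is anchor or (g[0][2] - aq) * (g[0][3] - ar) > 0]
    return ar - aq, chain


rng = random.Random(0)
ref = "".join(rng.choice("ACGT") for _ in range(20_000))
read = list(ref[5000:8000])
read[1500] = {"A": "C", "C": "G", "G": "T", "T": "A"}[read[1500]]  # one substitution
read = "".join(read)

index = build_index(ref, l=12, k=5, density=0.05)
offset, chain = map_query(read, index, l=12, k=5, density=0.05)
print(offset, [len(g) for g in chain])
```

In this sketch, keeping only unique k-min-mers means every hit points to a single reference location, so extension and pseudochaining each take one pass over the hits instead of a dynamic-programming chaining step. The read is taken from reference position 5000, and the substitution only breaks the k-min-mers whose minimizers overlap it. The surviving groups therefore all imply the same reference start.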


Genome Res. 33: 1188-1197
