Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic

(Downloading may take up to 30 seconds. If the slide opens in your browser, select File -> Save As to save it.)

Click on image to view larger version.

Figure 2.
Figure 2.

Extension and chaining time for real ONT reads with ≈5% sequence divergence. k-mer size was chosen to be k = C log n for reference length n and Formula with error parameter θ = 0.05. Exact data sets are described in Supplemental Table S1. Note the scaled axes for the SARS-CoV-2 reads, which were much smaller than the other data sets. The well-fit linear regression lines indicate essentially linear runtime in read length with fixed reference length n (i.e., fixed k = C log n and constant C), although larger human chaining time variance is owing to repetitive k-mers.

This Article

  1. Genome Res. 33: 1175-1187

Preprint Server