Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic

(Downloading may take up to 30 seconds. If the slide opens in your browser, select File -> Save As to save it.)

Click on image to view larger version.

Figure 1.
Figure 1.

Runtime results on seed-chain-extend between for two sequences of length n and Formula, where θ = 0.10 and α = −log 4(1 − θ). (A) The y-axis shows the runtime ratios (with 95% confidence intervals) for iteration k + 1 divided by runtime for iteration k, where the sequence length nk is plotted on the x-axis. As n grows, both the sketched and nonsketched extension ratios asymptotically approach the predicted ratio of Formula runtimes. (B) Multiplicative speed-up for sketched versus nonsketched alignment when sketching with density Formula, where k = C log n. The slow-down for extension flattens out, whereas the chaining speed-up is almost linear on the log-scaled x-axis; chaining speed-up dominates extension slow-down as predicted by our theory.

This Article

  1. Genome Res. 33: 1175-1187

Preprint Server