Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic

(Downloading may take up to 30 seconds. If the slide opens in your browser, select File -> Save As to save it.)

Click on image to view larger version.

Figure 3.
Figure 3.

Data points represent the slopes of the linear regressions in Figure 2 for extension, with the corresponding value of k (which is k = C log n for a constant Formula). This dependence of the genome size, n (x-axis), is decently approximated by a naive A1log (n) + B1 fit, where A1 and B1 are parameters. However, our theory states that the dependence should be log (n)nCα with Cα ≈ 0.08 when θ = 0.05. Fitting A2log (n)n0.08 + B2 gives a better R2 value (0.928 vs. 0.766) with the same number of parameters (two parameters for both fits), indicating the goodness of our theoretical predictions.

This Article

  1. Genome Res. 33: 1175-1187

Preprint Server