Jim Shaw; Yun William Yu

Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic

Figure 3.

Data points represent the slopes of the linear regressions in Figure 2 for extension, with the corresponding value of k (which is k = C log n for a constant C). The dependence on the genome size, n (x-axis), is decently approximated by a naive A₁ log(n) + B₁ fit, where A₁ and B₁ are parameters. However, our theory states that the dependence should be log(n) n^(Cα) with Cα ≈ 0.08 when θ = 0.05. Fitting A₂ log(n) n^0.08 + B₂ gives a better R² value (0.928 vs. 0.766) with the same number of parameters (two for each fit), supporting our theoretical predictions.
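The comparison described in the caption can be sketched as two ordinary least-squares fits that differ only in the feature used. The following is a minimal illustration, not the authors' analysis code: the function names (`r_squared`, `fit_two_param`, `compare_fits`) are hypothetical, and the demo data are synthetic values generated from the theoretical form plus noise, not the measurements plotted in Figure 3.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination for predictions y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_two_param(feature, y):
    """Least-squares fit of y ~ A * feature + B; returns (A, B, R^2)."""
    X = np.column_stack([feature, np.ones_like(feature)])
    (A, B), *_ = np.linalg.lstsq(X, y, rcond=None)
    return A, B, r_squared(y, A * feature + B)

def compare_fits(n, slopes, exponent=0.08):
    """Fit the naive and the theory-motivated two-parameter models.

    naive:  A1 * log(n) + B1
    theory: A2 * log(n) * n**exponent + B2  (exponent is fixed, not fitted)
    """
    n = np.asarray(n, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    naive = fit_two_param(np.log(n), slopes)
    theory = fit_two_param(np.log(n) * n ** exponent, slopes)
    return naive, theory

# Synthetic demo data (assumption: NOT the values from the paper).
rng = np.random.default_rng(0)
n_demo = np.logspace(5, 8, 12)
slopes_demo = (2.0 * np.log(n_demo) * n_demo ** 0.08 + 1.0
               + rng.normal(0.0, 0.5, n_demo.size))

naive_fit, theory_fit = compare_fits(n_demo, slopes_demo)
print("naive  (A1, B1, R^2):", naive_fit)
print("theory (A2, B2, R^2):", theory_fit)
```

Because the exponent 0.08 is fixed in advance rather than fitted, both models have exactly two free parameters, so their R² values can be compared directly, as the caption does.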

This Article

Published in Advance March 29, 2023, doi: 10.1101/gr.277637.122 Genome Res. 2023. 33: 1175-1187
