Table 12.

Alignment Times in Seconds of 10,000 ESTs (Average Size 380 Bases) Against Human Genomic Sequence Using Various K Sizes and N Sizes

K     N     2 × 10^6     2 × 10^7     2 × 10^8
10    2     3.9          35.6         680.1
10    3     3.2          21.4         348.7
11    2     2.4          8.1          92.4
11    3     2.3          6.5          61.8
12    2     3.9          7.0          39.9
12    3     3.7          6.4          33.8

[i] The 2 × 10^6 genomic sequence is ctg12414, which is 2,034,363 bases long and was taken from the December 2000 UCSC human genome assembly (http://genome.ucsc.edu). The 2 × 10^7 genomic sequence is ctg15424 and is 20,341,418 bases long. The 2 × 10^8 column is chromosome 4 and is 200,175,155 bases long. The two major components of the run-time are the time it takes to bin and sort the K-mer hits (clumping is almost instantaneous after sorting), and the time it takes to extend the clumps into alignments. The bin/sort time depends on the number of hits, which is proportional to 4^−K. The bin/sort time is somewhere between O(n) and O(n log n). The extend time is linear with respect to the number of clumps.
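
The 4^−K scaling in the note above can be compared against the timings with a short Python sketch. The timings are transcribed from Table 12. The names `TIMES`, `relative_chance_hits`, and `observed_speedup` are illustrative assumptions and are not taken from the original software. The sketch looks only at the stated proportionality and does not model the extend phase.

```python
# Timings in seconds as read from Table 12, keyed by (K, N).
# Tuple positions correspond to the 2 x 10^6, 2 x 10^7 and 2 x 10^8 columns.
TIMES = {
    (10, 2): (3.9, 35.6, 680.1),
    (10, 3): (3.2, 21.4, 348.7),
    (11, 2): (2.4, 8.1, 92.4),
    (11, 3): (2.3, 6.5, 61.8),
    (12, 2): (3.9, 7.0, 39.9),
    (12, 3): (3.7, 6.4, 33.8),
}


def relative_chance_hits(k, k_ref=10):
    """Number of hits relative to k_ref, assuming hits scale as 4**-K."""
    return 4.0 ** (k_ref - k)


def observed_speedup(k, n, col, k_ref=10):
    """Ratio of the measured time at k_ref to the measured time at k."""
    return TIMES[(k_ref, n)][col] / TIMES[(k, n)][col]


if __name__ == "__main__":
    for k in (10, 11, 12):
        speedups = [round(observed_speedup(k, 2, col), 1) for col in range(3)]
        print(k, relative_chance_hits(k), speedups)
```

For N = 2 on the chromosome 4 column, going from K = 10 to K = 12 gives an observed speedup of about 17, close to the factor of 16 by which the number of hits is expected to fall. On the smallest sequence the same change gives a ratio of 1.0, which suggests that costs other than bin/sort dominate there.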