Memory-bound k-mer selection for large and evolutionarily diverse reference libraries

Table 1.

Computational resources needed for a database built on 10,470 reference genomes and 756 queries (50 M short reads)

Method      Running time (query)   Running time (library)   Peak memory (query)   Peak memory (library)
Kraken 2    88 sec, 96 threads     3.5 h, 128 threads       47 GB                 50 GB
CLARK       676 sec, 96 threads    13.5 h, 128 threads      150 GB                370 GB
CONSULT-II  2191 sec, 96 threads   19 h, 128 threads        141 GB                157 GB
KRANK-hs    1167 sec, 96 threads   4.5 h, 32 × 4 threads    51 GB                 32 GB per batch
KRANK-lw    787 sec, 96 threads    2.5 h, 32 × 4 threads    13 GB                 8 GB per batch
  • Measurements for both queries and library building were performed on a machine with 2.2 GHz AMD EPYC 7742 processors. Reported library building times are for 256 and 512 batches, respectively, for KRANK-lw and KRANK-hs, each with four threads, distributed across 32 cluster nodes and run in parallel.
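The batch arithmetic in the footnote can be made explicit with a short sketch. It uses only the numbers reported in Table 1 and its footnote. Two things are assumptions for illustration and are not stated in the source: that batches are spread evenly across the 32 nodes with one batch running per node at a time, and that "thread-hours" (wall-clock time multiplied by concurrent threads) is a reasonable rough measure of total compute. The names `summarize` and `krank_builds` are hypothetical.

```python
# Values copied from Table 1 and its footnote.
NODES = 32              # cluster nodes used for library building
THREADS_PER_BATCH = 4   # threads given to each batch

# method -> (total batches, reported wall-clock hours)
krank_builds = {
    "KRANK-hs": (512, 4.5),
    "KRANK-lw": (256, 2.5),
}


def summarize(batches, hours):
    """Return (concurrent threads, batches per node, thread-hours).

    Assumption (not from the source): batches are split evenly across
    nodes and each node runs one batch at a time.
    """
    concurrent_threads = NODES * THREADS_PER_BATCH
    batches_per_node = batches // NODES
    thread_hours = hours * concurrent_threads
    return concurrent_threads, batches_per_node, thread_hours


for name, (batches, hours) in krank_builds.items():
    threads, per_node, thread_hours = summarize(batches, hours)
    print(f"{name}: {threads} concurrent threads, "
          f"{per_node} batches per node, {thread_hours:.0f} thread-hours")
```

Under these assumptions, the 32 × 4 configuration amounts to 128 concurrent threads, the same thread count reported for the other tools' library builds, while peak memory is incurred per batch rather than for the whole library at once.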

This Article

  1. Genome Res. 34: 1455-1467