Table 1.

Cliffy index sizes and compression ratios on the SILVA SSU NR99 data set

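| Digestion | Document array profiles (Ahmed et al. 2023a) (GB) | Cliffy (GB) | Reduction ratio | Mean no. of pairs | Variance no. of pairs |
| --- | --- | --- | --- | --- | --- |
| No digestion | 2265 | 9.014 | 251× | 7.095 | 4.633 |
| DNA minimizer | 1955 | 7.858 | 249× | 7.211 | 4.732 |
| Minimizer | 1172 | 4.266 | 275× | 5.137 | 2.664 |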

[i] The SILVA SSU NR99 data set comprises 510,508 rRNA sequences from d = 9118 genera. For each digestion method, the table gives the original size (document array profiles; Ahmed et al. 2023a), the Cliffy-compressed size, the reduction ratio, and the mean and variance of the number of pairs. The expected mean number of pairs based on the harmonic series (H_9118 + 1 = 10.695) is discussed in Methods. All sizes are in gigabytes.
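
The arithmetic behind the note can be checked with a short script. This is a minimal sketch, not part of the Cliffy software: the function name `harmonic` and the dictionary `sizes_gb` are illustrative, and the sizes are the rounded values from Table 1, so the recomputed ratios match the tabulated ones only after rounding.

```python
from math import fsum

def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return fsum(1.0 / k for k in range(1, n + 1))

# Number of genera in SILVA SSU NR99, as stated in the table note.
d = 9118

# Expected mean number of pairs quoted in the note: H_d + 1.
expected_pairs = harmonic(d) + 1

# (document array profile size, Cliffy size) in gigabytes, copied from Table 1.
sizes_gb = {
    "No digestion": (2265, 9.014),
    "DNA minimizer": (1955, 7.858),
    "Minimizer": (1172, 4.266),
}

# Reduction ratio = original size / Cliffy-compressed size.
ratios = {name: profiles / cliffy for name, (profiles, cliffy) in sizes_gb.items()}

print(f"H_{d} + 1 = {expected_pairs:.3f}")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x")
```

Under these assumptions the script reproduces the quoted value of 10.695 and the reduction ratios of 251×, 249×, and 275×. The observed means in the table (5.137 to 7.211) all lie below that expected value.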