Table 2.

Compression benchmark for subsets of SNPs of human Chromosome 22

| Data | 10,000 SNPs | 50,000 SNPs | 80,000 SNPs | 317,400 SNPs |
|---|---|---|---|---|
| Original size | 112.27 | 561.33 | 898.14 | 3536.40 |
| Gzip (clevel 9) | 6.48 (×17.3) [1 m 0.691 s] | 40.68 (×13.8) [6 m 9.370 s] | 65.20 (×13.8) [10 m 45.854 s] | 263.30 (×13.4) [44 m 5.341 s] |
| ZPAQ (clevel 3) (Mahoney 2005) | 5.92 (×18.9) [1 m 59.611 s] | 28.83 (×19.5) [9 m 52.687 s] | 45.18 (×19.9) [24 m 39.143 s] | 183.38 (×19.3) [98 m 46.042 s] |
| Zstandard (Collet and Kucherawy 2018) | 11.29 (×9.9) [0 m 0.209 s] | 57.08 (×9.8) [0 m 1.017 s] | 92.75 (×9.7) [0 m 2.143 s] | 372.74 (×9.5) [0 m 6.535 s] |
| Genozip (Lan et al. 2021) | 0.94 (×119.4) [0 m 12.899 s] | 29.89 (×18.8) [0 m 2.681 s] | 48.67 (×18.5) [0 m 3.249 s] | 200.13 (×17.7) [0 m 11.741 s] |
| bref3 (Browning et al. 2018) | 4.35 (×25.8) [0 m 1.383 s] | 19.91 (×28.2) [0 m 4.322 s] | 27.31 (×32.9) [0 m 10.709 s] | 115.52 (×30.6) [0 m 22.916 s] |
| VQ-VAE + Zstandard (ours) | 3.42 (×32.83) [0 m 12.905 s] | 25.37 (×22.12) [1 m 0.564 s] | 40.17 (×22.4) [1 m 42.669 s] | 160.68 (×22.0) [6 m 37.681 s] |
| VQ-VAE + Genozip (ours) | 3.59 (×31.3) [0 m 6.984 s] | 19.44 (×28.9) [0 m 14.447 s] | 27.77 (×32.3) [0 m 26.471 s] | 115.24 (×30.7) [1 m 23.828 s] |

[i] File sizes are in MB. For each method, the entry gives the compressed file size, the compression factor (in parentheses), and the running time (in square brackets). We mark in bold the top two choices based on compression factors.
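
As a quick check on how the compression factors relate to the file sizes, the sketch below recomputes them for the 317,400-SNP column and ranks the methods. It assumes the factor is simply the original size divided by the compressed size, which is consistent with the values reported in the table. The function names `compression_factor` and `top_two` are illustrative and do not come from the paper.

```python
# Sizes in MB, copied from the 317,400-SNP column of Table 2.
ORIGINAL_MB = 3536.40
COMPRESSED_MB = {
    "Gzip": 263.30,
    "ZPAQ": 183.38,
    "Zstandard": 372.74,
    "Genozip": 200.13,
    "bref3": 115.52,
    "VQ-VAE + Zstandard": 160.68,
    "VQ-VAE + Genozip": 115.24,
}


def compression_factor(original_mb: float, compressed_mb: float) -> float:
    """Assumed definition: original size divided by compressed size."""
    return original_mb / compressed_mb


def top_two(original_mb: float, compressed: dict[str, float]) -> list[str]:
    """Return the two methods with the highest compression factor."""
    factors = {
        name: compression_factor(original_mb, size)
        for name, size in compressed.items()
    }
    return sorted(factors, key=factors.get, reverse=True)[:2]


if __name__ == "__main__":
    for name, size in COMPRESSED_MB.items():
        print(f"{name}: x{compression_factor(ORIGINAL_MB, size):.1f}")
    print("Top two:", top_two(ORIGINAL_MB, COMPRESSED_MB))
```

The same two functions can be applied to the other columns by swapping in the corresponding original and compressed sizes.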