Table 2.
Compression benchmark for subsets of SNPs of human Chromosome 22
| Data | 10,000 SNPs | 50,000 SNPs | 80,000 SNPs | 317,400 SNPs |
|---|---|---|---|---|
| Original size | 112.27 | 561.33 | 898.14 | 3536.40 |
| Gzip (clevel 9) | 6.48 (×17.3) [1 m 0.691 s] | 40.68 (×13.8) [6 m 9.370 s] | 65.20 (×13.8) [10 m 45.854 s] | 263.30 (×13.4) [44 m 5.341 s] |
| ZPAQ (clevel 3) (Mahoney 2005) | 5.92 (×18.9) [1 m 59.611 s] | 28.83 (×19.5) [9 m 52.687 s] | 45.18 (×19.9) [24 m 39.143 s] | 183.38 (×19.3) [98 m 46.042 s] |
| Zstandard (Collet and Kucherawy 2018) | 11.29 (×9.9) [0 m 0.209 s] | 57.08 (×9.8) [0 m 1.017 s] | 92.75 (×9.7) [0 m 2.143 s] | 372.74 (×9.5) [0 m 6.535 s] |
| Genozip (Lan et al. 2021) | 0.94 (×119.4) [0 m 12.899 s] | 29.89 (×18.8) [0 m 2.681 s] | 48.67 (×18.5) [0 m 3.249 s] | 200.13 (×17.7) [0 m 11.741 s] |
| bref3 (Browning et al. 2018) | 4.35 (×25.8) [0 m 1.383 s] | 19.91 (×28.2) [0 m 4.322 s] | 27.31 (×32.9) [0 m 10.709 s] | 115.52 (×30.6) [0 m 22.916 s] |
| VQ-VAE + Zstandard (ours) | 3.42 (×32.83) [0 m 12.905 s] | 25.37 (×22.12) [1 m 0.564 s] | 40.17 (×22.4) [1 m 42.669 s] | 160.68 (×22.0) [6 m 37.681 s] |
| VQ-VAE + Genozip (ours) | 3.59 (×31.3) [0 m 6.984 s] | 19.44 (×28.9) [0 m 14.447 s] | 27.77 (×32.3) [0 m 26.471 s] | 115.24 (×30.7) [1 m 23.828 s] |
File sizes are in MB. Each cell gives the compressed size, the compression factor relative to the original size (in parentheses), and the running time (in brackets). We mark in bold the top two choices based on compression factors.
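As a reading aid, the compression factors can be recomputed from the sizes reported in the table. The sketch below assumes the factor is simply the original size divided by the compressed size, which matches the reported values. It uses the 317,400-SNP column, and the function and variable names are illustrative rather than taken from the authors' code.

```python
def compression_factor(original_mb: float, compressed_mb: float) -> float:
    """Ratio of original to compressed size (assumed definition; larger is better)."""
    if compressed_mb <= 0:
        raise ValueError("compressed size must be positive")
    return original_mb / compressed_mb


# Sizes in MB, copied from the 317,400 SNPs column of Table 2.
original_mb = 3536.40
compressed_mb = {
    "Gzip": 263.30,
    "ZPAQ": 183.38,
    "Zstandard": 372.74,
    "Genozip": 200.13,
    "bref3": 115.52,
    "VQ-VAE + Zstandard": 160.68,
    "VQ-VAE + Genozip": 115.24,
}

# Recompute each factor, rounded to one decimal place.
factors = {
    name: round(compression_factor(original_mb, size), 1)
    for name, size in compressed_mb.items()
}

# Rank methods by compression factor and keep the two best.
top_two = sorted(factors, key=factors.get, reverse=True)[:2]

for name, factor in factors.items():
    print(f"{name}: x{factor}")
print("Top two:", top_two)
```

Under this assumption, the two highest factors in that column belong to VQ-VAE + Genozip and bref3.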











