
Evaluating the ex-zd bit-reduction strategy for lossy compression of ONT PromethION data. (A) Frequency distributions of raw signal values in a typical ONT PromethION data set (HG002-Prom5K Chr 22 subset; see Supplemental Table S1), represented in the native 11-bit encoding (red) or encoded with fewer bits (10–5 bits). (B) Bar chart shows relative file sizes for the same data set in BLOW5 format with current lossless compression methods (gray bars) compared to lossy ex-zd compression with decreasing numbers of bits (11-bit down to 5-bit). Sizes are shown as percentages relative to zlib-svb-zd, which is currently the default compression method in slow5tools/slow5lib. Native POD5 format, which uses zstd-svb12-zd (VBZ) compression, is shown for comparison. (C) Bar chart shows basecalling accuracy, measured as mean read:reference identity, for the same data set and bit-reduced encodings as in B. Basecalling accuracies are shown separately for ONT's Dorado (light gray) versus Guppy (dark gray) software and for SUP (upper) versus HAC (lower) models. (D) Density scatterplots show read:reference identities for individual basecalled reads. The left plot compares native 11-bit data with bit-reduced 8-bit data, both basecalled with the Dorado SUP model. The right plot compares native 11-bit data basecalled with Guppy versus Dorado software, using the same SUP basecalling model.
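The bit-reduced encodings in panel A can be illustrated with a minimal sketch. The caption does not state how the low-order bits are discarded, so the version below assumes each raw sample is rounded to the nearest multiple of 2^k, where k is the number of bits dropped (for example, k = 3 for 11-bit to 8-bit). The function name `reduce_bits` is hypothetical and is not part of slow5lib or slow5tools.

```python
def reduce_bits(samples, drop_bits):
    """Round each integer sample to the nearest multiple of 2**drop_bits.

    Assumption: rounding (not truncation) is used; this is an illustrative
    choice, not a description of the ex-zd implementation.
    """
    if drop_bits < 0:
        raise ValueError("drop_bits must be non-negative")
    if drop_bits == 0:
        return list(samples)  # nothing to discard
    half = 1 << (drop_bits - 1)  # half of the quantization step
    # Add half a step, then clear the low bits with a shift down and back up.
    # Python's >> floors for negative integers, so ties round upward.
    return [((s + half) >> drop_bits) << drop_bits for s in samples]


# Example: 11-bit samples reduced to 8 significant bits (3 bits dropped).
raw = [0, 3, 4, 5, 1023, 2047]
reduced = reduce_bits(raw, 3)
print(reduced)
```

After reduction every value is a multiple of 8, so the signal occupies far fewer distinct levels, which is what makes it more compressible. Note that rounding can push the largest value up by one step (2047 becomes 2048 here), a case a real encoder would need to handle.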











