
Suitability of ex-zd bit-reduction for ONT direct RNA sequencing data. (A) Bar chart shows relative file sizes for a typical ONT PromethION data set generated with the RNA004 sequencing kit, comparing current lossless compression methods (gray bars) with lossy ex-zd compression in which data are encoded with a decreasing number of bits (11-bit down to 5-bit). Sizes are shown as percentages relative to svb-zd-zlib, which is currently the default compression method used in slow5tools/slow5lib. Native POD5 format, which uses svb12-zd-zstd (VBZ) compression, is shown for comparison. (B) Bar chart shows Dorado SUP basecalling accuracy, as measured by mean read:reference identity, for the same bit-reduced encodings as in A. Basecalling accuracies are shown separately for a human mRNA sample sequenced on a PromethION (UHRR-Prom) and for SIRV synthetic RNA controls sequenced on a MinION (SIRV-Min) (see Supplemental Table S1). (C) Density scatterplots show read:reference identities for individual basecalled reads from the UHRR-Prom data set above. The left plot compares native 11-bit data versus bit-reduced 8-bit data, both basecalled with the Dorado SUP model. The right plot shows native 11-bit data basecalled with the Dorado SUP model, using two different but closely matched Dorado software versions (0.4.0 vs. 0.4.3). (D) For the same data sets as in C, scatterplots show m6A methylation frequencies for individual reads, that is, the fraction of ‘A’ bases within a given read that are called as m6A by the m6Anet software.











