A new compression strategy to reduce the size of nanopore sequencing data
- Kavindu Jayasooriya¹,
- Sasha P Jenner²,
- Pasindu Marasinghe³,
- Udith Senanayake³,
- Hassaan Saadat⁴,
- David Taubman⁴,
- Roshan Ragel³,
- Hasindu Gamaarachchi⁵,⁷ and
- Ira W Deveson⁶
- 1 Garvan Institute of Medical Research, Murdoch Children's Research Institute, UNSW Sydney, University of Peradeniya;
- 2 Garvan Institute of Medical Research;
- 3 University of Peradeniya;
- 4 UNSW Sydney;
- 5 UNSW Sydney, Garvan Institute of Medical Research, Murdoch Children's Research Institute;
- 6 Garvan Institute of Medical Research, Murdoch Children's Research Institute, UNSW Sydney
Abstract
Nanopore sequencing is an increasingly central tool for genomics. Despite rapid advances in the field, large data volumes and computational bottlenecks continue to pose major challenges. Here we introduce ex-zd, a new data compression strategy that helps address the large size of raw signal data generated during nanopore experiments. Ex-zd encompasses both a lossless compression method, which modestly outperforms all current methods for nanopore signal data compression, and a 'lossy' method, which can be used to achieve dramatic additional savings. The latter component works by reducing the number of bits used to encode signal data. We show that the three least significant bits in signal data generated on instruments from Oxford Nanopore Technologies (ONT) predominantly encode noise. Their removal reduces file sizes by half without impacting downstream analyses, including basecalling and detection of modified DNA or RNA bases. Ex-zd compression saves hundreds of gigabytes on a single ONT sequencing experiment, thereby increasing the scalability, portability, and accessibility of nanopore sequencing.
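The bit-reduction idea behind the lossy component can be illustrated with a minimal sketch. It assumes raw signal samples are stored as plain integers and simply discards the three least significant bits with a right shift. The function names `drop_low_bits` and `restore_scale` and the sample values are hypothetical, chosen for illustration only; this is not the ex-zd implementation or its file format, and it performs no compression by itself.

```python
from typing import List


def drop_low_bits(samples: List[int], bits: int = 3) -> List[int]:
    """Discard the `bits` least significant bits of each integer sample.

    Hypothetical helper for illustration; not part of ex-zd.
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    # An arithmetic right shift removes the low-order bits.
    return [s >> bits for s in samples]


def restore_scale(truncated: List[int], bits: int = 3) -> List[int]:
    """Shift truncated samples back to the original scale.

    The discarded bits are not recovered; they come back as zeros.
    """
    return [t << bits for t in truncated]


# Made-up sample values, not real nanopore signal data.
raw = [503, 498, 511, 620, 617]
small = drop_low_bits(raw, bits=3)
approx = restore_scale(small, bits=3)

# Each sample can differ from the original by less than 2**3 = 8.
max_error = max(abs(a - b) for a, b in zip(raw, approx))
print(small, approx, max_error)
```

With three bits removed, each restored sample is within 7 units of its original value. The abstract's claim is that, for ONT signal data, this discarded range predominantly encodes noise; the file size reduction then depends on how the truncated values are subsequently compressed, which this sketch does not model.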
- Received October 2, 2024.
- Accepted May 2, 2025.
- Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.











