Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall

William T. Harvey; Peter Ebert; Jana Ebler; Peter A. Audano; Katherine M. Munson; Kendra Hoekzema; David Porubsky; Christine R. Beck; Tobias Marschall; Kiran Garimella; Evan E. Eichler

doi:10.1101/gr.278070.123


- ¹Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA;
- ²Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany;
- ³Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany;
- ⁴Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany;
- ⁵The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA;
- ⁶Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA;
- ⁷Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA;
- ⁸Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA

Published December 7, 2023. https://doi.org/10.1101/gr.278070.123


Abstract

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F₁ score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F₁ score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
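
For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of how precision, recall, and the F₁ score relate to counts of true-positive, false-positive, and false-negative variant calls. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the abstract does not describe the benchmarking pipeline the authors used.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-call comparison counts.

    tp: calls that match the truth set (true positives)
    fp: calls absent from the truth set (false positives)
    fn: truth-set variants that were not called (false negatives)
    """
    # Precision: fraction of reported calls that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): fraction of true variants that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=900, fp=100, fn=300)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F₁ is a harmonic mean, it is pulled toward the lower of the two values, so a call set with high precision but poor recall at low coverage still scores low.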
