Ultra-long sequencing for contiguous haplotype resolution of the human immunoglobulin heavy-chain locus
- Mari B. Gornitzka1,
- Egil Røsjø1,2,
- Uddalok Jana3,
- Easton E. Ford3,
- Alan Tourancheau4,
- William D. Lees5,
- Zachary Vanwinkle3,
- Melissa L. Smith3,
- Corey T. Watson3,6 and
- Andreas Lossius1,2,6
- 1Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway;
- 2Department of Neurology, Akershus University Hospital, 1478 Lørenskog, Norway;
- 3Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, Kentucky 40292, USA;
- 4IBENS, Département de biologie, École normale supérieure, Université PSL, CNRS, INSERM, 75005 Paris, France;
- 5Clareo Biosciences, Louisville, Kentucky 40222, USA
- 6 These authors contributed equally to this work.
Abstract
Genetic diversity within the human immunoglobulin heavy-chain (IGH) locus influences the expressed antibody repertoire and susceptibility to infectious and autoimmune diseases. However, repetitive sequences and complex structural variation pose significant challenges for large-scale characterization. Here, we introduce a method that combines Oxford Nanopore Technologies ultra-long sequencing and adaptive sampling with a bioinformatic pipeline to produce haplotype-resolved, annotated IGH assemblies. Notably, our strategy overcomes prior limitations in phasing resolution, enabling single-contig haplotype assemblies that span the entire IGH locus. We apply this method to four individuals and validate the accuracy of the IGH assemblies using Pacific Biosciences HiFi reads, demonstrating near-complete sequence congruence, with only some residual indel errors. Moreover, when applied to the reference material HG002, our pipeline reveals no base differences and a limited number of indels compared with the telomere-to-telomere genome benchmark across the IGH region. Importantly, in the four individuals, our approach uncovers 28 novel alleles and previously uncharacterized large structural variants, including a 120 kb duplication spanning IGHE to IGHA1 within the IGH constant region (IGHC) and, within the IGHV region, an expanded seven-copy IGHV3-23 gene haplotype. These findings underscore the power of our method to resolve the full complexity of the IGH locus and uncover previously unrecognized variants that may affect immune function and disease susceptibility. Thus, our method provides a strong basis for future immunological research and translational applications.
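The benchmark comparison described above distinguishes base differences from indels. As a minimal sketch of how such a tally could be made (this is not the authors' pipeline), assume the assembly has already been aligned to the benchmark and that the aligner reported an extended CIGAR string using the `=` and `X` operators. The helper name `tally_differences` and the example CIGAR string are hypothetical and purely illustrative.

```python
import re

# One CIGAR operation: a run length followed by a SAM operator code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_differences(cigar: str) -> dict:
    """Count matches, mismatches, and indels in an extended CIGAR string.

    Assumes '=' marks matching bases and 'X' marks mismatching bases.
    A plain 'M' operator is rejected because it cannot separate the two.
    """
    counts = {
        "match_bases": 0,
        "mismatch_bases": 0,
        "insertion_events": 0,
        "deletion_events": 0,
        "indel_bases": 0,
    }
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "=":
            counts["match_bases"] += n
        elif op == "X":
            counts["mismatch_bases"] += n
        elif op == "I":
            counts["insertion_events"] += 1
            counts["indel_bases"] += n
        elif op == "D":
            counts["deletion_events"] += 1
            counts["indel_bases"] += n
        elif op == "M":
            raise ValueError("ambiguous 'M' operator; extended CIGAR required")
        # Clipping, skipping, and padding (S, H, N, P) are ignored in this sketch.
    return counts


# Illustrative input only, not data from the study.
example = tally_differences("1000=1I500=2D300=")
```

For the illustrative string above, the tally contains zero mismatched bases, one insertion event, one deletion event, and three indel bases in total, which is the shape of result the abstract summarizes as "no base differences and a limited number of indels".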
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280400.125.
- Received January 7, 2025.
- Accepted August 15, 2025.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











