Ultrasensitive allele inference from immune repertoire sequencing data with MiXCR
- Artem Mikelov1,6,
- George Nefediev2,6,
- Alexander Tashkeev3,
- Oscar L. Rodriguez4,
- Diego Aguilar Ortmans3,
- Valeriia Skatova2,
- Mark Izraelson2,
- Alexey N. Davydov2,5,
- Stanislav Poslavsky2,
- Souad Rahmouni3,
- Corey T. Watson4,
- Dmitriy Chudakov2,5,
- Scott D. Boyd1 and
- Dmitry Bolotin2
- 1Department of Pathology, Stanford University, Stanford, California 94305, USA;
- 2MiLaboratories Incorporated, San Francisco, California 94114, USA;
- 3Unit of Animal Genomics, WELBIO, GIGA-R and Faculty of Veterinary Medicine, University of Liège (B34), 4000 Liège, Belgium;
- 4Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, Kentucky 40202, USA;
- 5Central European Institute of Technology, Masaryk University, 601 77 Brno, Czech Republic
- 6These authors contributed equally to this work.
Abstract
Allelic variability in the adaptive immune receptor loci, which harbor the gene segments that encode B cell and T cell receptors (BCR/TCR), is of critical importance for immune responses to pathogens and vaccines. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become widespread in immunology research, making it the most readily available source of information about allelic diversity in immunoglobulin (IG) and T cell receptor (TR) loci. Here, we present a novel algorithm for ultrasensitive and specific inference of variable (V) and joining (J) gene alleles, allowing the reconstruction of high-quality individual gene segment libraries. The approach can be applied to infer allelic variants from peripheral blood lymphocyte BCR and TCR repertoire sequencing data, including hypermutated isotype-switched BCR sequences, thus enabling high-throughput novel allele discovery from a wide variety of existing data sets. The algorithm is part of the MiXCR software. We demonstrate the accuracy of this approach using AIRR-seq data paired with long-read genomic sequencing data and compare it with a widely used algorithm, TIgGER. We applied the algorithm to a large set of IG heavy chain (IGH) AIRR-seq data from 450 donors of ancestrally diverse population groups and to the largest reported full-length TCR alpha and beta chain (TRA and TRB) AIRR-seq data set, representing 134 individuals. This allowed us to assess the genetic diversity within the IGH, TRA, and TRB loci in different populations and to establish a database of V and J gene alleles inferred from AIRR-seq data, together with their population frequencies, freely accessible through the VDJ.online database.
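As a hedged illustration of the general principle behind inferring germline alleles from repertoire data (this is not the MiXCR algorithm, whose details are not given in this section), the following Python sketch assumes that reads assigned to a single V gene have already been aligned to the reference without gaps and have the same length. A germline polymorphism appears as the same non-reference base in a large fraction of reads, whereas somatic hypermutations are scattered across positions. The function name `infer_candidate_allele` and the thresholds `min_fraction` and `min_reads` are hypothetical choices made for this example.

```python
from collections import Counter


def infer_candidate_allele(reference, sequences, min_fraction=0.3, min_reads=10):
    """Hypothetical helper: propose a novel allele for one V gene.

    Assumes every sequence is aligned to `reference` without gaps and has
    the same length. Returns (candidate_sequence, variant_positions), or
    None if coverage is too low or no shared variant is found.
    """
    if len(sequences) < min_reads:
        return None  # too few reads to distinguish polymorphisms from noise
    candidate = list(reference)
    variants = []
    for pos, ref_base in enumerate(reference):
        counts = Counter(seq[pos] for seq in sequences)
        # Most frequent non-reference base at this position, if any.
        alt_base, alt_count = max(
            ((base, n) for base, n in counts.items() if base != ref_base),
            key=lambda item: item[1],
            default=(None, 0),
        )
        # A variant shared by many reads suggests a germline difference,
        # not a scattered somatic hypermutation.
        if alt_base is not None and alt_count / len(sequences) >= min_fraction:
            candidate[pos] = alt_base
            variants.append(pos)
    if not variants:
        return None
    return "".join(candidate), variants


# Toy data: 8 reads carry a shared T->G at position 3 (a putative allele),
# 3 reads match the reference, and 1 read has a private mutation at position 7.
reference = "ACGTACGTAC"
reads = ["ACGGACGTAC"] * 8 + ["ACGTACGTAC"] * 3 + ["ACGTACGAAC"]
print(infer_candidate_allele(reference, reads))
```

A simple frequency threshold like this one would be confounded by heavily hypermutated BCR sequences, indels, uneven coverage, and heterozygous donors, all of which a practical allele inference tool has to handle.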
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278775.123.
- Freely available online through the Genome Research Open Access option.
- Received November 26, 2023.
- Accepted October 3, 2024.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.