Ultrasensitive allele inference from immune repertoire sequencing data with MiXCR

  1. Dmitry Bolotin2
  1. 1Department of Pathology, Stanford University, Stanford, California 94305, USA;
  2. 2MiLaboratories Incorporated, San Francisco, California 94114, USA;
  3. 3Unit of Animal Genomics, WELBIO, GIGA-R and Faculty of Veterinary Medicine, University of Liège (B34), 4000 Liège, Belgium;
  4. 4Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, Kentucky 40202, USA;
  5. 5Central European Institute of Technology, Masaryk University, 601 77 Brno, Czech Republic
  1. 6 These authors contributed equally to this work.

  • Corresponding authors: amikelov{at}stanford.edu, bolotin{at}milaboratories.com
  • Abstract

    Allelic variability in the adaptive immune receptor loci, which harbor the gene segments that encode B cell and T cell receptors (BCR/TCR), is of critical importance for immune responses to pathogens and vaccines. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become widespread in immunology research making it the most readily available source of information about allelic diversity in immunoglobulin (IG) and T cell receptor (TR) loci. Here, we present a novel algorithm for extrasensitive and specific variable (V) and joining (J) gene allele inference, allowing the reconstruction of individual high-quality gene segment libraries. The approach can be applied for inferring allelic variants from peripheral blood lymphocyte BCR and TCR repertoire sequencing data, including hypermutated isotype-switched BCR sequences, thus allowing high-throughput novel allele discovery from a wide variety of existing data sets. The developed algorithm is a part of the MiXCR software. We demonstrate the accuracy of this approach using AIRR-seq paired with long-read genomic sequencing data, comparing it to a widely used algorithm, TIgGER. We applied the algorithm to a large set of IG heavy chain (IGH) AIRR-seq data from 450 donors of ancestrally diverse population groups, and to the largest reported full-length TCR alpha and beta chain (TRA and TRB) AIRR-seq data set, representing 134 individuals. This allowed us to assess the genetic diversity within the IGH, TRA, and TRB loci in different populations and to establish a database of alleles of V and J genes inferred from AIRR-seq data and their population frequencies with free public access through VDJ.online database.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278775.123.

    • Freely available online through the Genome Research Open Access option.

    • Received November 26, 2023.
    • Accepted October 3, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    | Table of Contents
    OPEN ACCESS ARTICLE

    This Article

    1. Genome Res. 34: 2293-2303 © 2024 Mikelov et al.; Published by Cold Spring Harbor Laboratory Press

    Article Category

    ORCID

    Share

    Preprint Server