Genotyping of selected germline adaptive immune system loci using short-read sequencing data

  1. S. Cenk Sahinalp¹
  1. ¹National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  2. ²Department of Electrical Engineering, University of Maryland, College Park, Maryland 20742, USA;
  3. ³Department of Computer Science, University of Victoria, British Columbia V8P 5C2, Canada
  • Corresponding authors: mike.ford{at}nih.gov, cenk.sahinalp{at}nih.gov
  • Abstract

    As we enter the age of personalized medicine, healthcare is increasingly focused on tailoring diagnoses and treatments to patients’ genetic and environmental circumstances. A critical component of a person's physiological makeup is their immune system, but individual genetic variation in many immune system genes has remained resistant to analysis by classical whole-genome or targeted sequencing approaches. Germline adaptive immune system genes, such as the immunoglobulin (IG) and T cell receptor (TR) genes, are particularly hard to genotype with classical reference-based methods owing to their highly repetitive and homologous nature. In this paper, we present ImmunoTyper2, a new computational toolkit that uses an integer linear programming formulation to genotype the variable genes of the IG lambda and kappa loci and the TR loci from short-read whole-genome sequencing (WGS) data. It updates the ImmunoTyper-SR suite, which focused on the IGHV region only. We evaluate its genotyping performance by Mendelian concordance analysis in 590 trios from the 1000 Genomes Project, by benchmarking 40 samples against HPRC assembly-derived genotypes, and by assessing robustness through sequencing depth analysis and parameter sensitivity tests. We introduce allele call confidence metrics to help quantify reliability. We also perform a prospective disease association study, applying ImmunoTyper2 to a WGS data set from a cohort of 461 COVID-19 patients from the COVNET Consortium, to demonstrate how it can be used to investigate genetic associations with disease.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280314.124.

    • Freely available online through the Genome Research Open Access option.

    • Received December 12, 2024.
    • Accepted June 23, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
