Read-level genotyping of short tandem repeats using long reads and single-nucleotide variation with STRkit

  1. Guillaume Bourque1,2,4
  1. Canadian Centre for Computational Genomics, Montréal, Québec H3A 0G1, Canada
  2. Department of Human Genetics, McGill University, Montréal, Québec H3A 0G1, Canada
  3. Genomic Medicine Center, Children's Mercy Hospital and Research Institute, Kansas City, Missouri 64108, USA
  4. Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec H3A 0G1, Canada
  • Corresponding authors: david.lougheed{at}mail.mcgill.ca, guil.bourque{at}mcgill.ca
  • Abstract

    Variation in short tandem repeats (STRs) is implicated in Mendelian disease and complex traits but can be difficult to resolve with short-read genome sequencing. We present STRkit, a software package for genotyping STRs using long-read sequencing (LRS) that uses proximate single-nucleotide variants to improve genotyping accuracy without a priori haplotype information. We show that STRkit has unique strengths versus other methods: It can use data from both major LRS technologies (Pacific Biosciences HiFi [PacBio] and Oxford Nanopore Technologies [ONT]) to output both allele- and read-level copy number and sequence; it performs best in benchmarking with F1 scores of 0.9631 and 0.9544 with PacBio and ONT data, respectively; it achieves higher rates of Mendelian consistency than other genotyping tools; and it is open source software. STRkit's features open up new possibilities for association testing, assessing patterns of STR inheritance and better understanding the functional effects of these notable repeat elements.
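    The abstract cites Mendelian consistency as one of its evaluation criteria. As a rough illustration of what such a check can involve, the sketch below tests whether a child's two STR copy-number alleles can be explained by one allele from each parent. It is not STRkit's implementation or API. The function names, the `(allele_1, allele_2)` tuple representation, and the `tolerance` parameter are all assumptions introduced here for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a diploid STR genotype as two copy numbers.
Genotype = Tuple[int, int]


def is_mendelian_consistent(
    child: Genotype,
    mother: Genotype,
    father: Genotype,
    tolerance: int = 0,
) -> bool:
    """Return True if one child allele matches a maternal allele and the
    other matches a paternal allele, within `tolerance` repeat copies.

    `tolerance` is an assumed knob for allowing small copy-number
    differences; 0 requires exact matches.
    """

    def matches(allele: int, parent: Genotype) -> bool:
        return any(abs(allele - p) <= tolerance for p in parent)

    a, b = child
    # Try both assignments of child alleles to parents.
    return (matches(a, mother) and matches(b, father)) or (
        matches(b, mother) and matches(a, father)
    )


def mendelian_consistency_rate(
    trios: Iterable[Tuple[Genotype, Genotype, Genotype]],
) -> float:
    """Fraction of (child, mother, father) genotype triples that are consistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("at least one trio genotype is required")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)
```

    For example, a child genotype of `(10, 12)` with parents `(10, 11)` and `(12, 13)` passes the check, while `(10, 10)` with the same parents does not, because neither paternal allele has 10 copies.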

    Footnotes

    • Received April 9, 2025.
    • Accepted January 8, 2026.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

