Accurate short-read alignment through r-index-based pangenome indexing

  1. Christina Boucher1
  1. Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida 32611, USA;
  2. Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA;
  3. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
  4. These authors contributed equally to this work.

  • Corresponding author: christinaboucher{at}ufl.edu
  • Abstract

    Aligning to a linear reference genome can result in a higher percentage of reads going unmapped or being incorrectly mapped owing to variations not captured by the reference, otherwise known as reference bias. Recently, in efforts to mitigate reference bias, there has been a movement to switch to using pangenomes, a collection of genomes, as the reference. In this paper, we introduce Moni-align, the first short-read pangenome aligner built on the r-index, a variation of the classical FM-index that can index collections of genomes in O(r)-space, where r is the number of runs in the Burrows–Wheeler transform. Moni-align uses a seed-and-extend strategy for aligning reads, utilizing maximal exact matches as seeds, which can be efficiently obtained with the r-index. Using both simulated and real short-read data sets, we demonstrate that Moni-align achieves alignment accuracy comparable to vg map and vg giraffe, the leading pangenome aligners. Although currently best suited for aligning to localized pangenomes owing to computational constraints, Moni-align offers a robust foundation for future optimizations that could further broaden its applicability.
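To make the quantity r in the abstract concrete, the following is a minimal sketch that builds the Burrows–Wheeler transform of a short string and counts its runs of equal characters. It is an illustration only, not Moni-align's implementation: the function names `bwt` and `count_runs`, the `$` sentinel, and the toy sequences are assumptions introduced here, and the sorted-rotations construction is far too slow for real genomes.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations (illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort every rotation of s and take the last character of each one.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Number of maximal runs of identical characters: the r in O(r) space."""
    return sum(1 for _ in groupby(s))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))  # annb$aa 5

    # Hypothetical toy "pangenome": near-identical copies of one sequence.
    # Repetition tends to lengthen runs, so r typically grows far more
    # slowly than the total length n.
    genome = "GATTACAGATTACCAGGATTACA"
    variant = genome.replace("CCAG", "CTAG")
    collection = genome * 4 + variant * 4
    t = bwt(collection)
    print(len(t), count_runs(t))
```

An index whose size is proportional to the run count rather than to the text length is what makes indexing many similar genomes feasible; the sketch only measures r and does not build the r-index itself.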

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279858.124.

    • Freely available online through the Genome Research Open Access option.

    • Received July 29, 2024.
    • Accepted April 24, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
