Strain-level metagenomic profiling using pangenome graphs with PanTax

  Xiao Luo (1,2)

  1. Hunan Research Center of the Basic Discipline for Cell Signaling, Hunan University, Changsha, Hunan 410082, China
  2. College of Biology, Hunan University, Changsha, Hunan 410082, China
  3. College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  4. Faculty of Technology, Bielefeld University, Bielefeld 33615, Germany
  5. These authors contributed equally to this work.

  • Corresponding authors: aschoen{at}cebitec.uni-bielefeld.de, xluo{at}hnu.edu.cn
  Abstract

    Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding strain-level diversity in microbial communities is crucial, as variations between strains can lead to different phenotypes or distinct biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations: they are restricted to species-level resolution, support only short reads or only long reads, or are confined to a single given species. Notably, most existing strain-level taxonomic classifiers represent their references as multiple linear genomes, which fails to capture the sequence correlations among these genomes and can introduce ambiguity and bias into metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs can represent the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification: it offers strain-level resolution, is compatible with both short and long reads, and handles single or multiple species. Extensive benchmarking demonstrates that PanTax substantially outperforms state-of-the-art approaches, as evidenced primarily by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other respects across various data sets.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280858.125.

    • Freely available online through the Genome Research Open Access option.

    • Received April 29, 2025.
    • Accepted November 5, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
