RT Journal
A1 Zhang, Wenhai
A1 Liu, Yuansheng
A1 Li, Guangyi
A1 Xu, Jialu
A1 Chen, Enlian
A1 Schönhuth, Alexander
A1 Luo, Xiao
T1 Strain-level metagenomic profiling using pangenome graphs with PanTax
JF Genome Research 
JO Genome Research 
YR 2026 
FD February 01 
VO 36 
IS 2 
SP 405 
OP 420 
DO 10.1101/gr.280858.125 
UL http://genome.cshlp.org/content/36/2/405.abstract 
AB Microbes are omnipresent, thriving in habitats ranging from oceans to soils and even our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Understanding the strain-level diversity of microbial communities is therefore crucial, because variations between strains can lead to different phenotypes or biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations: they are restricted to species-level resolution, support either short or long reads but not both, or are confined to a single species. Most notably, the majority of existing strain-level taxonomic classifiers represent references as multiple linear genomes, which fails to capture the sequence correlations among these genomes and can introduce ambiguity and bias into metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs can represent the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification: it offers strain-level resolution, works with both short and long reads, and handles single or multiple species. Extensive benchmarking demonstrates that PanTax substantially outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.