Haplotype-aware sequence alignment to pangenome graphs
Abstract
Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing
alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations
for colinear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to
spurious read alignments to those paths that are unlikely recombinations of the known haplotypes. In this paper, we develop
novel formulations and algorithms for sequence-to-graph alignment and chaining problems. Inspired by the genotype imputation
models, we assume that a query sequence is an imperfect mosaic of reference haplotypes. Accordingly, we introduce a recombination
penalty in the scoring functions for each haplotype switch. First, we solve haplotype-aware sequence-to-graph alignment in
O(|Q||E||H|) time, where Q is the query sequence, E is the set of edges, and H is the set of haplotypes represented in the graph.
To complement our solution, we prove that an algorithm significantly faster than O(|Q||E||H|) is impossible under the strong
exponential time hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in O(|H|N log |H|N) time
after graph preprocessing, where N is the count of input anchors. We then establish that a chaining algorithm significantly
faster than O(|H|N) is impossible under SETH. As a proof-of-concept, we implemented our chaining algorithm in the Minichain aligner. By aligning
sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes, we demonstrate
that our algorithm achieves better consistency with ground-truth recombinations compared with a haplotype-agnostic algorithm.
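
To make the recombination penalty concrete, the following minimal Python sketch scores a walk through a graph by the cheapest way to explain it as a mosaic of known haplotype paths. It is an illustration of the idea only, not the formulation or code used in the paper or in Minichain: the function name `min_recombination_cost`, the input layout (haplotypes as lists of vertex identifiers), and the rule that a discontinuity within the same haplotype is also charged one penalty are all assumptions made for this example.

```python
import math


def min_recombination_cost(walk, haplotypes, penalty=1):
    """Cheapest total recombination penalty needed to explain `walk`
    as a mosaic of the given haplotype paths.

    walk       : list of vertex ids (assumed to be a valid walk in the graph)
    haplotypes : dict mapping a haplotype name to its list of vertex ids
    penalty    : cost charged for each haplotype switch (hypothetical default)
    """
    # Edges and vertices supported by each haplotype path.
    hap_edges = {h: set(zip(p, p[1:])) for h, p in haplotypes.items()}
    hap_vertices = {h: set(p) for h, p in haplotypes.items()}

    # cost[h] = cheapest explanation of the walk so far, ending on haplotype h.
    cost = {h: (0 if walk[0] in hap_vertices[h] else math.inf)
            for h in haplotypes}

    for u, v in zip(walk, walk[1:]):
        best_prev = min(cost.values())  # cheapest state to switch away from
        new_cost = {}
        for h in haplotypes:
            if v not in hap_vertices[h]:
                new_cost[h] = math.inf  # haplotype h does not contain v
                continue
            # Stay on h only if h itself traverses the edge (u, v).
            stay = cost[h] if (u, v) in hap_edges[h] else math.inf
            # Otherwise pay the penalty to arrive at v on haplotype h.
            switch = best_prev + penalty
            new_cost[h] = min(stay, switch)
        cost = new_cost

    return min(cost.values())


# Toy example with two hypothetical haplotypes sharing vertices 1 and 3.
haps = {"h1": [1, 2, 3, 4], "h2": [1, 5, 3, 6]}
print(min_recombination_cost([1, 2, 3, 4], haps))  # follows h1 throughout
print(min_recombination_cost([1, 2, 3, 6], haps))  # h1, then a switch to h2
```

Because the cheapest previous state is computed once per step, the loop does work proportional to the walk length times the number of haplotypes. A haplotype-agnostic scorer would treat both toy walks as equally good, whereas this sketch charges the second one for its switch.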
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279143.124.
- Received February 15, 2024.
- Accepted June 24, 2024.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











