Minimizing reference bias with an imputed personalized reference

* Corresponding author; email: langmea{at}cs.jhu.edu

Abstract

    Pangenome indexes reduce reference bias in sequencing data analysis. However, bias can be reduced further by using a personalized reference, e.g., a diploid human reference constructed to match a donor individual's alleles. We present a new impute-first alignment framework that combines elements of genotype imputation and read alignment. We first genotype the individual using a subsample of the input reads. Using a reference panel and an efficient imputation algorithm, we then impute a personalized diploid reference. Finally, we index the personalized reference and apply a read aligner (either linear or graph) to align the full read set to it. On the HG002 sample, this framework achieves a higher variant-calling F1 score (99.77%) than a traditional linear aligner (99.62%), a graph pangenome aligner (99.72%), and a graph personalized-pangenome aligner (99.75%), with substantial reductions in the number of errors (38.73% compared to the linear aligner, 14.97% compared to the graph aligner, and 6.05% compared to the personalized graph aligner). An imputed reference can be comparable in efficiency to a pangenome reference, making it an overall advantageous choice for whole-genome DNA sequencing experiments. Advantages of our impute-first approach include that it (a) fully considers linkage disequilibrium and produces a phased diploid reference as output, (b) produces accurate personalized references even from low-coverage data, and (c) is compatible with both graph and linear reference representations, achieving its highest variant-calling F1 score with a standard linear aligner (BWA-MEM).
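    The subsample, genotype, impute, and build-reference steps described above can be illustrated with a minimal Python sketch under strong simplifying assumptions: a single contig, biallelic SNVs at distinct positions only, reads represented as lists of (site index, observed base) pairs, and a simple allele-fraction genotyper. The imputation step here is a deliberately crude stand-in, not the authors' algorithm: it picks the one pair of reference-panel haplotypes whose summed alleles best match the called genotypes. That keeps alleles linked along each haplotype and fills in uncovered sites, but unlike HMM-based (e.g., Li and Stephens) imputation it cannot switch between panel haplotypes. All function names and thresholds (`subsample`, `call_genotypes`, `impute_best_pair`, `apply_haplotype`, `personalized_diploid_reference`, the 0.2/0.8 cutoffs) are hypothetical.

```python
import random
from itertools import combinations_with_replacement

# Toy model (hypothetical): one contig, biallelic SNV sites given as
# (position, ref_base, alt_base); a panel haplotype is a tuple of 0/1 alleles,
# one per site; a read is a list of (site_index, observed_base) pairs.


def subsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


def call_genotypes(reads, sites, lo=0.2, hi=0.8):
    """Call alt-allele dosage (0, 1, 2) per site from allele fractions; None if uncovered."""
    alt = [0] * len(sites)
    depth = [0] * len(sites)
    for read in reads:
        for idx, base in read:
            _, ref_base, alt_base = sites[idx]
            if base == alt_base:
                alt[idx] += 1
                depth[idx] += 1
            elif base == ref_base:
                depth[idx] += 1
            # Any other base is treated as a sequencing error and ignored.
    calls = []
    for a, d in zip(alt, depth):
        if d == 0:
            calls.append(None)  # no coverage: left for imputation to fill
        else:
            frac = a / d
            calls.append(0 if frac < lo else 2 if frac > hi else 1)
    return calls


def impute_best_pair(calls, panel):
    """Crude stand-in for imputation: return the pair of panel haplotypes whose
    summed alleles best match the called genotypes. Uncalled sites are ignored
    when scoring but are filled in from the chosen haplotypes."""
    best = None
    for i, j in combinations_with_replacement(range(len(panel)), 2):
        h1, h2 = panel[i], panel[j]
        cost = sum(abs(h1[k] + h2[k] - g) for k, g in enumerate(calls) if g is not None)
        if best is None or cost < best[0]:
            best = (cost, h1, h2)
    return best[1], best[2]


def apply_haplotype(reference, sites, haplotype):
    """Substitute the alt base wherever the haplotype carries allele 1."""
    seq = list(reference)
    for (pos, ref_base, alt_base), allele in zip(sites, haplotype):
        if seq[pos] != ref_base:
            raise ValueError(f"reference base mismatch at position {pos}")
        if allele == 1:
            seq[pos] = alt_base
    return "".join(seq)


def personalized_diploid_reference(reference, sites, panel, reads, fraction, seed=0):
    """Subsample reads, genotype, impute, and return two phased haplotype sequences."""
    sample = subsample(reads, fraction, seed)
    calls = call_genotypes(sample, sites)
    hap1, hap2 = impute_best_pair(calls, panel)
    return apply_haplotype(reference, sites, hap1), apply_haplotype(reference, sites, hap2)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    sites = [(1, "C", "T"), (4, "A", "G"), (7, "T", "C")]
    panel = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
    reads = [[(0, "T"), (1, "G")], [(0, "C"), (1, "G")], [(1, "G"), (2, "C")], [(2, "T")]]
    print(personalized_diploid_reference(ref, sites, panel, reads, fraction=1.0))
    # prints ('ATGTGCGTAC', 'ACGTGCGCAC')
```

    In this toy run, `impute_best_pair` still recovers the same haplotype pair when the third site has no coverage (`calls = [1, 2, None]`), which is the sense in which imputation fills gaps in low-coverage data. Indexing the two returned sequences and aligning the full read set with a linear or graph aligner is outside the scope of this sketch.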

    • Received May 29, 2025.
    • Accepted March 2, 2026.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.

    ACCEPTED MANUSCRIPT


    Genome Res. gr.280989.125. Published by Cold Spring Harbor Laboratory Press.
