Minimizing reference bias with an imputed personalized reference

* Corresponding author; email: langmea{at}cs.jhu.edu

Abstract

    Pangenome indexes reduce reference bias in sequencing data analysis. However, bias can be reduced further by using a personalized reference, e.g. a diploid human reference constructed to match a donor individual's alleles. We present a new impute-first alignment framework that combines elements of genotype imputation and read alignment. We first genotype the individual using a subsample of the input reads. Using a reference panel and an efficient imputation algorithm, we then impute a personalized diploid reference. Finally, we index the personalized reference and apply a read aligner (either linear or graph) to align the full read set to it. On the HG002 sample, this framework achieves a higher variant-calling F1 score (99.77%) than a traditional linear aligner (99.62%), a graph pangenome aligner (99.72%), and a graph personalized-pangenome aligner (99.75%). It also substantially reduces the number of errors: by 38.73% relative to the linear aligner, 14.97% relative to the graph aligner, and 6.05% relative to the personalized graph aligner. An imputed reference can be comparably efficient to a pangenome reference, making it an overall advantageous choice for whole-genome DNA sequencing experiments. Advantages of our impute-first approach include that it (a) fully considers linkage disequilibrium and produces a phased diploid reference as an output, (b) produces accurate personalized references even from low-coverage data, and (c) is compatible with both graph and linear reference representations, achieving its highest variant-calling F1 accuracy with a standard linear aligner (BWA-MEM).
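    The last stage before re-alignment turns imputed, phased genotypes into a personalized diploid reference. The minimal Python sketch that follows illustrates only that stage. It assumes the genotyping and imputation steps have already produced sorted, non-overlapping, phased variants on a single contig. The `PhasedVariant` record, the `apply_haplotype` and `personalized_diploid_reference` functions, and the toy sequence and variants are all hypothetical and for illustration only. They are not the authors' implementation. Each haplotype copy receives only the ALT alleles that its phased genotype places on that haplotype, so the result is one sequence per parental haplotype.

```python
from typing import List, NamedTuple, Tuple


class PhasedVariant(NamedTuple):
    """Hypothetical variant record: 0-based position, REF/ALT alleles, and a
    phased genotype (allele on haplotype 1, allele on haplotype 2), where
    0 = REF and 1 = ALT, mirroring a phased VCF GT field such as 0|1."""
    pos: int
    ref: str
    alt: str
    gt: Tuple[int, int]


def apply_haplotype(reference: str, variants: List[PhasedVariant], hap: int) -> str:
    """Build one haplotype sequence by substituting the ALT alleles carried on
    haplotype `hap` (0 or 1). Assumes variants do not overlap."""
    pieces = []
    cursor = 0  # first reference base not yet copied to the output
    for v in sorted(variants, key=lambda v: v.pos):
        if v.gt[hap] == 0:
            continue  # this haplotype carries the reference allele here
        if v.pos < cursor:
            raise ValueError(f"overlapping variant at position {v.pos}")
        if reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        pieces.append(reference[cursor:v.pos])  # unchanged bases before the variant
        pieces.append(v.alt)                    # the haplotype's alternate allele
        cursor = v.pos + len(v.ref)             # skip over the replaced REF bases
    pieces.append(reference[cursor:])
    return "".join(pieces)


def personalized_diploid_reference(
    reference: str, variants: List[PhasedVariant]
) -> Tuple[str, str]:
    """Return both haplotype sequences of a (hypothetical) personalized reference."""
    return (apply_haplotype(reference, variants, 0),
            apply_haplotype(reference, variants, 1))


# Toy example: a SNV on haplotype 1, an insertion on haplotype 2,
# and a homozygous deletion present on both haplotypes.
reference = "ACGTACGTAC"
variants = [
    PhasedVariant(1, "C", "T", (1, 0)),
    PhasedVariant(4, "A", "AGG", (0, 1)),
    PhasedVariant(7, "TA", "T", (1, 1)),
]
hap1, hap2 = personalized_diploid_reference(reference, variants)
print(hap1)
print(hap2)
```

    In practice, a dedicated tool such as `bcftools consensus` applies a phased VCF to a FASTA reference. The sketch shows why phasing matters for a diploid reference: without it, the heterozygous SNV and insertion could not be assigned to the correct haplotype copy.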
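    The accuracy comparison relies on two quantities. The first is the variant-calling F1 score, F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall. The second is the relative reduction in errors between two pipelines. The Python sketch below computes both. It assumes that "errors" means false positives plus false negatives, which the abstract does not state explicitly. The counts are made up for illustration and are not results from this study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def relative_error_reduction(baseline_errors: int, new_errors: int) -> float:
    """Fraction by which new_errors is lower than baseline_errors."""
    return (baseline_errors - new_errors) / baseline_errors


# Hypothetical counts (illustration only). Both pipelines are scored against
# the same truth set, so TP + FN is identical (99,400) for each.
baseline = {"tp": 99_000, "fp": 300, "fn": 400}
impute_first = {"tp": 99_150, "fp": 200, "fn": 250}

for name, counts in [("baseline", baseline), ("impute-first", impute_first)]:
    print(f"{name}: F1 = {f1_score(**counts):.4%}")

# Assumed error definition: errors = FP + FN.
baseline_errors = baseline["fp"] + baseline["fn"]
new_errors = impute_first["fp"] + impute_first["fn"]
print(f"error reduction: {relative_error_reduction(baseline_errors, new_errors):.2%}")
```

    Because F1 values for strong pipelines all sit close to 100%, the relative error reduction is often the more readable way to express a small-looking F1 gain as a change in the number of wrong calls.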

    Received May 29, 2025.
    Accepted March 2, 2026.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.

    ACCEPTED MANUSCRIPT


    Genome Res. gr.280989.125. Published by Cold Spring Harbor Laboratory Press.
