TY  - JOUR
A1  - Vaddadi, Kavya
A1  - Lin, Mao-Jan
A1  - Majidian, Sina
A1  - Mun, Taher
A1  - Langmead, Ben
T1  - Minimizing reference bias with an imputed personalized reference
Y1  - 2026/03/05
JF  - Genome Research
JO  - Genome Research
DO  - 10.1101/gr.280989.125
UR  - http://genome.cshlp.org/content/early/2026/03/20/gr.280989.125.abstract
N2  - Pangenome indexes reduce reference bias in sequencing data analysis. However, bias can be reduced further by using a personalized reference, for example, a diploid human reference constructed to match a donor individual's alleles. Here, we present a new impute-first alignment framework that combines elements of genotype imputation and read alignment. We first genotype the individual using a subsample of the input reads. Using a reference panel and an efficient imputation algorithm, we impute a personalized diploid reference. Finally, we index the personalized reference and apply a read aligner (either linear or graph) to align the full read set to the personalized reference. On the HG002 sample, this framework achieves a higher variant-calling F1-score (99.77%) compared with the traditional linear aligner (99.62%), graph pangenome aligner (99.72%), and graph personalized-pangenome aligner (99.75%), with substantial reduction in the number of errors (38.73% vs. a linear aligner, 14.97% vs. a graph aligner, and 6.05% vs. a personalized graph). An imputed reference can have comparable efficiency to a pangenome reference, making it an overall advantageous choice for whole-genome DNA sequencing experiments. Advantages of our impute-first approach include that (1) it fully considers linkage disequilibrium and produces a phased diploid reference as an output; (2) it produces accurate personalized references even from low-coverage data; and (3) it is compatible with both graph and linear reference representations, achieving its highest variant-calling F1 accuracy using a standard linear aligner (BWA-MEM). 
ER  -