Minimizing reference bias with an imputed personalized reference

Table 1. The steps, inputs, outputs, and tools used in the tested impute-first workflows

| Step | Input | Tool | Output |
| --- | --- | --- | --- |
| A. Personalization | | | |
| Read sampling | Donor reads: whole-genome DNA-seq reads (Baid et al. 2020) from HG001/NA12878, HG002/NA24385, HG003/NA24149, HG004/NA24143, and HG005/NA24631 | seqtk (https://github.com/lh3/seqtk) | Reads sampled to 0.01×, 0.05×, 0.1×, 0.2×, 0.5×, 1×, 2×, 5×, 10×, and 20× average coverage |
| Alignment and genotyping | Genotype panels: HGSVC2 (Ebert et al. 2021), HGSVC3 (Logsdon et al. 2025), HPRC_filtered VCF (Ebler 2022), excluding respective samples and family members; reference: GRCh38 primary assembly (Church et al. 2015); reads: output from sampling step | Bowtie 2 (Langmead and Salzberg 2012) + BCFtools (Li 2011) | Rough genotype calls in VCF format |
| Imputation | Imputation panel and reference: same as previous step; genotype calls: output from genotyping step | Beagle (v5.1) (Browning et al. 2018); Glimpse (v1.0.0) (Rubinacci et al. 2021) | Personalized reference as phased VCF file |
| Personalized reference construction | Personalized reference: from imputation step; reference: GRCh38 primary assembly | BCFtools (bcftools consensus) | Personalized reference as diploid FASTA |
| B. Downstream analysis | | | |
| B.1. Variation-graph reference | | | |
| Graph construction and Indexing | Personalized reference as phased VCF file: from construction step; reference: GRCh38 primary assembly | vg (v1.55.0) autoindex (Garrison et al. 2018) | Indexed graph reference |
| Alignment and Lifting | Donor reads; graph reference: from previous step | vg (v1.55.0) surject (Sirén et al. 2021) | Aligned reads |
| Variant calling and Evaluation | Aligned reads: from previous step; true variants: HG001, HG002, HG003, HG004, and HG005 VCF from GIAB (Zook et al. 2016) high-confidence region annotations, etc. | DeepVariant v1.5.0 (Poplin et al. 2018); hap.py v0.3.15 (The Global Alliance for Genomics and Health Benchmarking Team et al. 2019) | Variant calls as VCF; benchmarking metrics |
| B.2. Multi-linear-haplotype reference | | | |
| Indexing | Personalized reference: from construction step; T2T-CHM13v1.0 genome assembly (Nurk et al. 2022) | bwa index (Li 2013) | Indexed reference |
| Alignment and Lifting | Donor reads HG001/NA12878, HG002/NA24385, HG003/NA24149, HG004/NA24143, and HG005/NA24631; indexed reference: from previous step | bwa mem (Li 2013) and levioSAM2 lift and levioSAM2 reconcile (Chen et al. 2024) | Aligned reads |
| Variant calling and Evaluation | Aligned reads: from previous step; true variants: HG001, HG002, HG003, HG004, and HG005 VCF from GIAB high-confidence region annotations, etc. | DeepVariant v1.5.0; hap.py v0.3.15 | Variant calls as VCF; benchmarking metrics |
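
The two bookend steps of the personalization stage, read sampling with seqtk and diploid FASTA construction with bcftools consensus, can be sketched as command builders. This is only an illustrative sketch, not the authors' scripts. The table does not give the full coverage of the donor read sets, so `full_cov` is an assumed input; the file names, the sample name, the seed, and the helper functions themselves are hypothetical. The sketch assumes that the subsampling fraction is the target coverage divided by the full coverage, and that one FASTA per haplotype is produced by choosing the first or second allele of the phased genotype with `-H 1` and `-H 2`.

```python
import shlex

# Target average coverages listed in the "Read sampling" row of Table 1.
TARGET_COVERAGES = [0.01, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20]


def sampling_fraction(target_cov, full_cov):
    """Fraction of reads to keep so that average coverage is about target_cov.

    full_cov is the coverage of the unsampled read set (an assumed input).
    """
    if full_cov <= 0:
        raise ValueError("full_cov must be positive")
    if target_cov > full_cov:
        raise ValueError("cannot sample to more than the full coverage")
    return target_cov / full_cov


def seqtk_sample_cmd(fastq, target_cov, full_cov, seed=11):
    """Build a `seqtk sample` argument list for one FASTQ file.

    For paired-end data, use the same seed for both mate files so that
    the pairing is preserved.
    """
    frac = sampling_fraction(target_cov, full_cov)
    return ["seqtk", "sample", f"-s{seed}", fastq, f"{frac:.6f}"]


def consensus_cmds(ref_fasta, phased_vcf, sample, out_prefix):
    """Build one `bcftools consensus` argument list per haplotype.

    -H 1 applies the first allele of each phased genotype and -H 2 the
    second, giving the two sequences of a diploid personalized reference.
    """
    return [
        ["bcftools", "consensus", "-f", ref_fasta, "-H", str(hap),
         "-s", sample, "-o", f"{out_prefix}.hap{hap}.fa", phased_vcf]
        for hap in (1, 2)
    ]


if __name__ == "__main__":
    # Hypothetical inputs: a 30x read set and placeholder file names.
    for cov in TARGET_COVERAGES:
        print(shlex.join(seqtk_sample_cmd("donor_R1.fastq.gz", cov, 30.0)))
    for cmd in consensus_cmds("GRCh38.fa", "imputed.phased.vcf.gz",
                              "DONOR", "personalized"):
        print(shlex.join(cmd))
```

The builders return argument lists rather than running anything, so the commands can be inspected first and then passed to `subprocess.run`. seqtk writes the sampled reads to standard output, so a real invocation would also need to redirect that output to a file.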

This Article

  1. Genome Res. 36: 740-753
