Minimizing reference bias with an imputed personalized reference

Table 2. Downstream workflows assessed

| Approach | Workflow | Description |
|---|---|---|
| BWA-MEM | BWA-MEM | Standard linear reference genome (GRCh38) |
| VG Giraffe | Giraffe(linear) | Standard linear reference genome (GRCh38) with no call set |
| VG Giraffe | Giraffe(HPRC_pangenome) | HPRC v1.1 frequency-filtered (GRCh38) pangenome including 44 sample assemblies + CHM13 haplotype |
| VG Giraffe | Giraffe(1 kGP_pangenome) | GRCh38 phase3 1 kGP call set pangenome |
| VG Giraffe | Giraffe(Imputefirst_c5) | Personalized diploid reference using Bowtie 2 + BCFtools + Beagle with HGSVC3 panel at 5× coverage (BBBC5) |
| VG Giraffe | Giraffe(Imputefirst_c20) | Personalized diploid reference using Bowtie 2 + BCFtools + Beagle with HGSVC3 panel at 20× coverage (BBBC20) |
| VG Giraffe | Giraffe(diploid) | Personalized pangenome using Giraffe's diploid-sampling |
| VG Giraffe | Giraffe(benchmark) | Diploid reference created using GIAB truth set |
| BWA-MEM + LevioSAM2 | Leviosam2(Imputefirst_c5) | Personalized diploid reference using Bowtie 2 + BCFtools + Beagle with HGSVC3 panel at 5× coverage |
| BWA-MEM + LevioSAM2 | Leviosam2(Imputefirst_c20) | Personalized diploid reference using Bowtie 2 + BCFtools + Beagle with HGSVC3 panel at 20× coverage |
| BWA-MEM + LevioSAM2 | Leviosam2(benchmark) | Diploid reference created using GIAB truth set |
  • For analyses using the HPRC pangenome and 1 kGP pangenome, the sample under analysis and its family members were excluded, if present. The benchmark represents results using the GIAB ground-truth call set for the respective samples, indicating the best achievable performance with each workflow. Giraffe(diploid_reported) refers to variant-calling performance statistics directly reported by Sirén et al. (2024) for the corresponding samples.
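The exclusion rule in the table note (drop the sample under analysis and its family members from a panel, if present) can be sketched as a small filter. This is a minimal illustration, not the authors' implementation: the function name `exclude_sample_and_family`, the `families` mapping, and the example panel are all hypothetical. The IDs HG002, HG003, and HG004 are used only because they form a well-known GIAB trio; the source does not state which samples were analyzed.

```python
from typing import Iterable, List, Mapping, Set


def exclude_sample_and_family(
    panel_samples: Iterable[str],
    sample: str,
    families: Mapping[str, Set[str]],
) -> List[str]:
    """Return panel samples minus `sample` and every member of its family.

    Samples that are not in the panel are simply ignored, which matches
    the "if present" wording of the table note.
    """
    excluded = {sample}
    for members in families.values():
        if sample in members:
            # The sample belongs to this family: exclude all of its relatives too.
            excluded.update(members)
    # Preserve the original panel order for the samples that remain.
    return [s for s in panel_samples if s not in excluded]


# Hypothetical inputs (assumptions for illustration only).
families = {"trio_1": {"HG002", "HG003", "HG004"}}
panel = ["HG002", "HG003", "HG00096", "NA12878"]

kept = exclude_sample_and_family(panel, "HG002", families)
print(kept)
```

The retained list would then define which haplotypes remain in the panel used to build the pangenome for that sample. How the filtering was actually carried out in the study is not described in this table.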

This Article

  1. Genome Res. 36: 740-753
