Kavya Vaddadi; Mao-Jan Lin; Sina Majidian; Taher Mun; Ben Langmead

Table 1.

The steps, inputs, outputs, and tools used in the tested impute-first workflows

Step	Input	Tool	Output
A. Personalization
Read sampling	Donor reads: whole-genome DNA-seq reads (Baid et al. 2020) from HG001/NA12878, HG002/NA24385, HG003/NA24149, HG004/NA24143, and HG005/NA24631	seqtk (https://github.com/lh3/seqtk)	Reads sampled to 0.01×, 0.05×, 0.1×, 0.2×, 0.5×, 1×, 2×, 5×, 10×, and 20× average coverage
Alignment and genotyping	Genotype panels: HGSVC2 (Ebert et al. 2021), HGSVC3 (Logsdon et al. 2025), HPRC_filtered VCF (Ebler 2022), excluding respective samples and family members; reference: GRCh38 primary assembly (Church et al. 2015); reads: output from sampling step	Bowtie 2 (Langmead and Salzberg 2012) + BCFtools (Li 2011)	Rough genotype calls in VCF format
Imputation	Imputation panel and reference: same as previous step; genotype calls: output from genotyping step	Beagle (v5.1) (Browning et al. 2018); Glimpse (v1.0.0) (Rubinacci et al. 2021)	Personalized reference as phased VCF file
Personalized reference construction	Personalized reference: from imputation step; reference: GRCh38 primary assembly	BCFtools (bcftools consensus)	Personalized reference as diploid FASTA
B. Downstream analysis
B.1. Variation-graph reference
Graph construction and indexing	Personalized reference as phased VCF file: from construction step; reference: GRCh38 primary assembly	vg (v1.55.0) autoindex (Garrison et al. 2018)	Indexed graph reference
Alignment and lifting	Donor reads; graph reference: from previous step	vg (v1.55.0) surject (Sirén et al. 2021)	Aligned reads
Variant calling and evaluation	Aligned reads: from previous step; true variants: HG001, HG002, HG003, HG004, and HG005 VCF from GIAB (Zook et al. 2016) high-confidence region annotations, etc.	DeepVariant v1.5.0 (Poplin et al. 2018); hap.py v0.3.15 (The Global Alliance for Genomics and Health Benchmarking Team et al. 2019)	Variant calls as VCF; benchmarking metrics
B.2. Multi-linear-haplotype reference
Indexing	Personalized reference: from construction step; T2T-CHM13v1.0 genome assembly (Nurk et al. 2022)	bwa index (Li 2013)	Indexed reference
Alignment and lifting	Donor reads: HG001/NA12878, HG002/NA24385, HG003/NA24149, HG004/NA24143, and HG005/NA24631; indexed reference: from previous step	bwa mem (Li 2013); levioSAM2 lift and levioSAM2 reconcile (Chen et al. 2024)	Aligned reads
Variant calling and evaluation	Aligned reads: from previous step; true variants: HG001, HG002, HG003, HG004, and HG005 VCF from GIAB high-confidence region annotations, etc.	DeepVariant v1.5.0; hap.py v0.3.15	Variant calls as VCF; benchmarking metrics
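
The read-sampling step in part A of the table reduces each donor read set to a fixed list of average coverages. The sketch below shows one way the corresponding sampling fractions and `seqtk sample` argument lists might be derived. It is only an illustration under stated assumptions: the table does not give the coverage of the full read set, so the 30× value, the file name `donor_R1.fastq.gz`, the seed, and the helper names `sampling_fraction` and `seqtk_command` are all hypothetical.

```python
from typing import List

# Target average coverages listed in the read-sampling row of Table 1.
TARGET_COVERAGES = [0.01, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20]


def sampling_fraction(target_cov: float, full_cov: float) -> float:
    """Fraction of reads to keep so average coverage drops to target_cov.

    The target must be strictly below the full coverage: seqtk treats a
    value of 1 or more as a read count rather than a fraction.
    """
    if target_cov <= 0 or full_cov <= 0:
        raise ValueError("coverages must be positive")
    if target_cov >= full_cov:
        raise ValueError("target coverage must be below the full coverage")
    return target_cov / full_cov


def seqtk_command(fastq: str, fraction: float, seed: int = 11) -> List[str]:
    """Build a `seqtk sample` argument list; -s sets the random seed."""
    return ["seqtk", "sample", f"-s{seed}", fastq, f"{fraction:.6f}"]


if __name__ == "__main__":
    full_cov = 30.0  # hypothetical coverage of the full donor read set
    for cov in TARGET_COVERAGES:
        frac = sampling_fraction(cov, full_cov)
        # Prints the command only; running it requires seqtk on PATH.
        print(cov, " ".join(seqtk_command("donor_R1.fastq.gz", frac)))
```

For paired-end data, passing the same seed for both mate files keeps the sampled pairs in sync, as the seqtk README describes.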

Minimizing reference bias with an imputed personalized reference
