Table 1.
The steps, inputs, outputs, and tools used in the tested impute-first workflows
| Step | Input | Tool | Output |
|---|---|---|---|
| A. Personalization | | | |
| Read sampling | Donor reads: whole-genome DNA-seq reads (Baid et al. 2020) from HG001/NA12878, HG002/NA24385, HG003/NA24149, HG004/NA24143, and HG005/NA24631 | seqtk (https://github.com/lh3/seqtk) | Reads sampled to 0.01×, 0.05×, 0.1×, 0.2×, 0.5×, 1×, 2×, 5×, 10×, and 20× average coverage |
| Alignment and genotyping | Genotype panels: HGSVC2 (Ebert et al. 2021), HGSVC3 (Logsdon et al. 2025), HPRC_filtered VCF (Ebler 2022), excluding respective samples and family members; reference: GRCh38 primary assembly (Church et al. 2015); reads: output from sampling step | Bowtie 2 (Langmead and Salzberg 2012) + BCFtools (Li 2011) | Rough genotype calls in VCF format |
| Imputation | Imputation panel and reference: same as previous step; genotype calls: output from genotyping step | Beagle (v5.1) (Browning et al. 2018); Glimpse (v1.0.0) (Rubinacci et al. 2021) | Personalized reference as phased VCF file |
| Personalized reference construction | Personalized reference: from imputation step; reference: GRCh38 primary assembly | BCFtools (bcftools consensus) | Personalized reference as diploid FASTA |
| B. Downstream analysis | | | |
| B.1. Variation-graph reference | | | |
| Graph construction and indexing | Personalized reference as phased VCF file: from construction step; reference: GRCh38 primary assembly | vg (v1.55.0) autoindex (Garrison et al. 2018) | Indexed graph reference |
| Alignment and lifting | Donor reads; graph reference: from previous step | vg (v1.55.0) surject (Sirén et al. 2021) | Aligned reads |
| Variant calling and evaluation | Aligned reads: from previous step; true variants: GIAB (Zook et al. 2016) VCFs for HG001, HG002, HG003, HG004, and HG005; high-confidence region annotations, etc. | DeepVariant v1.5.0 (Poplin et al. 2018); hap.py v0.3.15 (The Global Alliance for Genomics and Health Benchmarking Team et al. 2019) | Variant calls as VCF; benchmarking metrics |
| B.2. Multi-linear-haplotype reference | | | |
| Indexing | Personalized reference: from construction step; T2T-CHM13v1.0 genome assembly (Nurk et al. 2022) | bwa index (Li 2013) | Indexed reference |
| Alignment and lifting | Donor reads: HG001/NA12878, HG002/NA24385, HG003/NA24149, HG004/NA24143, and HG005/NA24631; indexed reference: from previous step | bwa mem (Li 2013); levioSAM2 lift and levioSAM2 reconcile (Chen et al. 2024) | Aligned reads |
| Variant calling and evaluation | Aligned reads: from previous step; true variants: GIAB VCFs for HG001, HG002, HG003, HG004, and HG005; high-confidence region annotations, etc. | DeepVariant v1.5.0; hap.py v0.3.15 | Variant calls as VCF; benchmarking metrics |
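
The read-sampling step in part A can be illustrated with a short sketch. This is not the authors' script. It assumes the full donor read set has a known average coverage and converts each target coverage from Table 1 into the fraction passed to `seqtk sample`. The `full_coverage` value of 30.0, the FASTQ file names, and the seed are hypothetical placeholders; the helper names `sampling_fraction` and `seqtk_sample_command` are introduced here for illustration only.

```python
import shlex

# Target average coverages listed in the "Read sampling" row of Table 1.
TARGET_COVERAGES = [0.01, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20]


def sampling_fraction(target_coverage: float, full_coverage: float) -> float:
    """Fraction of reads to keep so that expected coverage equals the target."""
    if target_coverage <= 0 or full_coverage <= 0:
        raise ValueError("coverages must be positive")
    fraction = target_coverage / full_coverage
    if fraction > 1:
        raise ValueError("target coverage exceeds coverage of the full read set")
    return fraction


def seqtk_sample_command(fastq: str, fraction: float, seed: int = 100) -> str:
    """Build a `seqtk sample` command line.

    Using the same seed for both mate files keeps read pairs together.
    """
    return f"seqtk sample -s{seed} {shlex.quote(fastq)} {fraction:.6g}"


if __name__ == "__main__":
    full_coverage = 30.0  # ASSUMPTION: hypothetical coverage of the full read set
    for coverage in TARGET_COVERAGES:
        fraction = sampling_fraction(coverage, full_coverage)
        # Hypothetical paired-end file names; both mates share the default seed.
        for mate in ("reads_1.fq.gz", "reads_2.fq.gz"):
            print(seqtk_sample_command(mate, fraction))
```

The commands are only printed, not executed, so the output can be inspected or redirected to a shell script before anything is run. Because sampling is random, the realized coverage will differ slightly from the target, most noticeably at the lowest coverages.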











