Table 1.

Assemblies of the reference human genome HG002

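| Asm | Contig NG50 (Mb) | Scaffold NG50 (Mb) | Contig NGA50 (Mb) | Hamming error (%) | QV | Dup gene | Missing gene | T2T ctgs | T2T scfs |
|---|---|---|---|---|---|---|---|---|---|
| Downsampled (50× Duplex + 30× ONT UL) | | | | | | | | | |
| Verkko + Illumina trio | 103.00 | 135.21 | 57.87 | 0.75 | 55.77 | 200 | 292 | 16 | 27/46 |
| Verkko + Pore-C | 86.69 | 136.00 | 51.99 | 0.75 | 55.72 | 232 | 361 | 13 | 26/46 |
| Full-coverage (70× Duplex) | | | | | | | | | |
| Verkko + Illumina trio | 59.40 | 133.48 | 39.41 | 0.70 | 57.00 | 296 | 309 | 1 | 23/46 |
| Verkko + Pore-C | 43.16 | 113.59 | 31.06 | 0.77 | 56.49 | 290 | 310 | 4 | 17/46 |
| HiFi (43× + 30× ONT UL) (Cheng et al. 2024) | | | | | | | | | |
| Verkko + Illumina trio | 101.76 | 121.21 | 69.19 | 0.17 | 59.33 | 206 | 314 | 8 | 16/46 |
| hifiasm + Illumina trio | 101.21 | N/A | 60.49 | 0.20 | 60.37 | 182 | 287 | 7 | N/A/46 |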

Contig NG50: the length of the shortest contig such that half of the genome is in contigs of this length or greater. No gaps are allowed, and sequences are split wherever a gap of at least three Ns is present. The genome size is defined as 6.08 Gbp based on the reference HG002 assembly (https://github.com/marbl/HG002/blob/main/README.md).

Scaffold NG50: same as contig NG50, but without splitting at gaps. The hifiasm assemblies from Cheng et al. (2024) do not include scaffolds, so the scaffold NG50 column reads N/A for them.

Contig NGA50: the length of the shortest alignment such that half of the genome is in alignments of this length or greater. Calculated using Q100 (https://github.com/nhansen/q100bench) versus HG002 v1.0.1.

Hamming error: the haplotype error rate computed using yak (Liao et al. 2023) and parental short-read sequence databases, measuring the consistency of each scaffold with a single haplotype; lower is better.

QV: the Phred (Ewing and Green 1998) log-scaled quality score calculated using Merqury (Rhie et al. 2020); higher is better.

Dup/Missing gene: duplicated or missing genes computed using compleasm (Huang and Li 2023) with the OrthoDB v10 (Waterhouse et al. 2018; Zdobnov et al. 2021) primate database; lower is better. Each haplotype was measured independently, and the missing and duplicated genes reported are the sum over both haplotypes. Since single-copy genes from Chromosome X are expected to be missing on the paternal haplotype and some genes may be true duplications, we also measured gene completeness on the HG002 v1.1 assembly (https://github.com/marbl/HG002/blob/main/README.md) (Supplemental Table 2) as a baseline. This assembly has 178 duplicated and 288 missing genes and a Hamming error rate of 0.10%.

T2T ctgs: the count of telomere-to-telomere contigs for each assembly. A contig is defined as T2T if it has the canonical (TTAGGG) telomere sequence within 10 kbp of the start and end and has no gaps; higher is better.

T2T scfs: same as T2T ctgs, but gaps are allowed; higher is better.

Bold values denote the best result for each metric and sequencing combination.
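
The NG50 and T2T definitions above can be illustrated with a minimal Python sketch. It is not the code used to produce Table 1. The function names (`split_at_gaps`, `ng50`, `has_telomere`, `is_t2t`) are hypothetical, and the minimum number of telomere repeats (`min_repeats`) and the reuse of the three-N gap threshold in the T2T check are assumptions that the footnote does not specify. The reverse-complement motif CCCTAA is also accepted, on the assumption that the telomere at the start of a sequence appears in that orientation.

```python
import re

# 6.08 Gbp: the HG002 genome size stated in the footnote.
GENOME_SIZE = 6_080_000_000


def split_at_gaps(seq, min_gap=3):
    """Split a scaffold into contigs at every run of >= min_gap Ns."""
    pieces = re.split(r"[Nn]{%d,}" % min_gap, seq)
    return [p for p in pieces if p]


def ng50(lengths, genome_size=GENOME_SIZE):
    """Return the shortest length L such that sequences of length >= L
    cover at least half of genome_size, or None if they never do."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return None


def has_telomere(window, min_repeats=4):
    """Check for a run of the canonical telomere repeat in either orientation.

    min_repeats is an illustrative assumption, not a value from the paper.
    """
    window = window.upper()
    return ("TTAGGG" * min_repeats in window
            or "CCCTAA" * min_repeats in window)


def is_t2t(seq, window=10_000, allow_gaps=False):
    """T2T test: telomere within `window` bp of both ends.

    allow_gaps=False mirrors the contig rule (no gaps); allow_gaps=True
    mirrors the scaffold rule. The three-N gap threshold is assumed to
    match the one used for contig splitting.
    """
    if not allow_gaps and re.search(r"[Nn]{3,}", seq):
        return False
    return has_telomere(seq[:window]) and has_telomere(seq[-window:])
```

Under these assumptions, contig NG50 would be `ng50(len(c) for s in scaffolds for c in split_at_gaps(s))`, scaffold NG50 would be `ng50(len(s) for s in scaffolds)`, and the T2T ctgs and T2T scfs counts would differ only in the `allow_gaps` flag.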