Assemblies of the reference human genome HG002
| Asm | Contig NG50 (Mb) | Scaffold NG50 (Mb) | Contig NGA50 (Mb) | Hamming error (%) | QV | Dup gene | Missing gene | T2T ctgs | T2T scfs |
|---|---|---|---|---|---|---|---|---|---|
| Downsampled (50× Duplex + 30× ONT UL) | |||||||||
| Verkko + Illumina trio | 103.00 | 135.21 | 57.87 | 0.75 | 55.77 | 200 | 292 | 16 | 27/46 |
| Verkko + Pore-C | 86.69 | 136.00 | 51.99 | 0.75 | 55.72 | 232 | 361 | 13 | 26/46 |
| Full-coverage (70× Duplex) | |||||||||
| Verkko + Illumina trio | 59.40 | 133.48 | 39.41 | 0.70 | 57.00 | 296 | 309 | 1 | 23/46 |
| Verkko + Pore-C | 43.16 | 113.59 | 31.06 | 0.77 | 56.49 | 290 | 310 | 4 | 17/46 |
| HiFi (43× + 30× ONT UL) (Cheng et al. 2024) | |||||||||
| Verkko + Illumina trio | 101.76 | 121.21 | 69.19 | 0.17 | 59.33 | 206 | 314 | 8 | 16/46 |
| hifiasm + Illumina trio | 101.21 | N/A | 60.49 | 0.20 | 60.37 | 182 | 287 | 7 | N/A |
Contig NG50: the length of the shortest contig such that half of the genome is in contigs of this length or greater. No gaps are allowed, and sequences are split wherever a gap of at least three Ns is present. The genome size is defined as 6.08 Gbp based on the reference HG002 assembly (https://github.com/marbl/HG002/blob/main/README.md).

Scaffold NG50: same as contig NG50 but without splitting at gaps. Hifiasm assemblies from Cheng et al. (2024) do not include scaffolds, so we use N/A to denote this in the scaffold NG50 column.

Contig NGA50: the length of the shortest alignment such that half of the genome is in alignments of this length or greater. Calculated using Q100 (https://github.com/nhansen/q100bench) versus HG002 v1.0.1.

Hamming error: the haplotype error rate computed using yak (Liao et al. 2023) and parental short-read sequence databases, measuring the consistency of each scaffold with a single haplotype; lower is better.

QV: the Phred (Ewing and Green 1998) log-scaled quality score calculated using Merqury (Rhie et al. 2020); higher is better.

Dup/Missing gene: duplicated or missing genes computed using compleasm (Huang and Li 2023) with the OrthoDB v10 (Waterhouse et al. 2018; Zdobnov et al. 2021) primate database; lower is better. Each haplotype was measured independently, and the missing and duplicated gene counts reported are the sums over both haplotypes. Since single-copy genes from Chromosome X are expected to be missing on the paternal haplotype and some genes may be true duplications, we also measured gene completeness on the HG002 v1.1 assembly (https://github.com/marbl/HG002/blob/main/README.md) (Supplemental Table 2) as a baseline. This assembly has 178 duplicated and 288 missing genes and a Hamming error rate of 0.10%.

T2T ctgs: the count of telomere-to-telomere contigs for each assembly. A contig is defined as T2T if it has the canonical (TTAGGG) telomere sequence within 10 kbp of both the start and the end and has no gaps; higher is better.

T2T scfs: same as T2T ctgs but gaps are allowed; higher is better.

Bold values denote the best result for each metric and sequencing combination.
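The contig and scaffold NG50 definitions above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the table: the function names `split_at_gaps` and `ng50`, the toy sequences, and the toy genome size of 30 bp are all hypothetical, chosen only to show how splitting at runs of at least three Ns changes the result.

```python
import re

# Assumption for illustration: a gap is any run of at least three Ns.
GAP = re.compile(r"[Nn]{3,}")


def split_at_gaps(scaffold):
    """Split one scaffold sequence into contigs at runs of >= 3 Ns."""
    return [piece for piece in GAP.split(scaffold) if piece]


def ng50(lengths, genome_size):
    """Length of the shortest sequence such that sequences of this length
    or greater cover at least half of genome_size.

    Returns 0 when the sequences cover less than half of the genome.
    """
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


# Toy input (hypothetical): three scaffolds and a 30 bp "genome".
scaffolds = ["ACGTACGTAC" + "NNN" + "ACGTA", "ACGTACGT", "ACNGT"]
genome_size = 30

# Scaffold NG50 uses the sequences as they are, gaps included.
scaffold_ng50 = ng50([len(s) for s in scaffolds], genome_size)

# Contig NG50 first splits every scaffold at its gaps.
contigs = [c for s in scaffolds for c in split_at_gaps(s)]
contig_ng50 = ng50([len(c) for c in contigs], genome_size)

print(scaffold_ng50, contig_ng50)
```

In this toy case the first scaffold breaks into contigs of 10 bp and 5 bp, while the single N in the third scaffold is too short to count as a gap. For the table, the genome size would be the 6.08 Gbp stated above rather than the total assembly length, which is what distinguishes NG50 from N50.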











