Verkko2 integrates proximity-ligation data with long-read De Bruijn graphs for efficient telomere-to-telomere genome assembly, phasing, and scaffolding

Table 1. Verkko2 Hi-C versus trio benchmarking

Method    Sheep    Chicken  HG002    HG00733
T2T scaffolds
  Hi-C    31       34       40       41
  Trio    23       32       32       33
T2T contigs
  Hi-C    24       21       21       26
  Trio    20       25       22       23
Hamming error
  Hi-C    0.85%    0.58%    0.39%    0.75%
  Trio    0.88%    0.60%    0.39%    0.77%
Switch error
  Hi-C    0.95%    0.13%    0.41%    0.79%
  Trio    0.95%    0.13%    0.41%    0.79%
QV
  Hi-C    54.17    45.13    53.87    53.86
  Trio    54.17    45.17    53.89    53.82
Missing genes
  Hi-C    1.37%    3.12%    1.61%    0.09%
  Trio    1.36%    2.52%    1.60%    0.09%
Missing genes (excluding sex chromosomes)
  Hi-C    0.06%    1.06%    0.09%    0.09%
  Trio    0.06%    0.44%    0.09%    0.09%
Duplicated genes
  Hi-C    1.73%    0.12%    0.68%    0.71%
  Trio    1.72%    0.22%    0.69%    0.72%
T2T contigs and scaffolds were counted as those longer than 5 Mb with telomeres on both ends. Scaffolds were broken into contigs at any run of more than three Ns, and contigs and scaffolds shorter than 100 kb were excluded from all metrics. Verkko Hi-C shows QV, switch error, Hamming error, and missing-gene rates comparable to Verkko trio, but it consistently yields more T2T scaffolds because it can use Hi-C links to restore missing connectivity. Bold text indicates the best score for each metric and species; ties are also bolded.
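The counting rules in the note above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the evaluation pipeline used for Table 1: the 5 Mb, 100 kb, and "more than three N's" cutoffs come from the note, while the telomere test (the vertebrate repeat TTAGGG, or its reverse complement CCCTAA, covering at least half of a 1 kb window at each end) and the helper names split_at_gaps, has_telomere, is_t2t, and count_t2t are hypothetical.

```python
import re

# Cutoffs stated in the Table 1 note.
MIN_T2T_LEN = 5_000_000    # T2T contigs/scaffolds must be longer than 5 Mb
MIN_KEEP_LEN = 100_000     # sequences shorter than 100 kb are discarded
GAP_RE = re.compile(r"[Nn]{4,}")  # "more than three N's" = a run of 4+ Ns

# ASSUMPTION (not from the note): a telomere is called when TTAGGG, or its
# reverse complement CCCTAA, covers at least half of a 1 kb end window.
# Both the window size and the fraction are illustrative values.
TELO_WINDOW = 1_000
TELO_MIN_FRAC = 0.5


def split_at_gaps(scaffold: str) -> list[str]:
    """Break a scaffold into contigs at every run of more than three Ns."""
    return [piece for piece in GAP_RE.split(scaffold) if piece]


def has_telomere(window: str) -> bool:
    """Hypothetical telomere test on one end window (either strand)."""
    w = window.upper()
    if not w:
        return False
    repeats = max(w.count("TTAGGG"), w.count("CCCTAA"))
    return repeats * 6 / len(w) >= TELO_MIN_FRAC


def is_t2t(seq: str) -> bool:
    """Longer than 5 Mb with a telomere call at both ends."""
    return (len(seq) > MIN_T2T_LEN
            and has_telomere(seq[:TELO_WINDOW])
            and has_telomere(seq[-TELO_WINDOW:]))


def count_t2t(scaffolds: list[str]) -> tuple[int, int]:
    """Return (T2T scaffold count, T2T contig count) after the 100 kb filter."""
    kept_scaffolds = [s for s in scaffolds if len(s) >= MIN_KEEP_LEN]
    contigs = [c for s in scaffolds for c in split_at_gaps(s)
               if len(c) >= MIN_KEEP_LEN]
    return (sum(is_t2t(s) for s in kept_scaffolds),
            sum(is_t2t(c) for c in contigs))


if __name__ == "__main__":
    telo_l, telo_r = "CCCTAA" * 200, "TTAGGG" * 200
    # Gap-free chromosome: T2T both as a scaffold and as a contig.
    chrom_a = telo_l + "ACGT" * 1_500_000 + telo_r
    # Similar length, but one 100-N gap splits it into two ~3 Mb contigs.
    chrom_b = telo_l + "ACGT" * 750_000 + "N" * 100 + "ACGT" * 750_000 + telo_r
    print(count_t2t([chrom_a, chrom_b, "ACGT" * 10]))  # (2, 1)
```

In this toy input, a single gap turns one T2T scaffold into two contigs that each fall short of 5 Mb. Under the note's definitions, this is one way a T2T scaffold count can exceed the corresponding T2T contig count.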

Genome Res. 35: 1583-1594
