Verkko2 integrates proximity ligation data with long-read De Bruijn graphs for efficient telomere-to-telomere genome assembly, phasing, and scaffolding

  1. Sergey Koren 1,4

  1. NHGRI, NIH;
  2. Institute for Molecular Medicine Finland, University of Helsinki;
  3. Oxford Nanopore Technologies
  • * Corresponding author; email: sergey.koren{at}nih.gov
  • Abstract

    The Telomere-to-Telomere Consortium recently finished the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on the semi-manual combination of long, accurate PacBio HiFi and ultra-long Oxford Nanopore sequencing reads. The Verkko assembler later automated this process, achieving complete assemblies for approximately half of the chromosomes in a diploid human genome. However, the first version of Verkko was computationally expensive and could not resolve all regions of a typical human genome. Here we present Verkko2, which implements a more efficient read correction algorithm, improves repeat resolution and gap closing, introduces proximity-ligation-based haplotype phasing and scaffolding, and adds support for multiple long-read data types. These enhancements allow Verkko to assemble all regions of a diploid human genome, including the short arms of the acrocentric chromosomes and both sex chromosomes. Together, these changes double the number of telomere-to-telomere scaffolds, reduce runtime fourfold, and improve assembly correctness. On a panel of 19 human genomes, Verkko2 assembles an average of 39 of 46 complete chromosomes as scaffolds, with 21 of these assembled as gapless contigs. These improvements enable telomere-to-telomere comparative genomics and pangenomics at scale.

    • Received December 20, 2024.
    • Accepted May 12, 2025.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.

    ACCEPTED MANUSCRIPT

    Genome Res. gr.280383.124. Published by Cold Spring Harbor Laboratory Press
