Gapless assembly of complete human and plant chromosomes using only nanopore sequencing

Table 2.

Duplex + ultra-long curated assembly statistics for S. lycopersicum and Z. mays compared to existing reference genomes

Asm                  Total BP (Mbp)  Contigs  Contig NG50 (Mb)    LAI  Gaps     QV  Errors  T2T ctgs
Solanum lycopersicum Heinz 1706
Reference SL5.0              801.78       73             41.70  15.80    60  60.77      14      0/12
Verkko + curation            814.61       20             68.51  15.89     2  51.81       7     11/12
Zea mays B73
Reference Zm5.0             2178.29     1393             47.04  29.12   708  52.18      93      0/10
Verkko + curation           2192.15       26            209.62  30.35     9  60.55      26      6/10
  • Total BP: the total length of assembly bases, in megabases.
  • Contigs: the number of sequences in the assembly, after splitting at gaps consisting of at least three Ns.
  • Contig NG50: the length of the shortest contig such that half of the genome is in contigs of this length or greater.
  • LAI: the LTR assembly index (Ou et al. 2018) for each assembly; higher is better.
  • Gaps: the total number of gaps (composed of at least three Ns) in the assembly; lower is better.
  • QV: the Phred (Ewing and Green 1998) log-scaled quality score calculated using Merqury (Rhie et al. 2020); higher is better.
  • Errors: an estimate of assembly errors based on VerityMap alignments and discordant k-mers (Mikheenko et al. 2020); lower is better.
  • T2T ctgs: the count of telomere-to-telomere contigs for each assembly. A contig is defined as T2T if it has the canonical (TTTAGGG) telomere sequence within 10 kbp of the start and end and has no gaps; higher is better.
  • Bold denotes the best result for each metric and species.
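The metric definitions in the legend can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the table: the function names are hypothetical, the genome size for NG50 is assumed to be supplied by the caller, the QV function only applies the Phred scale to a given per-base error rate (Merqury estimates that rate from k-mers, which is not reproduced here), and the T2T check assumes that at least two tandem copies of the telomere motif, on either strand, within the 10 kbp window count as a telomere.

```python
import math
import re

# A gap is a run of at least three Ns, as in the table legend.
GAP = re.compile(r"N{3,}", re.IGNORECASE)


def split_contigs(scaffold: str) -> list[str]:
    """Split a scaffold into contigs at gaps, dropping empty pieces."""
    return [piece for piece in GAP.split(scaffold) if piece]


def count_gaps(scaffold: str) -> int:
    """Count runs of at least three Ns in one sequence."""
    return len(GAP.findall(scaffold))


def ng50(contig_lengths: list[int], genome_size: int) -> int:
    """Length of the shortest contig in the smallest set of longest contigs
    that covers at least half of genome_size; 0 if half is never reached."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality: QV = -10 * log10(per-base error rate)."""
    return -10.0 * math.log10(error_rate)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_t2t(contig: str, motif: str = "TTTAGGG",
           window: int = 10_000, min_repeats: int = 2) -> bool:
    """True if the contig has no gaps and a telomere repeat near both ends.

    min_repeats is an assumed threshold, not a value taken from the paper.
    """
    if GAP.search(contig):
        return False
    contig = contig.upper()
    targets = (motif * min_repeats, reverse_complement(motif) * min_repeats)
    head, tail = contig[:window], contig[-window:]
    return (any(t in head for t in targets)
            and any(t in tail for t in targets))
```

For example, `split_contigs("ACGTNNNACGTACNNGG")` yields two contigs because the run of two Ns is too short to count as a gap, and `ng50([40, 30, 20, 10], 100)` returns 30 because the two longest contigs together cover at least half of the assumed genome size.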

This Article

  1. Genome Res. 34: 1919-1930
