Geometric deep learning framework for de novo genome assembly

Table 1. Results on HiFi reads

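| Data set | Assembler | Size (Mb) | LG50 | LG90 | NG50 (Mb) | NGA50 (Mb) | Complete (%) | Duplicated (%) | QV | # misasm structural | # misasm local |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CHM13 | GNNome | 3051 | 12 | 31 | 111.3 | 111.0 | 99.53 | 0.71 | 54.24 | 44 | 86 |
| CHM13 | Hifiasm | 3052 | 12 | 32 | 87.7 | 87.7 | 99.55 | 0.70 | 55.86 | 23 | 101 |
| CHM13 | HiCanu | 3297 | 16 | 57 | 69.7 | 69.7 | 99.54 | 2.79 | 43.30 | 24 | 51 |
| CHM13 | Verkko | 3030 | 101 | 348 | 9.4 | 9.4 | 99.44 | 0.77 | 51.61 | 43 | 30 |
| M. musculus | GNNome | 2643 | 38 | 140 | 23.0 | 19.3 | 99.62 | 3.30 | 45.40 | 707 | 1053 |
| M. musculus | Hifiasm | 2613 | 40 | 150 | 21.1 | 18.7 | 99.62 | 1.93 | 45.67 | 706 | 1007 |
| M. musculus | HiCanu | 2651 | 67 | 271 | 11.2 | 10.5 | 99.56 | 2.65 | 43.77 | 781 | 1205 |
| M. musculus | Verkko | 2609 | 54 | 204 | 15.9 | 14.6 | 99.60 | 1.95 | 45.72 | 705 | 1005 |
| A. thaliana | GNNome | 139 | 5 | 13 | 12.4 | 12.4 | 99.89 | 1.09 | 52.08 | 129 | 90 |
| A. thaliana | Hifiasm | 151 | 5 | 13 | 12.4 | 12.4 | 99.90 | 1.07 | 44.52 | 342 | 56 |
| A. thaliana | HiCanu | 152 | 6 | 16 | 8.6 | 8.6 | 99.87 | 3.20 | 40.30 | 106 | 52 |
| A. thaliana | Verkko | 158 | 6 | 18 | 10.3 | 10.3 | 99.87 | 1.04 | 39.75 | 229 | 54 |
| G. gallus | GNNome | 1114 | 31 | 135 | 10.8 | 10.1 | 95.79 | 2.99 | 49.35 | 2434 | 8391 |
| G. gallus | Hifiasm | 1087 | 27 | 123 | 11.5 | 11.4 | 96.14 | 2.03 | 51.08 | 2164 | 7231 |
| G. gallus | Verkko | 1041 | 83 | 410 | 3.8 | 3.7 | 95.44 | 1.08 | 49.65 | 1340 | 3819 |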
  • The best-achieved results are in bold. Size is the total length of the assembly. The lengths of the references are 3054 Mb, 2728 Mb, 133 Mb, and 1053 Mb for CHM13 (v1.1), M. musculus (GRCm39), A. thaliana (Col-XJTU), and G. gallus (bGalGal1 maternal), respectively. The LG50 (LG90) measure is the smallest number of contigs that together cover 50% (90%) of the genome. NG50 and NGA50 were computed with minigraph (Li et al. 2020). “Complete” gives the percentage of the reference single-copy genes that are found in the assembly, while “duplicated” gives the percentage of reference single-copy genes that align to multiple positions in the assembly. Both “complete” and “duplicated” were computed with compleasm (Huang and Li 2023). Quality value (QV) is the per-base consensus accuracy, computed with yak by comparing k-mers in contigs to k-mers found in short reads (Cheng et al. 2021). Short reads were not available for G. gallus, so we computed QV with PacBio HiFi reads instead. The numbers of structural and local misassemblies (# misasm) were computed with QUAST (Mikheenko et al. 2018). The full QUAST report for HiFi data is given in Supplemental Table S1.
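
To make the contiguity and quality columns concrete, the following is a minimal Python sketch of how LG50/LG90 and NG50 can be derived from a list of contig lengths and a reference genome size, together with the standard Phred-scale relation between QV and per-base error rate. It is an illustration only and not the pipeline used for the table: the function names `ngx_lgx` and `qv_to_error_rate` and the toy contig lengths are hypothetical, and NGA50 is not covered because it requires aligning contigs to the reference.

```python
def ngx_lgx(contig_lengths, genome_size, fraction=0.5):
    """Return (NGx, LGx) relative to the reference genome size.

    Contigs are sorted from longest to shortest and accumulated until
    they cover `fraction` of `genome_size`. NGx is the length of the
    contig that crosses the threshold; LGx is how many contigs were
    needed. Returns (0, None) if the assembly never reaches the target.
    """
    target = fraction * genome_size
    running_total = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= target:
            return length, count
    return 0, None


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


# Toy example (hypothetical lengths in Mb, reference size 100 Mb).
contigs = [40, 30, 25, 10, 5]
ng50, lg50 = ngx_lgx(contigs, genome_size=100, fraction=0.5)
ng90, lg90 = ngx_lgx(contigs, genome_size=100, fraction=0.9)
print(ng50, lg50, ng90, lg90)
print(qv_to_error_rate(50))
```

Because the threshold is a fraction of the reference length rather than of the assembly length, a fragmented or incomplete assembly is penalized even when its own N50 looks good, which is why lower LG values and higher NG values both indicate better contiguity.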

This Article

  1. Genome Res. 35: 839-849
