Table 1.

Results on HiFi reads

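| Data set | Assembler | Size (Mb) | LG50 | LG90 | NG50 (Mb) | NGA50 (Mb) | Complete (%) | Duplicated (%) | QV | # misasm structural | # misasm local |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CHM13 | GNNome | 3051 | 12 | 31 | 111.3 | 111.0 | 99.53 | 0.71 | 54.24 | 44 | 86 |
| | Hifiasm | 3052 | 12 | 32 | 87.7 | 87.7 | 99.55 | 0.70 | 55.86 | 23 | 101 |
| | HiCanu | 3297 | 16 | 57 | 69.7 | 69.7 | 99.54 | 2.79 | 43.30 | 24 | 51 |
| | Verkko | 3030 | 101 | 348 | 9.4 | 9.4 | 99.44 | 0.77 | 51.61 | 43 | 30 |
| M. musculus | GNNome | 2643 | 38 | 140 | 23.0 | 19.3 | 99.62 | 3.30 | 45.40 | 707 | 1053 |
| | Hifiasm | 2613 | 40 | 150 | 21.1 | 18.7 | 99.62 | 1.93 | 45.67 | 706 | 1007 |
| | HiCanu | 2651 | 67 | 271 | 11.2 | 10.5 | 99.56 | 2.65 | 43.77 | 781 | 1205 |
| | Verkko | 2609 | 54 | 204 | 15.9 | 14.6 | 99.60 | 1.95 | 45.72 | 705 | 1005 |
| A. thaliana | GNNome | 139 | 5 | 13 | 12.4 | 12.4 | 99.89 | 1.09 | 52.08 | 129 | 90 |
| | Hifiasm | 151 | 5 | 13 | 12.4 | 12.4 | 99.90 | 1.07 | 44.52 | 342 | 56 |
| | HiCanu | 152 | 6 | 16 | 8.6 | 8.6 | 99.87 | 3.20 | 40.30 | 106 | 52 |
| | Verkko | 158 | 6 | 18 | 10.3 | 10.3 | 99.87 | 1.04 | 39.75 | 229 | 54 |
| G. gallus | GNNome | 1114 | 31 | 135 | 10.8 | 10.1 | 95.79 | 2.99 | 49.35 | 2434 | 8391 |
| | Hifiasm | 1087 | 27 | 123 | 11.5 | 11.4 | 96.14 | 2.03 | 51.08 | 2164 | 7231 |
| | Verkko | 1041 | 83 | 410 | 3.8 | 3.7 | 95.44 | 1.08 | 49.65 | 1340 | 3819 |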

[i] The best-achieved results are in bold. Size is the total length of the assembly. The lengths of the references are 3054 Mb, 2728 Mb, 133 Mb, and 1053 Mb for CHM13 (v1.1), M. musculus (GRCm39), A. thaliana (Col-XJTU), and G. gallus (bGalGal1 maternal), respectively. LG50 (LG90) is the smallest number of contigs that together cover 50% (90%) of the genome. NG50 and NGA50 were computed with minigraph (Li et al. 2020). “Complete” gives the percentage of reference single-copy genes that are found in the assembled genome, while “duplicated” gives the percentage of reference single-copy genes that align to multiple positions in the assembly. Both “complete” and “duplicated” were computed with compleasm (Huang and Li 2023). Quality value (QV) is the per-base consensus accuracy, computed with yak by comparing k-mers in contigs to k-mers found in short reads (Cheng et al. 2021). Short reads were not available for G. gallus, so we computed QV with PacBio HiFi reads instead. The numbers of structural and local misassemblies (# misasm) were computed with QUAST (Mikheenko et al. 2018). The full QUAST report for HiFi data is given in Supplemental Table S1.
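
To make the contiguity and accuracy measures in the note concrete, the following is a minimal Python sketch of how LG50/LG90 and NG50 follow from a list of contig lengths and a reference length, and how a QV maps to a per-base error rate under the usual Phred scaling (QV = -10 log10 of the error probability). It is not the minigraph, yak, or QUAST implementation; the function names `ngx_lgx` and `qv_to_error_rate` and the toy contig lengths are illustrative assumptions, and NGA50 (which requires alignment to the reference) is not covered.

```python
def ngx_lgx(contig_lengths, reference_length, fraction=0.5):
    """Return (NGx, LGx) for the given contig lengths.

    LGx is the smallest number of contigs (taken longest first) whose
    combined length reaches `fraction` of the reference length; NGx is
    the length of the last contig needed to reach that threshold.
    Returns (None, None) if the assembly is too short to reach it.
    """
    target = fraction * reference_length
    covered = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        covered += length
        if covered >= target:
            return length, count
    return None, None


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy data (made-up lengths in Mb, not taken from Table 1).
contigs = [40, 30, 25, 10, 5]
reference = 100

print(ngx_lgx(contigs, reference, 0.5))  # (30, 2): 40 + 30 covers 50% of 100
print(ngx_lgx(contigs, reference, 0.9))  # (25, 3): 40 + 30 + 25 covers 90% of 100
print(qv_to_error_rate(30))              # about 0.001, i.e. one error per 1000 bases
```

Because the threshold is taken relative to the reference length rather than the assembly length, an assembly that is larger than the reference (for example, one containing duplicated sequence) does not gain a better NG50 or LG50 from the extra bases.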