Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes

  1. Weihua Pan¹

  1. Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  2. Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian 350002, China
  3. Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
  4. School of Agricultural Sciences, Zhengzhou University, Zhengzhou, Henan 450001, China
  5. College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  6. These authors contributed equally to this work.

  • Corresponding authors: panweihua{at}caas.cn, stelo{at}ucr.edu, weixiuqing47{at}foxmail.com
  • Abstract

    Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, there has been no comprehensive evaluation of these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different composition abundance and heterozygosity rates. The 11 assemblers were evaluated using the Quality Assessment Tool (QUAST) and Benchmarking Universal Single-Copy Orthologs (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. The results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes, respectively, with HiFi data. This comprehensive benchmark of commonly used assemblers on complex eukaryotic genomes and metagenomes will help the research community choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.

    Footnotes

    • Received June 29, 2023.
    • Accepted January 23, 2024.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
