Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes
- Wenjuan Yu1,6,
- Haohui Luo1,6,
- Jinbao Yang1,5,6,
- Shengchen Zhang1,5,6,
- Heling Jiang1,6,
- Xianjia Zhao1,4,
- Xingqi Hui1,4,
- Da Sun1,
- Liang Li2,
- Xiu-qing Wei2,
- Stefano Lonardi3 and
- Weihua Pan1
- 1Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China;
- 2Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian 350002, China;
- 3Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA;
- 4School of Agricultural Sciences, Zhengzhou University, Zhengzhou, Henan 450001, China;
- 5College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
-
↵6 These authors contributed equally to this work.
Abstract
Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, there are no comprehensive studies on the evaluation of these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different composition abundance and heterozygosity rates. The 11 assemblers were evaluated using quality assessment tool (QUAST) and benchmarking universal single-copy ortholog (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. Results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes with HiFi data. We performed a comprehensive benchmarking study of commonly used assemblers on complex eukaryotic genomes and metagenomes. Our study will help the research community to choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.
Footnotes
-
[Supplemental material is available for this article.]
-
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278232.123.
- Received June 29, 2023.
- Accepted January 23, 2024.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











