CGC1, a new reference genome for Caenorhabditis elegans

Kazuki Ichikawa; Massa J. Shoura; Karen L. Artiles; Dae-Eun Jeong; Chie Owa; Haruka Kobayashi; Yoshihiko Suzuki; Manami Kanamori; Yu Toyoshima; Yuichi Iino; Ann E. Rougvie; Lamia Wahba; Andrew Z. Fire; Erich M. Schwarz; Shinichi Morishita

doi:10.1101/gr.280274.124

¹Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan;
²Department of Pathology, Stanford University, Stanford, California 94305, USA;
³Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan;
⁴Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota 55454, USA;
⁵Laboratory of Non-Canonical Modes of Inheritance, Rockefeller University, New York, New York 10065, USA;
⁶Department of Genetics, Stanford University, Stanford, California 94305, USA;
⁷Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA

⁸Present address: Phinomics, Incorporated, San Carlos, CA 94070, USA

Corresponding authors: afire@stanford.edu, ems394@cornell.edu, moris@edu.k.u-tokyo.ac.jp

Abstract

The original 100.3 Mb reference genome for Caenorhabditis elegans, generated from the wild-type laboratory strain N2, has been crucial for analysis of C. elegans since 1998 and has been considered complete since 2005. Unexpectedly, this long-standing reference was shown to be incomplete in 2019 by a genome assembly from the N2-derived strain VC2010. Moreover, genetically divergent versions of N2 have arisen over decades of research and hindered reproducibility of C. elegans genetics and genomics. Here we provide a 106.4 Mb gap-free, telomere-to-telomere genome assembly of C. elegans, generated from CGC1, an isogenic derivative of the N2 strain. We use improved long-read sequencing and manual assembly of 43 recalcitrant genomic regions to overcome deficiencies of prior N2 and VC2010 assemblies and to assemble tandem repeat loci, including a 772 kb sequence for the 45S rRNA genes. Although many differences from earlier assemblies come from repeat regions, unique additions to the genome are also found. Of 19,972 protein-coding genes in the N2 assembly, 19,790 (99.1%) encode products that are unchanged in the CGC1 assembly. The CGC1 assembly also may encode 183 new protein-coding and 163 new ncRNA genes. CGC1 thus provides both a completely defined reference genome and corresponding isogenic wild-type strain for C. elegans, allowing unique opportunities for model and systems biology.

Footnotes

[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280274.124.
Freely available online through the Genome Research Open Access option.

Received December 5, 2024.
Accepted June 6, 2025.

This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
