CGC1, a new reference genome for Caenorhabditis elegans

Table 3. Details of N2 genes, AUGUSTUS protein-coding genes, and StringTie transcripts

| Gene category | Gene number |
| --- | --- |
| All protein-coding genes in N2 | 19,972 |
| Encoded protein spectra | 9508 (47.6% of all) |
| Could not be mapped onto CGC1 genome | 75 |
| Could not be mapped; micropeptides | 49 |
| Could not be mapped; micropeptides with StringTie2 matches | 24 |
| Could not be mapped; had AUGUSTUS orthologs | 26 |
| Could not be mapped; had AUGUSTUS orthologs and StringTie2 matches | 14 |
| Could not be mapped; had unique AUGUSTUS orthologs | 15 |
| Could not be mapped; encoded protein spectra | 3 |
| Mapped onto CGC1 genome | 19,897 |
| Mapped; overlapped StringTie2 transcripts | 18,755 (93.8% of mapped) |
| Mapped; overlapped AUGUSTUS genes | 19,427 (97.6% of mapped) |
| Overlapped AUGUSTUS genes with CGC1-specific exons | 46 |
| Could not be translated after mapping | 24 |
| Not translated but overlapped AUGUSTUS genes | 22 (91.7% of 24) |
| Could be translated but only with an altered product | 83 |
| Only altered translation but overlapped AUGUSTUS predictions | 79 (95.2% of 83) |
| Encoded at least one protein identical to N2 | 19,790 (99.1% of all) |
| All ncRNA genes in N2 | 24,788 |
| Could not be mapped onto CGC1 genome | 3 |
| Mapped onto CGC1 genome | 24,785 |
| Mapped; overlapped StringTie2 transcripts | 5173 (20.9% of mapped) |
| Could be transcribed but only with an altered product | 194 |
| Encoded at least one transcript identical to N2 | 24,591 (99.2% of all) |
| All pseudogenes in N2 | 2131 |
| Could not be mapped onto CGC1 genome | 4 |
| Mapped onto CGC1 genome | 2127 |
| Could be transcribed but only with an altered product | 35 |
| Encoded at least one transcript identical to N2 | 2092 (98.2% of all) |
| All AUGUSTUS protein-coding genes in CGC1 genome | 21,238 |
| Encoded protein spectra | 9640 (45.4% of all) |
| Overlapped StringTie2 transcripts | 19,746 (93.0% of all) |
| Overlapped N2 protein-coding genes | 19,232 (90.6% of all) |
| Did not overlap N2 protein-coding genes | 2006 (9.4% of all) |
| No N2 prot.-cod. gene overlap; overlapped only N2 DNA | 1779 |
| Met criteria for new genes; overlapped only N2 DNA | 314 |
| Met criteria; only N2 DNA; overlapped StringTie2 transcripts | 201 (64.0% of 314) |
| Met criteria; only N2 DNA; encoded protein spectra | 12 (3.8% of 314) |
| No N2 prot.-cod. gene overlap; overlapped CGC1-specific DNA | 227 |
| Met criteria for new genes; overlapped CGC1-specific DNA | 183 |
| New genes; CGC1-specific DNA; overlapped StringTie2 transcripts | 150 (82.0% of 183) |
| New genes; CGC1-specific DNA; encoded protein spectra | 13 (7.1% of 183) |
| All genomic loci encoding StringTie2 transcripts in CGC1 | 23,249 |
| Overlapped N2 protein-coding genes | 16,961 (73.0% of all) |
| Overlapped N2 ncRNA genes | 2745 (11.8% of all) |
| Overlapped AUGUSTUS protein-coding genes | 18,059 (77.7% of all) |
| Not overlapping N2 or AUGUSTUS genes | 3868 (16.6% of all) |
| No N2/AUGUSTUS overlap; overlapped only N2 DNA; did not encode any ncRNA motifs from Rfam | 3693 |
| No N2/AUGUSTUS overlap; overlapped CGC1-specific DNA; did not encode any ncRNA motifs from Rfam | 163 |
  • We mapped nuclear (nonmitochondrial) N2 genes from the WS292 release of WormBase (Sternberg et al. 2024) to the CGC1 assembly as three sets: protein-coding genes, ncRNA genes, and pseudogenes. Of the 75 unmapped N2 protein-coding genes, 49 encoded micropeptides of nine to 15 amino acids (Olexiouk et al. 2018), which made them difficult to map. The other 26 unmapped N2 genes, which encoded proteins of 100 or more amino acids, all had orthologs among the AUGUSTUS gene predictions: 15 had a unique (1:1) ortholog, whereas each of the remaining 11 corresponded to two to 13 AUGUSTUS genes. We tested all unmapped genes for 100% full-length matches to StringTie2 transcripts in the CGC1 genome.
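Several rows of Table 3 follow arithmetically from others. As a minimal sketch, assuming that a mapped gene counts as "identical to N2" exactly when it is neither untranslatable nor limited to an altered product (our reading of the table; the identity reproduces the published counts but is not stated by the authors), the following Python recomputes those rows and checks the partitions described above. The helper names `pct` and `identical_count` are hypothetical.

```python
# Sketch: recompute derived rows of Table 3 from the published counts.
# Assumption (our reading, not stated by the authors): "identical" genes are
# the mapped genes minus those that could not be translated and those with
# only an altered product.


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * part / whole, 1)


def identical_count(total: int, unmapped: int, untranslatable: int, altered: int) -> int:
    """Hypothetical helper: mapped genes that still yield an N2-identical product."""
    mapped = total - unmapped
    return mapped - untranslatable - altered


# (all genes, unmapped, not translatable after mapping, altered product only).
# The ncRNA and pseudogene sets have no "could not be translated" row, so 0.
n2_sets = {
    "protein-coding": (19_972, 75, 24, 83),
    "ncRNA": (24_788, 3, 0, 194),
    "pseudogene": (2_131, 4, 0, 35),
}

results = {}
for name, (total, unmapped, untranslatable, altered) in n2_sets.items():
    same = identical_count(total, unmapped, untranslatable, altered)
    results[name] = (same, pct(same, total))
    print(f"{name}: {same} identical to N2 ({pct(same, total)}% of all)")

# Partition of the 75 unmapped protein-coding genes (table footnote).
micropeptides, with_orthologs = 49, 26
one_to_one, one_to_many = 15, 11
assert micropeptides + with_orthologs == 75
assert one_to_one + one_to_many == with_orthologs

# AUGUSTUS predictions split by overlap with N2 protein-coding genes.
assert 19_232 + 2_006 == 21_238
assert 1_779 + 227 == 2_006  # only N2 DNA + CGC1-specific DNA
```

Run as written, it yields 19,790, 24,591, and 2092 identical genes (99.1%, 99.2%, and 98.2% of all), matching the table; the same `pct` helper gives 47.6 for the 9508 of 19,972 N2 protein-coding genes with protein spectra.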

Genome Res. 35: 1902-1918