Details of N2 genes, AUGUSTUS protein-coding genes, and StringTie2 transcripts
| Gene category | Gene number |
|---|---|
| All protein-coding genes in N2 | 19,972 |
| Encoded protein spectra | 9508 (47.6% of all) |
| Could not be mapped onto CGC1 genome | 75 |
| Could not be mapped; micropeptides | 49 |
| Could not be mapped; micropeptides with StringTie2 matches | 24 |
| Could not be mapped; had AUGUSTUS orthologs | 26 |
| Could not be mapped; had AUGUSTUS orthologs and StringTie2 matches | 14 |
| Could not be mapped; had unique AUGUSTUS orthologs | 15 |
| Could not be mapped; encoded protein spectra | 3 |
| Mapped onto CGC1 genome | 19,897 |
| Mapped; overlapped StringTie2 transcripts | 18,755 (93.8% of mapped) |
| Mapped; overlapped AUGUSTUS genes | 19,427 (97.6% of mapped) |
| Overlapped AUGUSTUS genes with CGC1-specific exons | 46 |
| Could not be translated after mapping | 24 |
| Not translated but overlapped AUGUSTUS genes | 22 (91.7% of 24) |
| Could be translated but only with an altered product | 83 |
| Only altered translation but overlapped AUGUSTUS predictions | 79 (95.2% of 83) |
| Encoded at least one protein identical to N2 | 19,790 (99.1% of all) |
| All ncRNA genes in N2 | 24,788 |
| Could not be mapped onto CGC1 genome | 3 |
| Mapped onto CGC1 genome | 24,785 |
| Mapped; overlapped StringTie2 transcripts | 5173 (20.9% of mapped) |
| Could be transcribed but only with an altered product | 194 |
| Encoded at least one transcript identical to N2 | 24,591 (99.2% of all) |
| All pseudogenes in N2 | 2131 |
| Could not be mapped onto CGC1 genome | 4 |
| Mapped onto CGC1 genome | 2127 |
| Could be transcribed but only with an altered product | 35 |
| Encoded at least one transcript identical to N2 | 2092 (98.2% of all) |
| All AUGUSTUS protein-coding genes in CGC1 genome | 21,238 |
| Encoded protein spectra | 9640 (45.4% of all) |
| Overlapped StringTie2 transcripts | 19,746 (93.0% of all) |
| Overlapped N2 protein-coding genes | 19,232 (90.6% of all) |
| Did not overlap N2 protein-coding genes | 2006 (9.4% of all) |
| No N2 prot.-cod. gene overlap; overlapped only N2 DNA | 1779 |
| Met criteria for new genes; overlapped only N2 DNA | 314 |
| Met criteria; only N2 DNA; overlapped StringTie2 transcripts | 201 (64.0% of 314) |
| Met criteria; only N2 DNA; encoded protein spectra | 12 (3.8% of 314) |
| No N2 prot.-cod. gene overlap; overlapped CGC1-specific DNA | 227 |
| Met criteria for new genes; overlapped CGC1-specific DNA | 183 |
| New genes; CGC1-specific DNA; overlapped StringTie2 transcripts | 150 (82.0% of 183) |
| New genes; CGC1-specific DNA; encoded protein spectra | 13 (7.1% of 183) |
| All genomic loci encoding StringTie2 transcripts in CGC1 | 23,249 |
| Overlapped N2 protein-coding genes | 16,961 (73.0% of all) |
| Overlapped N2 ncRNA genes | 2745 (11.8% of all) |
| Overlapped AUGUSTUS protein-coding genes | 18,059 (77.7% of all) |
| Not overlapping N2 or AUGUSTUS genes | 3868 (16.6% of all) |
| No N2/AUGUSTUS overlap; overlapped only N2 DNA; did not encode any ncRNA motifs from Rfam | 3693 |
| No N2/AUGUSTUS overlap; overlapped CGC1-specific DNA; did not encode any ncRNA motifs from Rfam | 163 |
We mapped nuclear (nonmitochondrial) N2 genes from the WS292 release of WormBase (Sternberg et al. 2024) to the CGC1 assembly as three sets: protein-coding genes, ncRNA genes, and pseudogenes. Of the 75 N2 protein-coding genes that could not be mapped, 49 encoded micropeptides of nine to 15 amino acids (Olexiouk et al. 2018), whose short length made them difficult to map. The other 26 unmapped N2 genes, which encoded proteins of 100 or more amino acids, all had orthologs among the AUGUSTUS gene predictions; of these, 15 had a unique (1:1) ortholog, whereas 11 had two to 13 AUGUSTUS orthologs each. We tested all unmapped genes for 100% full-length matches to StringTie2 transcripts in the CGC1 genome.
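As a rough illustration of what a "100% full-length match" test could look like, the Python sketch below reports every query sequence that occurs in its entirety, with no mismatches or gaps, inside at least one transcript sequence. This is only one possible reading of the criterion: the text here does not state which tool or sequence type was used for the comparison, and the function name `full_length_matches`, the gene and transcript identifiers, and the toy sequences are all hypothetical.

```python
def full_length_matches(queries, transcripts):
    """Map each query ID to the transcript IDs that contain the query exactly.

    A query counts as matched only if its whole sequence appears, unaltered,
    as a contiguous stretch of a transcript (assumed matching criterion).
    Queries with no such transcript are left out of the result.
    """
    hits = {}
    for query_id, query_seq in queries.items():
        query_upper = query_seq.upper()
        matched = [
            transcript_id
            for transcript_id, transcript_seq in transcripts.items()
            if query_upper in transcript_seq.upper()
        ]
        if matched:
            hits[query_id] = matched
    return hits


# Hypothetical toy data: two unmapped genes and two assembled transcripts.
n2_unmapped = {
    "geneA": "ATGGCTTAA",
    "geneB": "ATGCCCGGGTAA",
}
stringtie_transcripts = {
    "STRG.1.1": "GGATGGCTTAACC",  # contains geneA exactly, with flanking bases
    "STRG.2.1": "ATGCCCGAGTAA",   # differs from geneB by a single base
}

hits = full_length_matches(n2_unmapped, stringtie_transcripts)
print(hits)
```

With these toy inputs only `geneA` is reported, because the single-base difference in `STRG.2.1` disqualifies `geneB`. A real analysis would more likely rely on an aligner and filter for 100% identity over 100% of the query length; the exact substring test is used here only because it makes the criterion explicit.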











