Table 3. Numbers of predicted and annotated genes and of individual CDSs (including all alternative CDSs).
| Species | GeneMark-ETP: No. of protein-coding genes | GeneMark-ETP: No. of predicted genomic CDSs | RefSeq annotation: No. of protein-coding genes | RefSeq annotation: No. of annotated genomic CDSs |
|---|---|---|---|---|
| C. elegans | 18,820 | 19,806 | 19,969 | 28,544 |
| A. thaliana | 26,449 | 27,708 | 27,445 | 40,827 |
| D. melanogaster | 12,850 | 14,138 | 13,951 | 22,395 |
| S. lycopersicum | 24,420 | 26,341 | 25,158 | 31,911 |
| D. rerio | 28,608 | 31,961 | 25,610 | 42,929 |
| G. gallus | 17,275 | 21,433 | 17,279 | 38,534 |
| M. musculus | 23,956 | 27,686 | 22,405 | 58,318 |
Note that the genomic CDSs and the corresponding transcript CDSs are expected to be identical in sequence. GeneMark-ETP used the “order excluded” reference databases (see section “Data sets”).











