GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes

Table 3.

Numbers of the predicted and annotated genes as well as individual CDSs (including all alternative CDSs)

| Species | GeneMark-ETP: No. of protein-coding genes | GeneMark-ETP: No. of predicted genomic CDSs | RefSeq annotation: No. of protein-coding genes | RefSeq annotation: No. of annotated genomic CDSs |
|---|---|---|---|---|
| C. elegans | 18,820 | 19,806 | 19,969 | 28,544 |
| A. thaliana | 26,449 | 27,708 | 27,445 | 40,827 |
| D. melanogaster | 12,850 | 14,138 | 13,951 | 22,395 |
| S. lycopersicum | 24,420 | 26,341 | 25,158 | 31,911 |
| D. rerio | 28,608 | 31,961 | 25,610 | 42,929 |
| G. gallus | 17,275 | 21,433 | 17,279 | 38,534 |
| M. musculus | 23,956 | 27,686 | 22,405 | 58,318 |
  • Note that the genomic CDSs and the corresponding transcript CDSs are supposed to be identical in sequence. The “order excluded” reference databases were used by GeneMark-ETP (see section “Data sets”).
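
Because the CDS counts include all alternative CDSs, one simple way to read Table 3 is as an average number of distinct CDSs per protein-coding gene. The following is a minimal sketch of that arithmetic, not part of the GeneMark-ETP pipeline; the counts are copied from the table, while the names `TABLE3`, `cds_per_gene`, and `summarize` are hypothetical and introduced only for illustration.

```python
# Counts copied from Table 3, per species:
# (GeneMark-ETP genes, GeneMark-ETP CDSs, RefSeq genes, RefSeq CDSs)
TABLE3 = {
    "C. elegans": (18820, 19806, 19969, 28544),
    "A. thaliana": (26449, 27708, 27445, 40827),
    "D. melanogaster": (12850, 14138, 13951, 22395),
    "S. lycopersicum": (24420, 26341, 25158, 31911),
    "D. rerio": (28608, 31961, 25610, 42929),
    "G. gallus": (17275, 21433, 17279, 38534),
    "M. musculus": (23956, 27686, 22405, 58318),
}


def cds_per_gene(genes: int, cdss: int) -> float:
    """Average number of distinct CDSs per protein-coding gene."""
    return cdss / genes


def summarize(table: dict) -> dict:
    """Map each species to (GeneMark-ETP ratio, RefSeq ratio)."""
    return {
        species: (cds_per_gene(etp_genes, etp_cdss),
                  cds_per_gene(ref_genes, ref_cdss))
        for species, (etp_genes, etp_cdss, ref_genes, ref_cdss) in table.items()
    }


if __name__ == "__main__":
    # Print one line per species: name, predicted ratio, annotated ratio.
    for species, (etp, refseq) in summarize(TABLE3).items():
        print(f"{species:<16} {etp:5.2f} {refseq:5.2f}")
```

With these counts, the CDS-per-gene ratio is lower for the GeneMark-ETP predictions than for the RefSeq annotation in every species listed, which suggests that the annotation records more alternative CDSs per gene than the predictions do.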

This Article

  1. Genome Res. 34: 757-768
