Table 1.

Statistics characterizing the GeneMarkS-T initial gene predictions made in assembled transcripts as well as the HC genes predicted by the GeneMarkS-TP module

| Species | No. of genes initially predicted by GeneMarkS-T | Sn/Pr of GeneMarkS-T-predicted genes | Order excluded DB: No. of HC genes | Order excluded DB: Sn/Pr of HC genes | Species excluded DB: No. of HC genes | Species excluded DB: Sn/Pr of HC genes |
| --- | --- | --- | --- | --- | --- | --- |
| C. elegans | 14,746 | 46.8/63.4 | 8062 | 35.7/88.4 | 11,399 | 51.7/90.6 |
| A. thaliana | 17,589 | 51.2/79.9 | 16,008 | 56.7/97.3 | 16,551 | 58.8/97.6 |
| D. melanogaster | 10,163 | 59.6/81.8 | 8109 | 55.0/94.7 | 9223 | 63.7/96.3 |
| S. lycopersicum | 19,526 | 68.0/78.4 | 17,231 | 75.1/95.3 | 17,489 | 75.8/95.2 |
| D. rerio | 22,992 | 59.6/59.9 | 16,918 | 67.0/88.5 | 16,573 | 66.9/90.4 |
| G. gallus | 17,381 | 49.6/47.0 | 12,473 | 74.4/89.1 | 12,564 | 74.0/88.4 |
| M. musculus | 15,819 | 49.6/63.2 | 13,057 | 63.5/93.2 | 12,965 | 63.9/94.5 |

[i] Two versions of a reference protein database were used for each species: the “species excluded” and the “order excluded” (see section “Data sets”). Refinement by GeneMarkS-TP reduced the number of initial predictions and produced a significant increase in Pr. For most genomes, Sn also increased, especially for the large, inhomogeneous genomes of G. gallus and M. musculus. Additional data are provided in Supplemental Table S1.
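The changes described in the table note can be recomputed directly from the values in Table 1. The following is a minimal sketch, not part of the original analysis: the names `TABLE1`, `deltas`, and `count_sn_gains` are hypothetical, and the numbers are transcribed by hand from the table above.

```python
# Values transcribed from Table 1 as (number of genes, Sn, Pr).
# "initial" = GeneMarkS-T predictions; the other two keys are the HC gene
# sets obtained with each version of the reference protein database.
TABLE1 = {
    "C. elegans": {"initial": (14746, 46.8, 63.4),
                   "order_excluded": (8062, 35.7, 88.4),
                   "species_excluded": (11399, 51.7, 90.6)},
    "A. thaliana": {"initial": (17589, 51.2, 79.9),
                    "order_excluded": (16008, 56.7, 97.3),
                    "species_excluded": (16551, 58.8, 97.6)},
    "D. melanogaster": {"initial": (10163, 59.6, 81.8),
                        "order_excluded": (8109, 55.0, 94.7),
                        "species_excluded": (9223, 63.7, 96.3)},
    "S. lycopersicum": {"initial": (19526, 68.0, 78.4),
                        "order_excluded": (17231, 75.1, 95.3),
                        "species_excluded": (17489, 75.8, 95.2)},
    "D. rerio": {"initial": (22992, 59.6, 59.9),
                 "order_excluded": (16918, 67.0, 88.5),
                 "species_excluded": (16573, 66.9, 90.4)},
    "G. gallus": {"initial": (17381, 49.6, 47.0),
                  "order_excluded": (12473, 74.4, 89.1),
                  "species_excluded": (12564, 74.0, 88.4)},
    "M. musculus": {"initial": (15819, 49.6, 63.2),
                    "order_excluded": (13057, 63.5, 93.2),
                    "species_excluded": (12965, 63.9, 94.5)},
}

DATABASES = ("order_excluded", "species_excluded")


def deltas(species, db):
    """Return (change in gene count, change in Sn, change in Pr)
    when going from the initial predictions to the HC genes."""
    n0, sn0, pr0 = TABLE1[species]["initial"]
    n1, sn1, pr1 = TABLE1[species][db]
    return n1 - n0, round(sn1 - sn0, 1), round(pr1 - pr0, 1)


def count_sn_gains(db):
    """Count the species whose Sn is higher for HC genes than initially."""
    return sum(1 for species in TABLE1 if deltas(species, db)[1] > 0)


for db in DATABASES:
    print(f"{db}: Sn increased in {count_sn_gains(db)} of {len(TABLE1)} species")
```

With these values, Pr rises and the gene count falls for every species and both databases. Sn rises for all seven species with the species excluded database and for five of seven with the order excluded database, the exceptions being C. elegans and D. melanogaster.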