How to Interpret an Anonymous Bacterial Genome: Machine Learning Approach to Gene Identification

Table 8.

Gene Prediction Accuracy in Terms of Avg(Sn,Sp) Observed for the 10 Genomes

Species name Type of atypical model (cluster)
2A 2B 2C 3A 3B 3C
A. fulgidus 0.953  (0.952) 0.953  (0.954)
B. subtilis 0.949 (0.942) 0.943 (0.936) 0.951  (0.945) 0.950 (0.943) 0.943 (0.933) 0.951  (0.947)
E. coli 0.956 (0.956) 0.953 (0.950) 0.959  (0.958) 0.957 (0.956) 0.956 (0.952) 0.959  (0.958)
H. influenzae 0.955  (0.954) 0.953 (0.953) 0.953 (0.954) 0.955  (0.956) 0.954 (0.954) 0.955  (0.955)
H. pylori 0.950  (0.950) 0.950  (0.951) 0.949 (0.948) 0.948 (0.950) 0.950  (0.950) 0.948 (0.948)
M. genitalium 0.924 (0.931) 0.927 (0.931) 0.924 (0.927) 0.925 (0.929) 0.928  (0.932) 0.922 (0.925)
M. jannaschii 0.974 (0.974) 0.974 (0.974) 0.974 (0.974) 0.975  (0.973) 0.974 (0.974) 0.975  (0.972)
M. pneumoniae 0.925 (0.921) 0.925 (0.927) 0.917 (0.913) 0.923 (0.921) 0.927  (0.927) 0.924 (0.922)
M. thermoautotrophicum 0.972  (0.969) 0.970 (0.970)
Synechocystis 0.968 (0.967) 0.965 (0.966) 0.968 (0.968) 0.968 (0.968) 0.966 (0.966) 0.969  (0.969)
  • The gene-finding program GeneMark used models derived from typical and atypical clusters. Results obtained with preclustering cross-validation are shown in parentheses (in italics). All other data were obtained with postclustering cross-validation. See Table 7 for an explanation of the boldface numbers.

This Article

  1. Genome Res. 8: 1154-1171
