How to Interpret an Anonymous Bacterial Genome: Machine Learning Approach to Gene Identification

Table 7.

Characteristics of Gene Prediction Accuracy for the E. coli Genome with 4289 Annotated Genes

Models     AT cluster type  Sn             Sp             Average (Sn, Sp)  No. of correctly predicted genes  No. of predictions
GB         -                0.908          0.978          0.943             3895                              3981
R          -                0.851          0.988          0.919             3651                              3696
AT         2A               0.925 (0.923)  0.982 (0.983)  0.954             3968                              4039
AT,T       2A               0.931 (0.929)  0.981 (0.982)  0.956             3995                              4073
R,AT,T     2A               0.932          0.978          0.955             3999                              4087
T          2A               0.835 (0.834)  0.991 (0.991)  0.913             3580                              3614
AT         2B               0.916 (0.911)  0.985 (0.985)  0.951             3930                              3991
AT,T       2B               0.922 (0.917)  0.983 (0.983)  0.953             3954                              4023
R,AT,T     2B               0.922          0.983          0.953             3954                              4023
T          2B               0.859 (0.857)  0.990 (0.990)  0.924             3683                              3719
AT         2C               0.924 (0.924)  0.980 (0.981)  0.952             3964                              4043
AT,T       2C               0.939 (0.938)  0.979 (0.979)  0.959             4027                              4113
R,AT,T     2C               0.939          0.979          0.959             4027                              4113
T          2C               0.848 (0.845)  0.990 (0.990)  0.919             3639                              3674
AT         3A               0.927 (0.924)  0.982 (0.983)  0.955             3975                              4048
AT,HT,T    3A               0.936          0.978          0.957             4014                              4103
AT,T       3A               0.934 (0.930)  0.980 (0.981)  0.957             4004                              4085
HT         3A               0.716 (0.715)  0.989 (0.991)  0.853             3070                              3103
R,AT,T     3A               0.934          0.980          0.957             4004                              4085
R,AT,T,HT  3A               0.936          0.978          0.957             4014                              4103
T          3A               0.852 (0.850)  0.990 (0.990)  0.921             3653                              3689
AT         3B               0.922 (0.916)  0.984 (0.984)  0.953             3956                              4022
AT,HT,T    3B               0.931          0.980          0.956             3993                              4076
AT,T       3B               0.929 (0.922)  0.982 (0.982)  0.956             3983                              4058
HT         3B               0.713 (0.715)  0.990 (0.991)  0.851             3057                              3089
R,AT,T     3B               0.929          0.982          0.956             3983                              4058
R,AT,T,HT  3B               0.931          0.980          0.956             3993                              4076
T          3B               0.872 (0.869)  0.989 (0.989)  0.930             3741                              3781
AT         3C               0.926 (0.922)  0.980 (0.980)  0.953             3971                              4052
AT,HT,T    3C               0.943          0.976          0.960             4046                              4146
AT,T       3C               0.941 (0.938)  0.977 (0.978)  0.959             4035                              4128
HT         3C               0.713 (0.717)  0.990 (0.992)  0.851             3057                              3089
R,AT,T     3C               0.941          0.977          0.959             4035                              4128
R,AT,T,HT  3C               0.943          0.976          0.960             4046                              4146
T          3C               0.863 (0.860)  0.989 (0.990)  0.926             3701                              3741
  • The results obtained by using preclustering cross validation are shown in parentheses (italics). Boldface numbers show the maximum postclustering value of Avg(Sn, Sp) for a given species.
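
The Sn, Sp, and Average columns appear to follow directly from the two count columns. The sketch below is an illustration, not the authors' code. It assumes that Sn is the number of correctly predicted genes divided by the 4289 annotated genes, and that Sp is the number of correctly predicted genes divided by the number of predictions. These definitions are inferred from the table and are consistent with the GB row. The function name `accuracy` is hypothetical.

```python
# Assumed definitions, inferred from the table rather than stated in it:
#   Sn = correctly predicted genes / annotated genes
#   Sp = correctly predicted genes / total predictions
ANNOTATED_GENES = 4289  # from the table caption


def accuracy(correct, predicted, annotated=ANNOTATED_GENES):
    """Return (Sn, Sp, average of Sn and Sp) for one row of the table."""
    sn = correct / annotated
    sp = correct / predicted
    return sn, sp, (sn + sp) / 2


# GB row: 3895 correctly predicted genes out of 3981 predictions.
sn, sp, avg = accuracy(3895, 3981)
print(f"Sn={sn:.3f} Sp={sp:.3f} Avg={avg:.3f}")  # Sn=0.908 Sp=0.978 Avg=0.943
```

In some rows the printed average may differ from this calculation in the last digit, possibly because the published values were averaged after rounding.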

This Article

  1. Genome Res. 8: 1154-1171
