Table 1.

Distribution of Knowns and Unknowns for EST Assemblies

| Organism | Number of ESTs[ii] | Number of assemblies | Number of known genes[iii] (%) | Number of putative genes[iv] (%) | Number matching ProDom[v] (%) | Number matching CDD[vi] (%) | Number of known-unknowns[vii] (%) | Number unique[viii] (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Toxoplasma gondii | 23420 | 10585 | 72 (0.68) | 1695 (16.01) | 1633 (15.43) | 1075 (10.16) | 116 (1.10) | 7829 (73.92) |
| Neospora caninum | 3121 | 1388 | 2 (0.1) | 223 (16.1) | 238 (17.1) | 167 (12.0) | 9 (0.6) | 1022 (73.63) |
| Sarcocystis neurona | 4949 | 1445 | 0 | 219 (15.2) | 215 (14.9) | 171 (11.8) | 16 (1.1) | 1091 (75.50) |
| Eimeria tenella | 13679 | 3425 | 8 (0.2) | 592 (17.3) | 529 (15.4) | 453 (13.2) | 54 (1.6) | 2272 (66.33) |
| Plasmodium falciparum | 10023 | 5800 | 243 (4.19) | 886 (15.3) | 1339 (23.09) | 738 (12.7) | 271 (4.67) | 2989 (51.53) |

[i] The same number of significant digits was kept in the percentages shown for each entry.

[ii] The number of ESTs for each organism reflects the content of dbEST/NCBI as of March 2002.

[iii] 98% identity to a protein from the same species in SwissProt/PIR.

[iv] p < 10⁻⁹ similarity to a protein in SwissProt/PIR, excluding those assemblies in “known” and “known-unknowns.”

[v] p < 10⁻⁹ similarity to ProDom.

[vi] p < 10⁻⁹ similarity to the Conserved Domain Database (CDD).

[vii] p < 10⁻⁹ similarity to a protein in SwissProt/PIR, but the best hit has description “hypothetical protein” or “unknown protein.”

[viii] No similarity found with p < 10⁻⁵.
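
The category definitions in footnotes [iii], [iv], [vii], and [viii] can be read as a small decision rule applied to each assembly's best SwissProt/PIR hit. The following is a minimal sketch of that reading, not the authors' pipeline. The `BestHit` record, the `classify` function, the order in which the rules are tested, the use of "at least 98%" for the identity cutoff, and the extra "unclassified" label for hits falling between the two p-value thresholds are all assumptions made for illustration. ProDom and CDD matches are left out because they appear to be counted independently of these categories.

```python
from dataclasses import dataclass
from typing import Optional

P_STRONG = 1e-9   # threshold used for "putative" and "known-unknown"
P_WEAK = 1e-5     # threshold above which an assembly counts as "unique"
UNKNOWN_LABELS = ("hypothetical protein", "unknown protein")


@dataclass
class BestHit:
    """Hypothetical record for the best SwissProt/PIR hit of one assembly."""
    p_value: float
    percent_identity: float
    same_species: bool
    description: str


def classify(hit: Optional[BestHit]) -> str:
    """Assign one category label following the table footnotes."""
    if hit is None:
        return "unique"                      # no hit at all
    # Assumption: "98% identity" is read as "at least 98%".
    if hit.same_species and hit.percent_identity >= 98.0:
        return "known"
    if hit.p_value >= P_WEAK:
        return "unique"                      # no similarity with p < 1e-5
    if hit.p_value < P_STRONG:
        desc = hit.description.lower()
        if any(label in desc for label in UNKNOWN_LABELS):
            return "known-unknown"
        return "putative"
    # Assumption: hits with 1e-9 <= p < 1e-5 fall in none of the listed columns.
    return "unclassified"
```

For example, `classify(BestHit(1e-12, 45.0, False, "hypothetical protein"))` returns `"known-unknown"` under these assumptions, while the same hit with an informative description returns `"putative"`.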