Distribution of Knowns and Unknowns for EST Assemblies
| Organism | Number of ESTs[ii] | Number of assemblies | Number of known genes[iii] (%) | Number of putative genes[iv] (%) | Number matching ProDom[v] (%) | Number matching CDD[vi] (%) | Number of known unknowns[vii] (%) | Number unique[viii] (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Toxoplasma gondii | 23420 | 10585 | 72 (0.68) | 1695 (16.01) | 1633 (15.43) | 1075 (10.16) | 116 (1.10) | 7829 (73.92) |
| Neospora caninum | 3121 | 1388 | 2 (0.1) | 223 (16.1) | 238 (17.1) | 167 (12.0) | 9 (0.6) | 1022 (73.63) |
| Sarcocystis neurona | 4949 | 1445 | 0 | 219 (15.2) | 215 (14.9) | 171 (11.8) | 16 (1.1) | 1091 (75.50) |
| Eimeria tenella | 13679 | 3425 | 8 (0.2) | 592 (17.3) | 529 (15.4) | 453 (13.2) | 54 (1.6) | 2272 (66.33) |
| Plasmodium falciparum | 10023 | 5800 | 243 (4.19) | 886 (15.3) | 1339 (23.09) | 738 (12.7) | 271 (4.67) | 2989 (51.53) |
[i] The same number of significant digits was kept in the percentages shown for each entry.
[ii] The number of ESTs for each organism reflects the content of dbEST/NCBI as of March 2002.
[iii] 98% identity to a protein from the same species in SwissProt/PIR.
[iv] p < 10⁻⁹ similarity to a protein in SwissProt/PIR but excluding those assemblies in “known” and “known unknowns.”
[v] p < 10⁻⁹ similarity to an entry in ProDom.
[vi] p < 10⁻⁹ similarity to an entry in the Conserved Domain Database (CDD).
[vii] p < 10⁻⁹ similarity to a protein in SwissProt/PIR, but the best hit has description “hypothetical protein” or “unknown protein.”
[viii] No similarity found with p < 10⁻⁵.
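
The criteria in footnotes [iii], [iv], [vii], and [viii] can be read as a small decision procedure. The sketch below is one possible reading, not the authors' pipeline: the `BestHit` record, the `classify` function, and the "unclassified" label are hypothetical names introduced for illustration. It assumes one best SwissProt/PIR hit per assembly, treats "98% identity" as "at least 98%", and leaves ProDom and CDD matches out because those columns are not exclusive of the others.

```python
from dataclasses import dataclass
from typing import Optional

STRICT_P = 1e-9        # threshold used in footnotes [iv] and [vii]
LOOSE_P = 1e-5         # threshold used in footnote [viii]
KNOWN_IDENTITY = 98.0  # footnote [iii]; ">=" is an assumption


@dataclass
class BestHit:
    """Hypothetical record for an assembly's best SwissProt/PIR hit."""
    p_value: float
    percent_identity: float
    same_species: bool
    description: str


def classify(hit: Optional[BestHit]) -> str:
    """Assign one table category to an assembly (illustrative only)."""
    # Footnote [viii]: no similarity found with p < 1e-5.
    if hit is None or hit.p_value >= LOOSE_P:
        return "unique"
    # Footnote [iii]: near-identical to a protein from the same species.
    if hit.same_species and hit.percent_identity >= KNOWN_IDENTITY:
        return "known"
    if hit.p_value < STRICT_P:
        desc = hit.description.lower()
        # Footnote [vii]: significant hit, but to an uncharacterized protein.
        if "hypothetical protein" in desc or "unknown protein" in desc:
            return "known unknown"
        # Footnote [iv]: significant hit, not "known" or "known unknown".
        return "putative"
    # Hits with 1e-9 <= p < 1e-5 fit none of the table's columns
    # (label assumed here, not taken from the source).
    return "unclassified"
```

For example, `classify(BestHit(1e-20, 60.0, False, "hypothetical protein"))` returns `"known unknown"`, while a hit with p = 10⁻⁷ falls between the two thresholds and is left unclassified under this reading.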