The Training and Testing Corpus
| Category | GO code | Training | Test 2000 | Test 2001 | PubMed query |
| Autophagy | GO:0006914 | 177 | 22 | 1 | (autophagy [TI] OR autophagocytosis [MAJR]) AND (Proteins [MH] OR Genes [MH]) AND 1940:1999 [DP] |
| Biogenesis | GO:0016043 | 1023 | 132 | 4 | (biogenesis [TI] OR ((cell wall [MAJR] OR cell membrane structures [MAJR] OR cytoplasmic structures [MAJR]) AND (organization [TI] OR arrangement [TI]))) AND (Genetics [MH]) AND 1984:1999 [DP] |
| Cell adhesion | GO:0007155 | 1025 | 133 | 5 | (cell adhesion [MAJR]) AND (genetics [MH]) AND 1993:1999 [DP] |
| Cell cycle | GO:0007049 | 1085 | 303 | 19 | (cell cycle [MAJR]) AND Genes [MH] AND 1996:1999 [DP] |
| Cell death | GO:0008219 | 1154 | 434 | 28 | (cell death [MAJR]) AND Genes [MH] AND 1997:1999 [DP] |
| Cell fusion | GO:0006947 | 740 | 20 | 0 | (cell fusion [MAJR] OR (mating [TI] AND Saccharomyces Cerevisiae [MAJR])) AND (Genetics [MH]) AND 1940:1999 [DP] |
| Cell motility | GO:0006928 | 1094 | 269 | 23 | (cell movement [MAJR]) AND (Genetics [MH]) AND 1995:1999 [DP] |
| Cell proliferation | GO:0008283 | 394 | 0 | 0 | (cell proliferation [TI]) AND (Genes [MH]) AND 1940:1999 [DP] |
| Cell–cell signaling | GO:0007267 | 237 | 41 | 0 | (synaptic transmission [MAJR] OR synapses [MAJR] OR gap junctions [MAJR]) AND (Genes [MH]) AND 1940:1999 [DP] |
| Chemimechanical coupling | GO:0006943 | 1011 | 147 | 6 | (contractile proteins [MAJR] OR kinesins [MAJR]) AND (Genes [MH]) AND 1993:1999 [DP] |
| Intracellular protein traffic | GO:0006886 | 1107 | 322 | 28 | (endocytosis [MAJR] OR exocytosis [MAJR] OR transport vesicles [MAJR] OR protein transport [MAJR] OR nucleocytoplasmic [TI]) AND (Genetics [MH]) AND 1994:1999 [DP] |
| Invasive growth | GO:0007125 | 492 | 52 | 4 | ((invasive [TI] AND growth [TI]) OR neoplasm invasiveness [MAJR]) AND (Genetics [MH]) AND 1940:1999 [DP] |
| Ion homeostasis | GO:0006873 | 424 | 64 | 5 | ((na [TI] OR k [TI] OR ion [TI] OR calcium [TI] OR sodium [TI] OR hydrogen [TI] OR potassium [TI] OR pH [TI] OR water [TI]) AND (concentration [TI] OR senses [TI] OR sensing [TI] OR homeostasis [TI] OR homeostasis [MAJR])) AND (genetics [MH]) AND 1940:1999 [DP] |
| Meiosis | GO:0007126 | 1003 | 151 | 7 | ((meiosis [MAJR])) AND (Genes [MH] OR Proteins [MH]) AND 1986:1999 [DP] |
| Membrane fusion | GO:0006944 | 317 | 58 | 4 | (membrane fusion [MAJR]) AND (Genetics [MH]) AND 1940:1999 [DP] |
| Metabolism | GO:0008152 | 1005 | 225 | 30 | (metabolism [MAJR]) AND Genes [MH] AND 1989:1999 [DP] |
| Oncogenesis | GO:0007048 | 1043 | 168 | 15 | (cell transformation, neoplastic [MAJR]) AND Genes [MH] AND 1994:1999 [DP] |
| Signal transduction | GO:0007165 | 1168 | 302 | 25 | (signal transduction [MAJR]) AND Genes [MH] AND 1995:1999 [DP] |
| Sporulation | GO:0007151 | 847 | 49 | 0 | (sporulation [TI]) AND (genetics [MH]) AND 1940:1999 [DP] |
| Stress response | GO:0006950 | 1068 | 253 | 22 | (Wounds [MAJR] OR DNA repair [MAJR] OR DNA Damage [MAJR] OR Heat-Shock Response [MAJR] OR stress [MAJR] OR starvation [TI] OR soxR [TI] OR (oxidation-reduction [MAJR] NOT Electron-Transport [MAJR])) AND (Genes [MH]) AND 1996:1999 [DP] |
| Transport | GO:0006810 | 1022 | 84 | 8 | (biological transport [MAJR] OR transport [TI]) AND (Genes [MH]) AND 1985:1999 [DP] |
This table lists the category name in the first column, the corresponding Gene Ontology (GO) code in the second column, the number of articles obtained for each category in the training and test sets in the next three columns, and the PubMed query used to obtain abstracts in the final column. For the training dataset, the articles were obtained by using the query as listed in the table. Within a PubMed query, the [MAJR] label specifies MeSH major headings, [MH] specifies MeSH headings, [TI] specifies title words, and [DP] specifies publication date ranges. The test2000 and test2001 datasets were obtained by modifying the publication date limit to restrict articles to those published in 2000 and 2001, respectively. Titles were omitted from the test data sets.
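
The derivation of the test queries from the training queries can be illustrated with a short sketch. It assumes that each query contains exactly one [DP] clause of the form `YYYY:YYYY [DP]`, and that a single-year restriction is written as `2000:2000 [DP]`; the exact form used for the test sets is not stated in the caption, so this is an assumption. The function name `restrict_to_year` is hypothetical and introduced only for illustration.

```python
import re

# Matches a publication-date clause such as "1996:1999 [DP]".
DATE_CLAUSE = re.compile(r"\b\d{4}:\d{4}\s*\[DP\]")


def restrict_to_year(query: str, year: int) -> str:
    """Return the query with its [DP] range replaced by a single year.

    Hypothetical helper: the single-year form "YYYY:YYYY [DP]" is an
    assumption, not taken from the table.
    """
    new_query, count = DATE_CLAUSE.subn(f"{year}:{year} [DP]", query)
    if count != 1:
        # Refuse to guess when the query has no date clause or several.
        raise ValueError("expected exactly one [DP] date clause")
    return new_query


# Training query for the "Cell cycle" category, as listed in the table.
training_query = "(cell cycle [MAJR]) AND Genes [MH] AND 1996:1999 [DP]"

test2000_query = restrict_to_year(training_query, 2000)
test2001_query = restrict_to_year(training_query, 2001)
```

Only the date clause changes, so the MeSH and title-word terms that define each category stay identical between the training and test queries.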











