Inferring Higher Functional Information for RIKEN Mouse Full-Length cDNA Clones With FACTS
- Takeshi Nagashima1,2,
- Diego G. Silva3,4,
- Nikolai Petrovsky3,4,
- Luis A. Socha3,4,
- Harukazu Suzuki5,
- Rintaro Saito5,7,
- Takeya Kasukawa5,
- Igor V. Kurochkin1,
- Akihiko Konagaya2,6, and
- Christian Schönbach1,8
- 1Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- 2Department of Knowledge System Science, School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, 923-1292, Japan
- 3Medical Informatics Centre, University of Canberra, ACT 2601, Australia
- 4John Curtin School of Medical Research, Australian National University, Canberra ACT 2601, Australia
- 5Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- 6Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
Abstract
FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence-inferred) with functional information extracted from text (text-inferred). Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A substantial fraction (23.5%) of disease MeSH-associated clones also had a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system diseases represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, both methods provided identical GO assignments, whereas for 21.8% of clones, the assignments differed. In contrast, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies).
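The comparison of sequence-based and text-based GO assignments reported in the abstract can be illustrated with a minimal sketch. This is not the FACTS implementation. The function name `go_concordance`, the clone identifiers, and the toy GO sets are hypothetical, and the sketch assumes that "identical" means the two sets of GO identifiers for a clone are exactly equal, which the abstract does not specify.

```python
from typing import Dict, Set


def go_concordance(
    seq_go: Dict[str, Set[str]],
    text_go: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Compare per-clone GO assignments from two sources.

    Only clones that have a non-empty assignment in BOTH sources are
    compared. "Identical" is assumed to mean exact set equality.
    """
    shared = [
        clone
        for clone in seq_go
        if clone in text_go and seq_go[clone] and text_go[clone]
    ]
    if not shared:
        return {"compared": 0, "identical": 0.0, "different": 0.0}

    same = sum(1 for clone in shared if seq_go[clone] == text_go[clone])
    fraction_same = same / len(shared)
    return {
        "compared": len(shared),
        "identical": fraction_same,
        "different": 1.0 - fraction_same,
    }


# Toy data (hypothetical clone IDs; not taken from FANTOM2 or FACTS).
sequence_based = {
    "clone_A": {"GO:0003677"},
    "clone_B": {"GO:0005515"},
    "clone_C": {"GO:0016301"},
    "clone_D": set(),  # no sequence-based assignment, so it is skipped
}
text_based = {
    "clone_A": {"GO:0003677"},
    "clone_B": {"GO:0016301"},
    "clone_C": {"GO:0016301"},
    "clone_E": {"GO:0005515"},  # absent from the sequence-based set
}

result = go_concordance(sequence_based, text_based)
print(result)
```

Exact set equality is the strictest possible criterion. A looser definition, such as any shared GO identifier or agreement at a parent term in the GO hierarchy, would give higher agreement, so percentages computed this way are only comparable when the same criterion is used throughout.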
Footnotes
- [Supplemental material is available online at www.genome.org and also at the FACTS Web site http://facts.gsc.riken.go.jp/supplement/.]
- Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1019903.
- 7 Present address: Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
- 8 Corresponding author. E-MAIL schoen{at}gsc.riken.go.jp; FAX 81 (0)45-503-9552.
- Accepted March 4, 2003.
- Received November 26, 2002.
- Cold Spring Harbor Laboratory Press











