Inferring Higher Functional Information for RIKEN Mouse Full-Length cDNA Clones With FACTS

  1. Takeshi Nagashima (1, 2),
  2. Diego G. Silva (3, 4),
  3. Nikolai Petrovsky (3, 4),
  4. Luis A. Socha (3, 4),
  5. Harukazu Suzuki (5),
  6. Rintaro Saito (5, 7),
  7. Takeya Kasukawa (5),
  8. Igor V. Kurochkin (1),
  9. Akihiko Konagaya (2, 6), and
  10. Christian Schönbach (1, 8)
  1. Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  2. Department of Knowledge System Science, School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, 923-1292, Japan
  3. Medical Informatics Centre, University of Canberra, ACT 2601, Australia
  4. John Curtin School of Medical Research, Australian National University, Canberra ACT 2601, Australia
  5. Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  6. Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan

Abstract

FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence-inferred) with functional information extracted from text (text-inferred). Text-inferred information was obtained from keyword-based retrievals of MEDLINE abstracts and by matching gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A substantial fraction (23.5%) of disease MeSH-associated clones also had a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system diseases accounted for 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that the two methods gave identical GO assignments for 78.2% of clones and different assignments for the remaining 21.8%. In contrast, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies).

Footnotes

  • [Supplemental material is available online at www.genome.org and also at the FACTS Web site http://facts.gsc.riken.go.jp/supplement/.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1019903.

  • 7 Present address: Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan

  • 8 Corresponding author. E-MAIL schoen{at}gsc.riken.go.jp; FAX 81 (0)45-503-9552.

    • Accepted March 4, 2003.
    • Received November 26, 2002.