Most common diseases, such as cancer, allergy, diabetes, and heart disease, are characterized by complex genetic traits, in which both genetic and environmental components contribute to disease susceptibility. Our knowledge of the genetic factors underlying most such diseases is limited. A major goal in the post-genomic era is to identify and characterize disease susceptibility genes and to use this knowledge for disease treatment and prevention.
More than 500 genes are conserved across the invertebrate and vertebrate genomes. Because of this gene conservation, various organisms including yeast, fruitfly, zebrafish, rat, and mouse have been used as genetic models for the study of human disease. The basic housekeeping genes, such as those involved in metabolism, intracellular signalling, transcription/translation, DNA replication, and repair, are highly conserved in eukaryotes; yeast and fruitfly are therefore useful for the study of basic cellular processes and related diseases. However, these organisms do not share with humans large groups of genes, such as those involved in homeostasis, immunity, and cellular interactions. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological, and pathological pathways. A comparison of the human genome with the FANTOM1 mouse cDNA clone set showed that ∼80% of mouse cDNA clones have matches in the human genome.
In this work, we define the term patholog to mean a mouse gene with sequence similarity to a known human disease-related gene. Previous genome-wide studies of pathologs have largely focused on diseases exhibiting Mendelian inheritance patterns. Here, we have expanded the analysis to all potential pathologs regardless of their pattern of inheritance. A bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,622 sequences that showed similarity (70%–85% identity) to known human disease genes or proteins.
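To make the notion of percent identity concrete, the following is a minimal sketch of how identity between two already-aligned sequences might be computed and compared against a cutoff. The function names `percent_identity` and `is_candidate_patholog`, the convention of ignoring gapped columns, and the use of 70% as a lower cutoff are illustrative assumptions; they are not the pipeline used in this work.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two strings come from a pairwise alignment, so they have the
    same length and use '-' for gaps. Other identity conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_candidate_patholog(identity: float, min_identity: float = 70.0) -> bool:
    """Hypothetical filter: keep hits at or above an assumed identity cutoff."""
    return identity >= min_identity


# Toy alignment (made-up sequences): 8 of 9 ungapped columns match.
mouse = "ATGCCGTA-A"
human = "ATGACGTATA"
identity = percent_identity(mouse, human)
print(round(identity, 1), is_candidate_patholog(identity))
```

In practice the alignment itself would come from a dedicated similarity search tool; this sketch only covers the scoring step applied to its output.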
Using automated computational tools in parallel with human expert analysis of 33,207 MEDLINE scientific abstracts, we identified 184 novel mRNA transcripts (targets) with sequence similarity to genes encoding proteins reported as disease-related in humans (reference proteins). Of these targets, 36 were identified by computational tools only, 49 by human expert analysis only, and 99 by both methods. The reference proteins were related to cancer (53%), hereditary (23%), immunological (5%), cardiovascular (4%), or other (15%) disorders. The role of these candidate pathologs in disease pathogenesis will require further characterization. It is likely that at least some of these potential pathologs will not be confirmed experimentally because, for example, they represent nonfunctional transcripts or gene products with sequence similarity but different function. Those pathologs that are experimentally validated as functionally relevant will be used as targets for genetic manipulation and development of mouse models of human disease. The similarity between the mouse and human genomes, together with their closely related biochemical, physiological, and pathological pathways, makes the mouse an invaluable model organism for the study of human disease.
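
The breakdown of the 184 targets by detection method (36 computational only, 49 expert only, 99 both) is a simple set partition. The sketch below illustrates that bookkeeping. The function name `partition_by_method` and the placeholder identifiers are hypothetical; only the counts are taken from the text.

```python
def partition_by_method(computational: set, expert: set) -> dict:
    """Split targets into those found by one method only or by both."""
    return {
        "computational_only": computational - expert,
        "expert_only": expert - computational,
        "both": computational & expert,
    }


# Placeholder identifiers (not real clone IDs), sized to match the reported counts.
found_by_both = {f"target_{i}" for i in range(0, 99)}
found_by_tools_only = {f"target_{i}" for i in range(99, 135)}
found_by_expert_only = {f"target_{i}" for i in range(135, 184)}

computational_hits = found_by_both | found_by_tools_only
expert_hits = found_by_both | found_by_expert_only

parts = partition_by_method(computational_hits, expert_hits)
counts = {name: len(ids) for name, ids in parts.items()}
total = sum(counts.values())
print(counts, total)
```

The three groups are disjoint by construction, so their sizes add up to the number of distinct targets reported by either method.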
Notes
[1] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1461303.
[2] Takahiro Arakawa,6 Piero Carninci,6,7 Jun Kawai,6,7 and Yoshihide Hayashizaki6,7
[3] Corresponding author. E-MAIL: [email protected]; FAX +61-2-62603372.