Table 3.

Over- and Under-represented SCOP Superfamilies in OMIM Disease Genes

| SCOP superfamily | R | ND | NnD | f | Description |
|---|---:|---:|---:|---:|---|
| Interleukin 8-like chemokines (V) | 62 | 36 | 12 | 3.00 | Mainly small inducible cytokines (single-domain proteins), immunoregulatory and inflammatory processes, homeostasis, development. Secreted proteins, activity via GPCRs. |
| Nuclear receptor ligand-binding domain (M) | 56 | 40 | 15 | 2.67 | Growth factor inducible intracellular steroid/thyroid receptors coupled with a DNA-binding domain (mostly glucocorticoid-receptor-like) such as estrogen receptor (breast cancer associated). Transcription factors and enhancers. |
| Cystine-knot cytokines (E) | 49 | 42 | 17 | 2.47 | Growth factors belonging to TGF-β, cell determination, differentiation and growth. Neurotrophins, differentiation, and function of neurones. |
| Periplasmic binding protein-like I | 96 | 21 | 9 | 2.33 | Glutamate receptors, ionotropic (ion channels) and metabotropic (GPCRs with activity via a second messenger), also receptors for atrial natriuretic clearance peptides, involved in regulation of blood pressure. |
| Serpins (M) | 76 | 26 | 12 | 2.17 | Serine protease inhibitors of the blood-clotting cascade. |
| 4-helical cytokines (V) | 66 | 32 | 15 | 2.13 | Different interferons and interleukins (extracellular single-domain proteins), regulatory in differentiation and proliferation, antiviral, immune, and inflammatory response. |
| Winged helix DNA-binding domain | 21 | 70 | 57 | 1.23 | Associated with at least 25 disease entries. Transcription factors (activation and repression). Dominated by forkhead family members, important in embryogenesis of the nervous system in mammals, associated with different leukemias; ETS family of oncogene products; histones (chromatin remodelling), and others. |
| Helix-loop-helix DNA-binding domain (E) | 28 | 54 | 45 | 1.20 | Transcriptional control for cell-type determination during development, also transcriptional control of histone acetyltransferases (preparing chromatin for transcription). |
| Glucocorticoid receptor-like (DNA-binding domain) (E) | 25 | 62 | 52 | 1.19 | Together with nuclear receptor ligand-binding domains (see above). Frequently found in proteins of developmental genes. LIM domain proteins deregulated in cancer cell lines. |
| Homeodomain-like | 8 | 131 | 142 | 0.92 | Different homeobox proteins (transcription factors), particularly important in early embryogenesis. Some homeobox genes are oncogenes. |
| Protein kinase-like (PK-like) | 4 | 246 | 291 | 0.85 | About 100 different associated disease entries (e.g., different cancers). Range of kinases such as MAP or PKC (signal transduction). |
| RNA-binding domain | 6 | 76 | 255 | 0.30 | RNA splice factors (alternative splicing), rapid degradation of mRNAs, in particular from cytokines and protooncogenes. Involved in spermatogenesis related to male infertility, for example. |
| RING-finger domain, C3HC4 (E) | 13 | 43 | 163 | 0.26 | Zinc-finger-like domain associated with protein-protein interaction, often found in transcription regulatory proteins. Linked to apoptosis inhibitors, breast cancer gene BRCA1, acute leukemia, for example. |
| Classic zinc finger, C2H2 | 2 | 135 | 549 | 0.25 | Nucleic-acid binding, range of transcription factors, cell proliferation and differentiation, early development, some are protooncogenes. |
| Tetratricopeptide repeat (TPR) | 19 | 25 | 121 | 0.21 | Interaction partner of regulatory proteins, subunit of G-proteins. Involved in a range of biological functions such as cell cycle, activation of apoptosis, chromatin assembly, actin binding, cancer. |
| Ankyrin repeat | 12 | 33 | 187 | 0.18 | Protein-protein interaction domain. Found in at least 17 different OMIM entries describing, e.g., inhibitors of NF-κB and cyclin-dependent kinase inhibitors, interaction with p53 in apoptosis. Co-occurrence with other interaction and regulatory domains such as DEATH and SH3. |
| eL30-like | 58 | 5 | 45 | 0.11 | Ribosomal protein L30, translation termination. |
| Pyk2-associated protein β ARF-GAP domain (E) | 91 | 1 | 31 | 0.03 | RIP protein that assists HIV in replication by facilitating the nuclear export of mRNA. Corresponds to the putative GTPase-activating protein for Arf in PFAM. Nondisease proteins are often associated with PH domains or ankyrin repeats and may have a range of biological functions. |

[i] For each SCOP superfamily, the rank order (R) of superfamily occurrences in sequences of the human proteome is given (see text for details), followed by the sequence frequency in disease genes (ND) and the frequency in nondisease genes (NnD). The ratio (f) of these occurrences is then given as ND/NnD. The double horizontal line separates over-represented from under-represented superfamilies. Taking all SCOP domains together, the two populations (disease and nondisease) are significantly different (>99.9% confidence) as calculated by a χ² test. For each SCOP superfamily, the frequency ratio compared to the others was significant at >95% confidence, after allowing for the number of SCOP domains tested (testing domains of each superfamily against all remaining domains). Letters in parentheses in the superfamily field indicate that this superfamily is specific for eukaryotes (E), metazoans (M), or vertebrates (V). The Description field gives an overview of the broad biological functions associated with the disease genes.
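
The ratio and the one-against-the-rest test described in the note can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' procedure: the three (ND, NnD) pairs are copied from the table, but the overall domain totals (`TOTAL_D`, `TOTAL_ND`) and the number of superfamilies tested (`N_TESTS`) are hypothetical placeholders, and the Bonferroni correction is only one possible reading of "allowing for the number of SCOP domains tested".

```python
import math

# (ND, NnD) counts copied from three rows of Table 3.
counts = {
    "Interleukin 8-like chemokines": (36, 12),
    "Homeodomain-like": (131, 142),
    "Classic zinc finger, C2H2": (135, 549),
}

# HYPOTHETICAL totals of SCOP domains in disease / nondisease genes.
# The source does not report them; replace with the real totals.
TOTAL_D = 5000
TOTAL_ND = 8000
# HYPOTHETICAL number of superfamilies tested (used for Bonferroni).
N_TESTS = 500


def ratio(nd, nnd):
    """Frequency ratio f = ND / NnD, as defined in the table note."""
    return nd / nnd


def chi2_one_vs_rest(nd, nnd, total_d, total_nd):
    """Pearson chi-square (1 d.f., no continuity correction) for one
    superfamily against all remaining domains.

    Returns (statistic, p_value).
    """
    table = [[nd, nnd], [total_d - nd, total_nd - nnd]]
    n = total_d + total_nd
    row_sums = [sum(row) for row in table]
    col_sums = [total_d, total_nd]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    for name, (nd, nnd) in counts.items():
        stat, p = chi2_one_vs_rest(nd, nnd, TOTAL_D, TOTAL_ND)
        # Assumed Bonferroni adjustment, capped at 1.
        p_adj = min(1.0, p * N_TESTS)
        print(f"{name}: f = {ratio(nd, nnd):.2f}, "
              f"chi2 = {stat:.2f}, adjusted p = {p_adj:.3g}")
```

Because the totals are placeholders, only the `f` values printed by this sketch correspond to the table; the test statistics and p-values are meaningful only once the real domain totals are substituted.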