Novel Protein Domains and Repeats in Drosophila melanogaster: Insights into Structure, Function, and Evolution

Table 1.

Domain and Repeat Families Identified

| Repeat type | Code | Fly protein | P value | Description | Phyletic distribution |
|---|---|---|---|---|---|
| A1pp | e | CG10517 | 3.0 × 10^−17 | Phosphatase family | Dm, Ce, Hs, Sc, Bac, Arch |
| CAP10 | n | CG17138 | 7.4 × 10^−107 | Possible glycosyltransferase | Dm, Hs, Bac |
| CARP | n | Capt | 5.4 × 10^−6 | Tandem repeats in CAPs and XRP2 | Dm, Ce, Hs, Sc |
| CENPB | e | CG13895 | 2.0 × 10^−25 | Putative DNA-binding domain in, for example, mouse jerky | Dm, Ce, Hs, Sc |
| CLIP | e | CG15046 | 1.9 × 10^−6 | In many arthropod serine proteases | Dm |
| CTNS | n | CG17119 | 3.1 × 10^−5 | In cystinosin, a product of a gene mutated in infantile nephropathic cystinosis | Dm, Ce, Hs, Sc |
| DM3 | n | CG14860 | 5.6 × 10^−6 | Derived from hAT/Tip100/Zaphod transposon family | Dm, Ce, Hs |
| DM4 | n | CG17780 | 7 × 10^−19 | In fly proteins only (18) | Dm |
| DM5 | n | CG14241 | 3.8 × 10^−25 | In fly proteins only (21) | Dm |
| DM6 | n | CG2149 | 1.8 × 10^−12 | In fly proteins only (6) | Dm |
| DM8 | n | CG14458 | 3.7 × 10^−71 | In fly proteins only (21) | Dm |
| DM9 | n | CG3884 | 2.7 × 10^−83 | In fly proteins only (7) | Dm |
| DM10 | n | CG8959 | 6.4 × 10^−19 | In nucleoside diphosphate kinase 7 | Dm, Ce, Hs |
| DM11 | n | CG15241 | 8.7 × 10^−65 | In fly proteins only (6) | Dm |
| DM12 | n | CG14116 | 2.0 × 10^−47 | In fly proteins only (17) | Dm |
| DM13 | n | CG14681 | 1.0 × 10^−28 | In fly and worm hypothetical proteins | Dm, Ce |
| DM14 | n | CG4713 | 6.6 × 10^−19 | In hypothetical proteins | Dm, Ce, Hs |
| DM15 | n | CG14066 | 4.1 × 10^−6 | In La-related protein homologs | Dm, Ce, Hs |
| DM16 | n | CG1126 | 1.1 × 10^−7 | In hypothetical proteins | Dm, Ce, Hs |
| DUSP | n | CG8494 | 2.8 × 10^−8 | In ubiquitin-specific proteases (USPs) | Dm, Ce, Hs |
| DysF | n | CG6468 | n/a | Domain of unknown function in dysferlin-like proteins | Dm, Ce, Hs |
| E-Z | r | CG2245 | 1.8 × 10^−18 | Sub-family of HEAT repeats (Neuwald and Hirano 2000) | Dm, Ce, Hs, Bac |
| GYR | n | CG13706 | 3.6 × 10^−31 | In fly proteins only (10) | Dm |
| JHBP | e | CG7096 | 5.1 × 10^−58 | Juvenile hormone-binding protein domains | Dm |
| LITAF | n | CG13515 | 3.9 × 10^−30 | LPS-induced tumor necrosis factor α factor homologs | Dm, Ce, Hs |
| MADF | e | CG10949 | 5.1 × 10^−33 | Myb/SANT-like domains in ADF-1 and other proteins | Dm, Ce |
| MORN | e | CG5458 | 7.7 × 10^−15 | Repeats in PI4P-5-kinases and protein kinases | Dm, Ce, Hs |
| NEUZ | n | neuralized | 4.0 × 10^−24 | Possible SPRY domain outliers; microtubule-binding? | Dm, Ce, Hs |
| NRF | e | CG10183 | 3.6 × 10^−17 | Cysteine-rich domain in nrf-6 and ndg-4 | Dm, Ce |
| P4Hc | r | CG15542 | 5.3 × 10^−23 | Expansion of the family of 2-oxoglutarate- and Fe(II)-dependent dioxygenases | Dm, Ce, Hs, Bac |
| PbH1 | e | CG9461 | 1.2 × 10^−73 | Parallel β-helix repeats | Dm, Ce, Hs, Sc |
| PGRP | e | CG4432 | 3.4 × 10^−21 | Phage T3-like lysozyme homologues | Dm, Hs, Bac |
| PhBP | e | CG15583 | 1.3 × 10^−9 | Pheromone-binding protein domains | Dm |
| PURα | e | PURα | 6.7 × 10^−20 | New bacterial homologues (e.g., Treponema pallidum TP0412) | Dm, Ce, Hs, Bac |
| RPEL | n | CG12188 | 2.3 × 10^−8 | In hypothetical proteins | Dm, Ce, Hs |
| TDU | e | CG10741 | 1.5 × 10^−6 | In human TONDU and fly vestigial | Dm, Ce, Hs |
| THEG | n | CG6332 | 3.8 × 10^−6 | In mouse THEG; spermatogenesis factor | Dm, Hs |
| TIM | e | CG8148 | 9.1 × 10^−6 | Possible Myb-like three helical domain | Dm, Ce, Hs |
| WWE | r | Deltex | n/a | Possible function in ubiquitin-mediated proteolysis | Dm, Ce, Hs |
| ZnF_CDGSH | n | CG3420 | 5.5 × 10^−7 | Zinc finger of unknown function | Dm, Ce, Hs, Bac, Arch |
| Zpr1 | e | CG9060 | 1.7 × 10^−20 | Repeated domain in eukaryotic Zpr1, but single copy in archaea | Dm, Ce, Hs, Arch |
  • Forty-one domain and repeat families were identified in this study. The codes indicate whether the family (n) was previously unrecognized or greatly expanded, (r) has recently been found independently, or (e) now contains significant additions to previously known families. The phyletic distribution of each family is indicated by the following species or kingdom abbreviations: Dm, Drosophila melanogaster (representing the arthropods); Ce, Caenorhabditis elegans (representing the nematodes); Hs, Homo sapiens (representing the mammals); Sc, Saccharomyces cerevisiae (representing the fungi); Bac, bacteria; Arch, archaea.
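
For readers who want to work with Table 1 programmatically, the legend can be encoded as two small lookup tables. The following is a minimal sketch, not part of the original study: the names `CODE_MEANING`, `TAXON`, `expand_distribution`, and `is_fly_only` are hypothetical, and the three example rows are copied from the table purely for illustration.

```python
# Hypothetical helpers; the mappings restate the Table 1 legend.
CODE_MEANING = {
    "n": "previously unrecognized or greatly expanded",
    "r": "recently found independently",
    "e": "significant additions to a previously known family",
}

TAXON = {
    "Dm": "Drosophila melanogaster (arthropods)",
    "Ce": "Caenorhabditis elegans (nematodes)",
    "Hs": "Homo sapiens (mammals)",
    "Sc": "Saccharomyces cerevisiae (fungi)",
    "Bac": "bacteria",
    "Arch": "archaea",
}


def split_abbreviations(field):
    """Split a phyletic-distribution field such as 'Dm, Ce, Hs' into abbreviations."""
    return [abbr.strip() for abbr in field.split(",") if abbr.strip()]


def expand_distribution(field):
    """Map each abbreviation in the field to its full name from the legend."""
    return [TAXON[abbr] for abbr in split_abbreviations(field)]


def is_fly_only(field):
    """True when the family is reported for Drosophila melanogaster alone."""
    return split_abbreviations(field) == ["Dm"]


# Three rows copied from Table 1: (family, code, phyletic distribution).
rows = [
    ("A1pp", "e", "Dm, Ce, Hs, Sc, Bac, Arch"),
    ("CLIP", "e", "Dm"),
    ("E-Z", "r", "Dm, Ce, Hs, Bac"),
]

for family, code, dist in rows:
    print(family, "|", CODE_MEANING[code], "|", "; ".join(expand_distribution(dist)))
```

Keeping the legend in plain dictionaries means an unknown code or abbreviation raises a `KeyError` rather than passing silently, which helps catch transcription errors when the table is copied by hand.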

This Article

  1. Genome Res. 11: 1996-2008
