Domain and Repeat Families Identified
| Repeat type | Code | Fly protein | P-value | Description | Phyletic distribution |
|---|---|---|---|---|---|
| A1pp | e | CG10517 | 3.0 × 10⁻¹⁷ | Phosphatase family | Dm, Ce, Hs, Sc, Bac, Arch |
| CAP10 | n | CG17138 | 7.4 × 10⁻¹⁰⁷ | Possible glycosyltransferase | Dm, Hs, Bac |
| CARP | n | Capt | 5.4 × 10⁻⁶ | Tandem repeats in CAPs and XRP2 | Dm, Ce, Hs, Sc |
| CENPB | e | CG13895 | 2.0 × 10⁻²⁵ | Putative DNA-binding domain in, for example, mouse jerky | Dm, Ce, Hs, Sc |
| CLIP | e | CG15046 | 1.9 × 10⁻⁶ | In many arthropod serine proteases | Dm |
| CTNS | n | CG17119 | 3.1 × 10⁻⁵ | In cystinosin, a product of a gene mutated in infantile nephropathic cystinosis | Dm, Ce, Hs, Sc |
| DM3 | n | CG14860 | 5.6 × 10⁻⁶ | Derived from hAT/Tip100/Zaphod transposon family | Dm, Ce, Hs |
| DM4 | n | CG17780 | 7 × 10⁻¹⁹ | In fly proteins only (18) | Dm |
| DM5 | n | CG14241 | 3.8 × 10⁻²⁵ | In fly proteins only (21) | Dm |
| DM6 | n | CG2149 | 1.8 × 10⁻¹² | In fly proteins only (6) | Dm |
| DM8 | n | CG14458 | 3.7 × 10⁻⁷¹ | In fly proteins only (21) | Dm |
| DM9 | n | CG3884 | 2.7 × 10⁻⁸³ | In fly proteins only (7) | Dm |
| DM10 | n | CG8959 | 6.4 × 10⁻¹⁹ | In nucleoside diphosphate kinase 7 | Dm, Ce, Hs |
| DM11 | n | CG15241 | 8.7 × 10⁻⁶⁵ | In fly proteins only (6) | Dm |
| DM12 | n | CG14116 | 2.0 × 10⁻⁴⁷ | In fly proteins only (17) | Dm |
| DM13 | n | CG14681 | 1.0 × 10⁻²⁸ | In fly and worm hypothetical proteins | Dm, Ce |
| DM14 | n | CG4713 | 6.6 × 10⁻¹⁹ | In hypothetical proteins | Dm, Ce, Hs |
| DM15 | n | CG14066 | 4.1 × 10⁻⁶ | In La-related protein homologs | Dm, Ce, Hs |
| DM16 | n | CG1126 | 1.1 × 10⁻⁷ | In hypothetical proteins | Dm, Ce, Hs |
| DUSP | n | CG8494 | 2.8 × 10⁻⁸ | In ubiquitin-specific proteases (USPs) | Dm, Ce, Hs |
| DysF | n | CG6468 | n/a | Domain of unknown function in dysferlin-like proteins | Dm, Ce, Hs |
| E-Z | r | CG2245 | 1.8 × 10⁻¹⁸ | Sub-family of HEAT repeats (Neuwald and Hirano 2000) | Dm, Ce, Hs, Bac |
| GYR | n | CG13706 | 3.6 × 10⁻³¹ | In fly proteins only (10) | Dm |
| JHBP | e | CG7096 | 5.1 × 10⁻⁵⁸ | Juvenile hormone-binding protein domains | Dm |
| LITAF | n | CG13515 | 3.9 × 10⁻³⁰ | LPS-induced tumor necrosis factor α factor homologs | Dm, Ce, Hs |
| MADF | e | CG10949 | 5.1 × 10⁻³³ | Myb/SANT-like domains in ADF-1 and other proteins | Dm, Ce |
| MORN | e | CG5458 | 7.7 × 10⁻¹⁵ | Repeats in PI4P-5-kinases and protein kinases | Dm, Ce, Hs |
| NEUZ | n | neuralized | 4.0 × 10⁻²⁴ | Possible SPRY domain outliers; microtubule-binding? | Dm, Ce, Hs |
| NRF | e | CG10183 | 3.6 × 10⁻¹⁷ | Cysteine-rich domain in nrf-6 and ndg-4 | Dm, Ce |
| P4Hc | r | CG15542 | 5.3 × 10⁻²³ | Expansion of the family of 2-oxoglutarate- and Fe(II)-dependent dioxygenases | Dm, Ce, Hs, Bac |
| PbH1 | e | CG9461 | 1.2 × 10⁻⁷³ | Parallel β-helix repeats | Dm, Ce, Hs, Sc |
| PGRP | e | CG4432 | 3.4 × 10⁻²¹ | Phage T3-like lysozyme homologues | Dm, Hs, Bac |
| PhBP | e | CG15583 | 1.3 × 10⁻⁹ | Pheromone-binding protein domains | Dm |
| PURα | e | PURα | 6.7 × 10⁻²⁰ | New bacterial homologues (e.g., *Treponema pallidum* TP0412) | Dm, Ce, Hs, Bac |
| RPEL | n | CG12188 | 2.3 × 10⁻⁸ | In hypothetical proteins | Dm, Ce, Hs |
| TDU | e | CG10741 | 1.5 × 10⁻⁶ | In human TONDU and fly vestigial | Dm, Ce, Hs |
| THEG | n | CG6332 | 3.8 × 10⁻⁶ | In mouse THEG; spermatogenesis factor | Dm, Hs |
| TIM | e | CG8148 | 9.1 × 10⁻⁶ | Possible Myb-like three helical domain | Dm, Ce, Hs |
| WWE | r | Deltex | n/a | Possible function in ubiquitin-mediated proteolysis | Dm, Ce, Hs |
| ZnF_CDGSH | n | CG3420 | 5.5 × 10⁻⁷ | Zinc finger of unknown function | Dm, Ce, Hs, Bac, Arch |
| Zpr1 | e | CG9060 | 1.7 × 10⁻²⁰ | Repeated domain in eukaryotic Zpr1, but single copy in archaea | Dm, Ce, Hs, Arch |
Forty-one domain and repeat families were identified in this study. The Code column indicates whether the family was previously unrecognized or greatly expanded (n), has recently been found independently (r), or now contains significant additions to a previously known family (e). The phyletic distribution of each family is indicated by the following species or kingdom abbreviations: Dm, *Drosophila melanogaster* (representing the arthropods); Ce, *Caenorhabditis elegans* (representing the nematodes); Hs, *Homo sapiens* (representing the mammals); Sc, *Saccharomyces cerevisiae* (representing the fungi); Bac, bacteria; Arch, archaea.