Table 1.

Domain and Repeat Families Identified

| Repeat type | Code | Fly protein | P value | Description | Phyletic distribution |
| --- | --- | --- | --- | --- | --- |
| A1pp | e | CG10517 | 3.0 × 10^−17 | Phosphatase family | Dm, Ce, Hs, Sc, Bac, Arch |
| CAP10 | n | CG17138 | 7.4 × 10^−107 | Possible glycosyltransferase | Dm, Hs, Bac |
| CARP | n | Capt | 5.4 × 10^−6 | Tandem repeats in CAPs and XRP2 | Dm, Ce, Hs, Sc |
| CENPB | e | CG13895 | 2.0 × 10^−25 | Putative DNA-binding domain in, for example, mouse jerky | Dm, Ce, Hs, Sc |
| CLIP | e | CG15046 | 1.9 × 10^−6 | In many arthropod serine proteases | Dm |
| CTNS | n | CG17119 | 3.1 × 10^−5 | In cystinosin, a product of a gene mutated in infantile nephropathic cystinosis | Dm, Ce, Hs, Sc |
| DM3 | n | CG14860 | 5.6 × 10^−6 | Derived from hAT/Tip100/Zaphod transposon family | Dm, Ce, Hs |
| DM4 | n | CG17780 | 7 × 10^−19 | In fly proteins only (18) | Dm |
| DM5 | n | CG14241 | 3.8 × 10^−25 | In fly proteins only (21) | Dm |
| DM6 | n | CG2149 | 1.8 × 10^−12 | In fly proteins only (6) | Dm |
| DM8 | n | CG14458 | 3.7 × 10^−71 | In fly proteins only (21) | Dm |
| DM9 | n | CG3884 | 2.7 × 10^−83 | In fly proteins only (7) | Dm |
| DM10 | n | CG8959 | 6.4 × 10^−19 | In nucleoside diphosphate kinase 7 | Dm, Ce, Hs |
| DM11 | n | CG15241 | 8.7 × 10^−65 | In fly proteins only (6) | Dm |
| DM12 | n | CG14116 | 2.0 × 10^−47 | In fly proteins only (17) | Dm |
| DM13 | n | CG14681 | 1.0 × 10^−28 | In fly and worm hypothetical proteins | Dm, Ce |
| DM14 | n | CG4713 | 6.6 × 10^−19 | In hypothetical proteins | Dm, Ce, Hs |
| DM15 | n | CG14066 | 4.1 × 10^−6 | In La-related protein homologs | Dm, Ce, Hs |
| DM16 | n | CG1126 | 1.1 × 10^−7 | In hypothetical proteins | Dm, Ce, Hs |
| DUSP | n | CG8494 | 2.8 × 10^−8 | In ubiquitin-specific proteases (USPs) | Dm, Ce, Hs |
| DysF | n | CG6468 | n/a | Domain of unknown function in dysferlin-like proteins | Dm, Ce, Hs |
| E-Z | r | CG2245 | 1.8 × 10^−18 | Sub-family of HEAT repeats (Neuwald and Hirano 2000) | Dm, Ce, Hs, Bac |
| GYR | n | CG13706 | 3.6 × 10^−31 | In fly proteins only (10) | Dm |
| JHBP | e | CG7096 | 5.1 × 10^−58 | Juvenile hormone-binding protein domains | Dm |
| LITAF | n | CG13515 | 3.9 × 10^−30 | LPS-induced tumor necrosis factor α factor homologs | Dm, Ce, Hs |
| MADF | e | CG10949 | 5.1 × 10^−33 | Myb/SANT-like domains in ADF-1 and other proteins | Dm, Ce |
| MORN | e | CG5458 | 7.7 × 10^−15 | Repeats in PI4P-5-kinases and protein kinases | Dm, Ce, Hs |
| NEUZ | n | neuralized | 4.0 × 10^−24 | Possible SPRY domain outliers; microtubule-binding? | Dm, Ce, Hs |
| NRF | e | CG10183 | 3.6 × 10^−17 | Cysteine-rich domain in nrf-6 and ndg-4 | Dm, Ce |
| P4Hc | r | CG15542 | 5.3 × 10^−23 | Expansion of the family of 2-oxoglutarate- and Fe(II)-dependent dioxygenases | Dm, Ce, Hs, Bac |
| PbH1 | e | CG9461 | 1.2 × 10^−73 | Parallel β-helix repeats | Dm, Ce, Hs, Sc |
| PGRP | e | CG4432 | 3.4 × 10^−21 | Phage T3-like lysozyme homologues | Dm, Hs, Bac |
| PhBP | e | CG15583 | 1.3 × 10^−9 | Pheromone-binding protein domains | Dm |
| PURα | e | PURα | 6.7 × 10^−20 | New bacterial homologs (e.g., Treponema pallidum TP0412) | Dm, Ce, Hs, Bac |
| RPEL | n | CG12188 | 2.3 × 10^−8 | In hypothetical proteins | Dm, Ce, Hs |
| TDU | e | CG10741 | 1.5 × 10^−6 | In human TONDU and fly vestigial | Dm, Ce, Hs |
| THEG | n | CG6332 | 3.8 × 10^−6 | In mouse THEG; spermatogenesis factor | Dm, Hs |
| TIM | e | CG8148 | 9.1 × 10^−6 | Possible Myb-like three helical domain | Dm, Ce, Hs |
| WWE | r | Deltex | n/a | Possible function in ubiquitin-mediated proteolysis | Dm, Ce, Hs |
| ZnF_CDGSH | n | CG3420 | 5.5 × 10^−7 | Zinc finger of unknown function | Dm, Ce, Hs, Bac, Arch |
| Zpr1 | e | CG9060 | 1.7 × 10^−20 | Repeated domain in eukaryotic Zpr1, but single copy in archaea | Dm, Ce, Hs, Arch |

[i] Forty-one domain and repeat families were identified in this study. The codes indicate whether the family was previously unrecognized or greatly expanded (n), has recently been found independently (r), or now contains significant additions to previously known families (e). The phyletic distribution of each family is indicated by the following species or kingdom abbreviations: Dm, Drosophila melanogaster (representing the arthropods); Ce, Caenorhabditis elegans (representing the nematodes); Hs, Homo sapiens (representing the mammals); Sc, Saccharomyces cerevisiae (representing the fungi); Bac, bacteria; Arch, archaea.