Systematic Identification of Novel Protein Domain Families Associated with Nuclear Functions

Table 1.

Table of Novel Domains

Domain | Description | Length (aa) | Sec. struct. pred. | Pred. function | No. of proteins | Associated domains | Species | Acc. no. of a representative sequence (domain borders)
Part A. Domains Present in Different Species
JmjC | Jumonji related family | 100 | β | Metallo-enzymes | 140 | BRIGHT, jmjN, PHD, FBOX, LRR, C2, TPR, PLAc, CXXC, ZnF_C2H2 | Eu, y, a, c, d, h | O14607 (1042–1205)
CSZ | Domain in chromatin remodeling S1 domain containing and Zinc finger proteins | 750 | α/β | DNA-binding, Chromatin modulation | 35 | S1, SH2, C2HC, HhH | Eu, y, a, c, d, h | P34703 (389–1120)
RPR | Proteins involved in regulation of nuclear pre-mRNA | 120 | α | Protein-interaction | 40 | RRM, PWWP, SURP, G-Patch | y, a, c, d, h | Q9SJQ7 (88–225)
DDT | Different transcription and chromosome remodeling factors | 60 | α | DNA-binding | 30 | AT_Hook, PHD, HOX, BROMO, MBD | y, a, c, d, h | Q9UIG2 (102–161)
TLDc | TBC, LysM and other proteins | 220 | α/β+β | Enzyme | 30 | TBC, LysM, R3H, FBOX | y, a, c, d, h | Q9VNA1 (1163–1325)
PUG | Protein kinases, UBA or UBX domain containing proteins and glycanases | 60 | α/β | RNA-binding | 25 | C2H2, UBA, TGc, UBX, S_TKc, STYKc | y, a, c, d, h | Q9MAT3 (323–386)
HSA | Helicases and SANT domains | 70 | α | DNA-binding | 20 | SANT, BROMO, DEXDc, HELIc | y, a, c, d, h | P25439 (501–573)
PSP | Proline-rich, in spliceosome associated proteins | 60 | α | RNA- or snRNP-binding | 15 | SAP, C2HC | y, a, c, d, h | O16997 (200–357)
FYRN | Trithorax and X-chromosome inactivating proteins | 40 | α/β | Unknown | 25 | PHD, SET, PWWP | a, c, d, h | Q24742 (1869–1914)
FYRC | Trithorax and X-chromosome inactivating proteins | 90 | α/β | Unknown | 25 | PHD, SET, PWWP | a, c, d, h | Q24742 (3495–3583)
RUN | TBC, PH, FYVE and other proteins | 65 | α | GTPase signalling | 40 | DENN, TBC, PLAT, PH, C1, FYVE, GST, SH3 | c, d, h | BAB14033 (115–178)
TCH | Transcription factors and CHROMO domain helicases | 50 | α/β | Unknown | 20 | CHROMO, PHD, TFSM2, DEXDc, HELIc, SANT, BROMO | c, d, h | O15025 (882–931)
DZF | DSRM or ZnF_C2H2 domain containing proteins | 250 | α/β | Unknown | 40 | C2H2, DSRM | c, d, h | O88531 (762–1016)
NEUZ | Domain in neuralized-like proteins | 120 | β | Unknown | 10 | SOCS, RING, SPRY, SH2 | c, d, h | Q19299 (199–321)
ZnF_TTF | Domain in transposases and transcription factors | 100 | α + β | Metal-binding | 20 | KRAB, BTB | a, d, h | Q9ZWT4 (100–199)
Part B. Species-Specific Domains
FBD | Domain in FBOX and other domain containing plant proteins | 80 | α/β | Unknown | 160 | FBOX, LRRcap, BRCT, AAA | a | Q9LXJ7 (304–382)
ZnF_PMZ | Plant mutator transposase zinc finger domain | 27 | α/β | Metal-binding | 125 | AT_Hook, ZnF_C2HC, PHD | a | Q9SH73 (3212–3239)
SPK | SET and PHD domain containing proteins and protein kinases | 120 | α/β | Protein-interaction | 40 | SET, ICE_p10, ICE_p20, ZnF_C2HC, PHD, STYKc | c | Q9XU06 (139–250)
Part C. Domains, Newly Recognized Divergent Subfamilies
ZnF_BED | BED zinc finger, related to C2H2/C2H2 zinc fingers (based on pattern similarity) | 60 | β | Metal-binding | 50 | AT_Hook, PTPc_DSPc | y, a, c, d, h | Q9LWM2 (169–224)
CPDc | Catalytic domain of CTD-like phosphatases, related to phosphatase superfamily (based on pattern similarity) | 120 | α/β | Phosphatase | 70 | BRCT, DSRM, UBQ | y, a, c, d, h | Q9PTJ8 (93–236)
RWD | RING finger and WD repeat containing proteins and DEXDc helicases, related to the UBCc domain (revealed by HMM searches) | 110 | α/β | Protein-interaction | 60 | S_TKc, RING, WD, UPF29, DEXDc, HELIc | y, a, c, d, h | Q9QZ05 (25–137)
BTP | Bromodomain transcription factors and PHD domain containing proteins, related to archaeal histone-like transcription factors, defined by PFAM (revealed by PSI-Blast results with less significance (E = 0.041)) | 90 | α | DNA-binding | 25 | AT_Hook, BROMO, PHD | y, a, c, d, h | Q9S7R9 (41–131)
MADF | Zinc finger, PHD domain and WD repeats containing proteins, related to SANT domain (after the second iteration Q9SR68 bridges to SANT domains (E = 0.002)) | 90 | α | DNA- or Protein-binding | 60 | C2H2, PHD, WD | Virus, a, c, d | Q9V5Y9 (22–110)
ZnF_DBF | Zinc finger in DBF-like proteins, related to C2H2 zinc fingers (revealed by pattern similarity and HMM searches, E value = 1.4) | 50 | α | Metal-binding | 10 | BRCT, AT_Hook | y, d, h | O93843 (590–638)
CHK | C4-zinc finger and HLH domain containing kinase subfamily of choline kinases (after the second iteration P35790 bridges to choline kinases, defined by PFAM (E = 0.003)) | 200 | α/β | Enzyme | 70 | ZnF_C4, HLH, i.c | Eu, c, d | Q9VBT6 (129–321)
Part D. Family Specific Extensions of Known Domains
AWS | Associated with SET domain, subdomain of PRESET (HMM searches, E value = 0.52) | 50 | α/β | Histone modification | 25 | SET, PWWP, AT_Hook, WW, PHD, POSTSET, BAH | y, a, c, d, h | P46995 (63–119)
POX | Domain associated with HOX-domains | 50 | α | Unknown | 20 | HOX | a | Q38897 (199–337)
PRE_C2HC | Associated with zinc fingers | 70 | α/β | Unknown | 15 | ZnF_C2HC | d | O44939 (546–616)
  • First column, domain name; second column, domain description (e.g., associated domains or well-described proteins); third column, approximate domain length (number of amino acids); fourth column, secondary structure prediction (Rost et al. 1994) (α: domain consists of α-helices; β: domain consists of β-strands; α/β: domain consists of α-helices and β-strands); fifth column, predicted function of the novel domain; sixth column, number of proteins containing the novel domain; seventh column, names of associated domains (domain names follow the Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de) (Schultz et al. 1998, 2000), or the domain is defined by Pfam (Bateman et al. 2000))†; eighth column, representative species containing the novel domain. Abbreviations: Eu, eubacteria; Virus, viruses; y, yeast; a, Arabidopsis thaliana; c, Caenorhabditis elegans; d, Drosophila melanogaster; h, Homo sapiens. Ninth column, accession number of a representative protein and the region of the detected domain (amino acid positions).

  • Novel domain is accepted, in press, or published recently.

  • Unpublished domain.

  • Additional HMM searches are needed to define all novel domain-containing proteins.

  • +The more conserved parts of the domains FYRN and FYRC were called ATA1 and ATA2 in human ALR protein (Prasad et al. 1997) and FYR (merged in one domain) in plant proteins (Balciunas and Ronne 2000), respectively.
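
For readers who want to process the table programmatically, the ninth-column entries (accession number followed by domain borders in parentheses) can be split with a short script. The following is only a minimal sketch: the function names `parse_representative` and `span_length` are hypothetical, and it assumes that the borders are inclusive residue positions, which the table legend does not state explicitly. Because the length column is described as approximate, the computed span will not always match it.

```python
import re

# Matches entries such as "Q9UIG2 (102–161)"; the borders may be joined
# by an en dash or a plain hyphen, with optional extra whitespace.
ENTRY = re.compile(r"^\s*(\S+)\s*\((\d+)\s*[–-]\s*(\d+)\)\s*$")


def parse_representative(entry):
    """Split a ninth-column entry into (accession, start, end)."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    accession = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    return accession, start, end


def span_length(start, end):
    """Number of residues covered, assuming inclusive borders (an assumption)."""
    return end - start + 1


# Example using the DDT row of the table.
accession, start, end = parse_representative("Q9UIG2 (102–161)")
print(accession, start, end, span_length(start, end))
```

For the DDT row this gives a span of 60 residues, which agrees with the approximate length listed in the third column.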

This Article

  1. Genome Res. 12: 47-56
