Table 2.

Functions of Selected Lineage-Specific Protein Clusters in Five Eukaryotes

Name of the cluster[i] | Species[ii] (no. of members) | Biological functions and other comments
Transcription regulation
 AP2-like DNA-binding proteins | At(117) | Plant-specific transcription factors with multiple roles in stress and ethylene response and development (Riechmann et al. 2000).
 MYB-like DNA-binding proteins | At(100, 48) | HTH-domain-containing transcription factors with diverse roles in development and regulation of various environmental responses (Riechmann et al. 2000).
 WRKY-like DNA-binding proteins | At(68) | DNA-binding proteins involved in regulation of development and pathogen response.
 RF-A family of nucleic acid-binding proteins (OB fold) | At(47) | An expansion involving the conserved archaeo-eukaryotic replication factor A that is present in a single copy in other eukaryotic lineages (Wold 1997).
 Viv1/PVAL-like transcription factors | At(41) | Plant-specific transcription factors involved in abscisic acid response, seed differentiation, and development (Riechmann et al. 2000).
 Nuclear hormone receptors | Ce(66, 43, 26, 26, and other small clusters) | Zn-dependent DNA-binding proteins typified by vertebrate steroid receptors. Many of the C. elegans members of this family may function independently of ligands, and characterized members like odr-7 have roles in cell-type differentiation (Sluder et al. 1999).
 C4DM+Zn-finger-containing proteins | Dm(82) | Transcription factors typified by the Zeste-white 5 family. Consist of a DNA-binding C2H2-finger and C4DM, a predicted Zn-dependent protein–protein interaction domain (Lander et al. 2001).
 SAZ-type Myb domain-containing proteins | Dm(40) | A specialized version of the MYB DNA-binding domain typified by transcription factors, such as Stonewall, Adf-1, and Zeste.
 POZ+Zn-finger | Dm(55) | A class of DNA-binding, chromatin-associated transcription factors, such as Broad-complex, Lola, and trithorax-like, that consist of a specific version of the POZ domain fused to a C2H2-finger.
 C6 finger-containing proteins | Sp(4) | Gal4-like C6 Zn fingers are among the most common transcription factors in the ascomycete fungi.
Pathogen/stress response
 AP-ATPases | At(150, 29, 17) | Products of plant disease-resistance loci; typically consist of a TIR and an AP-ATPase domain combined with leucine-rich repeats (LRRs) (Hulbert et al. 2001).
 Pepsin-like proteases | At(51), Ce(16) | Secreted proteases that could be involved in extracellular regulatory proteolytic cascades.
 Subtilisin-like proteases | At(57) | Secreted proteases that could be involved in extracellular regulatory proteolytic cascades.
 Papain-like proteases | At(14) | Thiol proteases that could be involved in stress responses and in germination.
 Metalloproteases containing CUB domains | Ce(23) | Membrane-associated metalloproteases that could be involved in proteolytic cascades on the cell surface.
 C-type lectins | Ce(115, 42) Dm(28) | Extracellular proteins containing adhesion modules potentially involved in recognition of specific pathogen surface molecules.
 Chitinases | Ce(33) Dm(17) | Enzymes potentially involved in hydrolysis of cell walls of fungal pathogens.
 Toll-like receptors | Dm(8) | Key receptors of the anti-pathogen response pathways.
 CUB-domain proteins | Ce(40) | Extracellular adhesion proteins.
 P450 hydroxylases | At(124, 34, 33, 28) Dm(83) Ce(46, 16) | Oxidoreductases involved in detoxification of diverse xenobiotics through hydroxylation (Nelson 1999; Tijet et al. 2001).
 PRI-domain proteins | At(24) Ce(40) | Secreted proteins that could function as inhibitors of enzymes or adhesion molecules.
 Cell wall mannoproteins | Sc(11) | Involved in cold shock and anoxic stress response.
 α-helical peroxidases | At(73) | Enzymes generating nascent oxygen as part of the oxidative defense mechanisms.
Signaling
 Concanavalin-like lectins | At(43) | Some of these lectins are fused to kinases as extracellular receptor domains and probably function as carbohydrate receptors.
 PPR-module proteins | At(194, 195) | α-superhelical proteins that could function as protein–protein interaction scaffolds in various contexts.
 Calcium-dependent protein kinases | At(44) | The principal transducers of Ca2+ signaling that mediate this pathway in various contexts.
 Plant-specific protein kinases | At(316) | Involved in various signaling pathways, such as hormone response, disease resistance, and development. Often fused to various other domains, including Apple, LRRs, and bulb lectins.
 Octicosapeptide module proteins | At(72, 17, 14) | A Ca2+-binding signaling module; some are fused to VTV1-like DNA-binding domains and GAF domains (Ponting 1996).
 NPH-3-like, plant-specific POZ-domain proteins | At(30) | Specialized POZ domains, some of which are involved in plant light response signaling.
 PP2C phosphatases | At(20) | Phosphoserine phosphatases that function in diverse signaling pathways, e.g., abscisic acid signaling.
 Worm-specific S/T kinases | Ce(65) | A distinct, nematode-specific branch of the casein kinase family.
 Receptor guanylate cyclases fused to protein kinases | Ce(13, 12) | Potential receptors of secreted peptide first messengers by analogy to mating pheromone receptors of sea urchins.
 Worm-specific domains | Ce(42) | Uncharacterized domain probably involved in specific protein–protein interactions; some are fused to SET, caspase, kinase, and PHD domains.
 POZ-domain proteins | Ce(26, 29) | Often fused to MATH domains; possibly function as chromatin-associated adaptors.
 Insulin-like peptides | Ce(11) | Probably function as nematode-specific peptide hormones or growth factors.
 Sec14-domain proteins | Dm(23) | Probably participate in regulation of protein trafficking and vesicular cargo loading.
 SET-domain proteins with an inserted metal-chelating module | Dm(10) | Protein methyltransferases containing a divergent SET domain with a characteristic insert of a metal-chelating module. Probable regulators of chromatin dynamics.
 Geko-domain proteins | Dm(8, 17) | A large family of Drosophila-specific cysteine-rich proteins; the only characterized member, Geko, is involved in olfaction. The LSC might be functionally coupled to the correspondingly expanded olfactory receptor families.
Ubiquitin signaling/protein unfolding and degradation
 F-box proteins | At(251, 64, 41, 23) Ce(111, 46, 21) | Specificity-defining E3 subunits of ubiquitin ligases; fused to several other domains that might act as scaffolds for the assembly of the ubiquitinating enzyme complexes (Kipreos and Pagano 2000).
 RING-finger proteins | At(74, 16, 12) | The majority of the RING fingers in the LSCs are of the RING-H2 category; probably function as specific E3-ligases.
 U-box proteins | At(21, 18) | RING-finger derivatives that probably mediate multiubiquitination of specific targets.
 Ubiquitin-domain proteins | At(11) | Probably utilized similarly to ubiquitin, but could specifically conjugate with different proteins.
 Adenoviral-type proteases | At(117) | Probably involved in deubiquitination as exemplified by ULP1/SMT4 (Li and Hochstrasser 2000; Nishida et al. 2000).
 GH3-domain proteins | At(17) | Share a conserved domain with the E1 subunits of ubiquitin ligases; might be negative regulators of the signalosome.
 MATH-domain proteins | Ce(81) At(73) | Related to the MATH domains of the ubiquitin carboxy-terminal hydrolases and E3-ligases of the TRAF family; could function as adaptors in ubiquitin pathways.
 Prolyl hydroxylases | Dm(19) At(10) | Hydroxylation of prolines by these enzymes might provide targets for ubiquitination by specific E3-ligases (Aravind and Koonin 2001).
 Cyclophilin-type peptidyl-prolyl isomerases | Dm(10) | Catalyze isomerization of proline-containing peptide bonds; might function in regulating aggregation of protein complexes.
Chemoreceptors and small molecule sensors
 7-transmembrane olfactory receptors | Ce(264, 228, 122) | Receptors for odorants/environmental chemicals (Dryer 2000; Glusman et al. 2001).
 Insect-type odorant receptors | Dm(55) | Receptors for odorants/environmental chemicals.
 Pheromone-binding proteins | Dm(27) | Probably involved in the binding and delivery of odorants to chemoreceptory cells.
 Patched-type sterol-binding membrane proteins | Ce(15) | Bind lipids and sterols in various contexts including stabilization of receptor complexes.
 Juvenile hormone and other small-molecule-binding proteins | Dm(27) | Probably involved in the binding and delivery of small molecules in the insect haemolymph.
 Lipid-binding proteins (NLTP) | At(49, 26) | Cysteine-rich α-helical proteins involved in lipid binding and delivery in various contexts and wax deposition.
 Jacalin-type lectins | At(44) | Might be involved in sugar binding and storage.
 Hemocyanins | Dm(10) | Copper-dependent oxygen transport proteins.
 Cyanin family proteins | At(34) | Copper-binding proteins.
Ion channels and transporters
 Degenerin family channels | Dm(24) | Sodium channels; probably function in tactile reception and related ion-dependent signaling pathways.
 Potassium channels | Ce(15) | Potassium channels of the double pore category; probably function as pH-dependent channels.
 Innexin-type channels | Ce(20) | Channels related to the Dm Shaking-B protein; might be involved in the formation of gap junctions.
 cNMP-gated channels | At(21) | Cyclic nucleotide-gated channels containing an intracellular cNMP-binding domain.
 Amino acid transporters | At(33) | Amino acid transporters of the N-amino acid transporter family.
 Potassium transporters | At(17) | Belong to the plant tiny root hair family; probably involved in potassium uptake.
 Na-P-transporter-related proteins | Ce(26) | Probably involved in phosphate uptake by symport.
 Hexose transporters | Sc(15) | Belong to the 12 TM sugar transporter superfamily.
 ABC transporters | Dm(11, 9, 5) | Transporters containing two ABC-class ATPase domains.
Small molecule metabolism
 Lipases | At(106) | A family of phospholipid lipases of the flavodoxin fold; involved in degradation of phosphatidylcholine. Could be involved in metabolizing lipids in germination or degrading lipid membranes of pathogens.
 2-OG-Fe dioxygenases | At(67) | Hydroxylases involved in the biosynthesis of numerous plant secondary metabolites, such as gibberellins (Aravind and Koonin 2001).
 NH2 cinnamoyl/benzoyltransferase | At(56) | Transfers aromatic carboxylic acid groups to diverse targets in the biosynthesis of plant secondary metabolites.
 Small molecule O-methylases | At(38, 15) | Catalyze the methylation step in the biosynthesis of diverse plant products, such as caffeic acid.
 Glutathione S-transferases | At(14) Ce(28) Dm(27) | Catalyze the conjugation of electrophilic substrates, particularly xenobiotics, to glutathione as part of their transport and detoxification; additionally have peroxidase and small molecule isomerase activities.
 Predicted secreted small molecule methylases | Ce(32) | Contain specific disulfide bonds; probably catalyze methylation of extracellular small molecules.
 Integral membrane O-acyltransferases | Ce(151) | A family of membrane-associated acyltransferases closely related to the bacterial membrane-associated acyltransferases that acylate macrolide antibiotics and cell surface polysaccharides.
 Predicted small molecule kinases | Ce(23) Dm(45) | Related to aminoglycoside and lipid kinases; probably involved in phosphorylation of small molecules, such as odorants and/or xenobiotics.
Structural/morphological proteins
 Cystine-rich expansions | At(35) | Plant cell-wall glycoproteins.
 Pectin methylesterases | At(89) | Involved in the biosynthesis of pectins, major structural components of plants.
 Pectin-associated proteins | At(26) | Four-cysteine α-helical domains, some fused to pectin esterases.
 Cuticular collagens | Ce(34, 32, 26, 11) | The principal structural component of the nematode cuticle (Johnstone 2000).
 Major sperm protein family | Ce(32, 10) | The principal structural component of nematode sperm.
 Insect cuticular proteins | Dm(88) | The principal structural component of the insect cuticle (Andersen et al. 1995).
 Peritrophin-like proteins | Dm(40) | Insect-specific extracellular matrix proteins.
 Cell wall glycoproteins | Sc(11) | Protein component of the yeast cell wall.
 Ecm34p-like proteins | Sc(25) | Protein component of the yeast cell wall.

[i] The members of each LSC are listed in the Supplementary Material section, in which the LSCs can be identified by their names and the number of members.

[ii] Species abbreviations: (At) Arabidopsis thaliana; (Ce) Caenorhabditis elegans; (Dm) Drosophila melanogaster; (Sc) Saccharomyces cerevisiae; (Sp) Schizosaccharomyces pombe. The number of members in each LSC is indicated in parentheses; commas separate distinct LSCs that belong to the same class of paralogous proteins.
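
For readers who want to tabulate the entries, the species notation described in footnote [ii] can be read mechanically. The following is a minimal sketch in Python, not part of the original analysis; the function name parse_species_field and the decision to skip non-numeric notes such as "and other small clusters" are assumptions made for illustration.

```python
import re

# Species abbreviations as defined in footnote [ii].
SPECIES = {
    "At": "Arabidopsis thaliana",
    "Ce": "Caenorhabditis elegans",
    "Dm": "Drosophila melanogaster",
    "Sc": "Saccharomyces cerevisiae",
    "Sp": "Schizosaccharomyces pombe",
}

# One entry looks like "At(124, 34, 33, 28)": a two-letter abbreviation
# followed by a parenthesized, comma-separated list of LSC sizes.
_ENTRY = re.compile(r"([A-Z][a-z])\(([^)]*)\)")


def parse_species_field(field):
    """Map each species abbreviation to the list of its LSC sizes.

    Hypothetical helper: non-numeric items inside the parentheses
    (e.g. "and other small clusters") are skipped rather than guessed.
    """
    clusters = {}
    for abbrev, body in _ENTRY.findall(field):
        parts = (item.strip() for item in body.split(","))
        sizes = [int(item) for item in parts if item.isdigit()]
        clusters.setdefault(abbrev, []).extend(sizes)
    return clusters


if __name__ == "__main__":
    p450 = parse_species_field("At(124, 34, 33, 28) Dm(83) Ce(46, 16)")
    for abbrev, sizes in p450.items():
        print(SPECIES[abbrev], len(sizes), "LSCs,", sum(sizes), "members")
```

Each comma-separated number is treated as one distinct LSC, so the length of a species' list gives its number of LSCs in that class and the sum gives the total members listed in the table.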