The Role of Lineage-Specific Gene Family Expansion in the Evolution of Eukaryotes

Table 2.

Functions of Selected Lineage-Specific Protein Clusters in Five Eukaryotes

Name of the cluster Species (no. of members) Biological functions and other comments
Transcription regulation
 AP2-like DNA-binding proteins At(117) Plant-specific transcription factors with multiple roles in stress and ethylene response and development (Riechmann et al. 2000).
 MYB-like DNA-binding proteins At(100, 48) HTH-domain-containing transcription factors with diverse roles in development and regulation of various environmental responses (Riechmann et al. 2000).
 WRKY-like DNA-binding proteins At(68) DNA-binding proteins involved in regulation of development and pathogen response.
 RF-A family of nucleic acid-binding proteins (OB fold) At(47) An expansion involving the conserved archaeo-eukaryotic replication factor A, which is present in a single copy in other eukaryotic lineages (Wold 1997).
 Viv1/PVAL-like transcription factors At(41) Plant-specific transcription factors involved in abscisic acid response, seed differentiation, and development (Riechmann et al. 2000).
 Nuclear hormone receptors Ce(66, 43, 26, 26, and other small clusters) Zn-dependent DNA-binding proteins typified by vertebrate steroid receptors. Many of the C. elegans members of this family may function independently of ligands, and characterized members such as odr-7 have roles in cell-type differentiation (Sluder et al. 1999).
 C4DM+Zn-finger-containing proteins Dm(82) Transcription factors typified by the Zeste-white 5 family. They consist of a DNA-binding C2H2-finger and C4DM, a predicted Zn-dependent protein–protein interaction domain (Lander et al. 2001).
 SAZ-type Myb domain-containing proteins Dm(40) A specialized version of the MYB DNA-binding domain typified by transcription factors such as Stonewall, Adf-1, and Zeste.
 POZ+Zn-finger Dm(55) A class of DNA-binding, chromatin-associated transcription factors, such as Broad-complex, Lola, and Trithorax-like, that consist of a specific version of the POZ domain fused to a C2H2-finger.
 C6 finger-containing proteins Sp(4) Proteins with Gal4-like C6 Zn fingers are among the most common transcription factors in ascomycete fungi.
Pathogen/stress response
 AP-ATPases At(150, 29, 17) Products of plant disease-resistance loci; typically consist of a TIR domain and an AP-ATPase domain combined with leucine-rich repeats (LRRs) (Hulbert et al. 2001).
 Pepsin-like proteases At(51), Ce(16) Secreted proteases that could be involved in extracellular regulatory proteolytic cascades.
 Subtilisin-like proteases At(57) Secreted proteases that could be involved in extracellular regulatory proteolytic cascades.
 Papain-like proteases At(14) Thiol proteases that could be involved in stress responses and in germination.
 Metalloproteases containing CUB domains Ce(23) Membrane-associated metalloproteases that could be involved in proteolytic cascades on the cell surface.
 C-type lectins Ce(115, 42) Dm(28) Extracellular proteins containing adhesion modules potentially involved in recognition of specific pathogen surface molecules.
 Chitinases Ce(33) Dm(17) Enzymes potentially involved in hydrolysis of cell walls of fungal pathogens.
 Toll-like receptors Dm(8) Key receptors of the anti-pathogen response pathways.
 CUB-domain proteins Ce(40) Extracellular adhesion proteins.
 P450 hydroxylases At(124, 34, 33, 28) Dm(83) Ce(46, 16) Oxidoreductases involved in detoxification of diverse xenobiotics through hydroxylation (Nelson 1999; Tijet et al. 2001).
 PRI-domain proteins At(24) Ce(40) Secreted proteins that could function as inhibitors of enzymes or adhesion molecules.
 Cell wall mannoproteins Sc(11) Involved in cold-shock and anoxic stress responses.
 α-helical peroxidases At(73) Enzymes generating nascent oxygen as part of the oxidative defense mechanisms.
Signaling
 Concanavalin-like lectins At(43) Some of these lectins are fused to kinases as extracellular receptor domains and probably function as carbohydrate receptors.
 PPR-module proteins At(194, 195) α-superhelical proteins that could function as protein–protein interaction scaffolds in various contexts.
 Calcium-dependent protein kinases At(44) The principal transducers of Ca2+ signaling, mediating this pathway in various contexts.
 Plant-specific protein kinases At(316) Involved in various signaling pathways, such as hormone response, disease resistance, and development. Often fused to various other domains, including Apple, LRRs, and bulb lectins.
 Octicosapeptide module proteins At(72, 17, 14) A Ca2+-binding signaling module; some are fused to Viv1-like DNA-binding domains and GAF domains (Ponting 1996).
 NPH-3-like, plant-specific POZ-domain proteins At(30) Specialized POZ domains, some of which are involved in plant light response signaling.
 PP2C phosphatases At(20) Phosphoserine phosphatases that function in diverse signaling pathways, e.g., abscisic acid signaling.
 Worm-specific S/T kinases Ce(65) A distinct, nematode-specific branch of the casein kinase family.
 Receptor guanylate cyclases fused to protein kinases Ce(13, 12) Potential receptors of secreted peptide first messengers, by analogy with the mating pheromone receptors of sea urchins.
 Worm-specific domains Ce(42) Uncharacterized domain probably involved in specific protein–protein interactions; some are fused to SET, caspase, kinase, and PHD domains.
 POZ-domain proteins Ce(26, 29) Often fused to MATH domains, possibly function as chromatin-associated adaptors.
 Insulin-like peptides Ce(11) Probably function as nematode-specific peptide hormones or growth factors.
 Sec14-domain proteins Dm(23) Probably participate in regulation of protein trafficking and vesicular cargo loading.
 SET-domain proteins with an inserted metal-chelating module Dm(10) Protein methyltransferases containing a divergent SET domain with a characteristic insert of a metal-chelating module. Probable regulators of chromatin dynamics.
 Geko-domain proteins Dm(8, 17) A large family of Drosophila-specific cysteine-rich proteins; the only characterized member, Geko, is involved in olfaction. The LSC might be functionally coupled to the correspondingly expanded olfactory receptor families.
Ubiquitin signaling/protein unfolding and degradation
 F-box proteins At(251, 64, 41, 23) Ce(111, 46, 21) Specificity-defining E3 subunits of ubiquitin ligases; fused to several other domains that might act as scaffolds for the assembly of the ubiquitinating enzyme complexes (Kipreos and Pagano 2000).
 RING-finger proteins At(74, 16, 12) The majority of the RING fingers in the LSCs are of the RING-H2 category; probably function as specific E3-ligases.
 U-box proteins At(21, 18) RING-finger derivatives that probably mediate multiubiquitination of specific targets.
 Ubiquitin-domain proteins At(11) Probably utilized similarly to ubiquitin, but could specifically conjugate with different proteins.
 Adenoviral-type proteases At(117) Probably involved in deubiquitination as exemplified by ULP1/SMT4 (Li and Hochstrasser 2000; Nishida et al. 2000).
 GH3-domain proteins At(17) Share a conserved domain with the E1 subunits of ubiquitin ligases; might be negative regulators of the signalosome.
 MATH-domain proteins Ce(81) At(73) Related to the MATH domains of the ubiquitin carboxy-terminal hydrolases and E3-ligases of the TRAF family; could function as adaptors in ubiquitin pathways.
 Prolyl hydroxylases Dm(19) At(10) Hydroxylation of prolines by these enzymes might provide targets for ubiquitination by specific E3-ligases (Aravind and Koonin 2001).
 Cyclophilin-type peptidyl-prolyl isomerases Dm(10) Catalyze isomerization of proline-containing peptide bonds; might function in regulating aggregation of protein complexes.
Chemoreceptors and small molecule sensors
 7-transmembrane olfactory receptors Ce(264, 228, 122) Receptors for odorants/environmental chemicals (Dryer 2000; Glusman et al. 2001).
 Insect-type odorant receptors Dm(55) Receptors for odorants/environmental chemicals.
 Pheromone-binding proteins Dm(27) Probably involved in the binding and delivery of odorants to chemoreceptory cells.
 Patched-type sterol-binding membrane proteins Ce(15) Bind lipids and sterols in various contexts including stabilization of receptor complexes.
 Juvenile hormone and other small-molecule-binding proteins Dm(27) Probably involved in the binding and delivery of small molecules in the insect haemolymph.
 Lipid-binding proteins (NLTP) At(49, 26) Cysteine-rich α-helical proteins involved in lipid binding and delivery in various contexts, including wax deposition.
 Jacalin-type lectins At(44) Might be involved in sugar binding and storage.
 Hemocyanins Dm(10) Copper-dependent oxygen transport proteins.
 Cyanin family proteins At(34) Copper-binding proteins.
Ion channels and transporters
 Degenerin family channels Dm(24) Sodium channels, probably function in tactile reception and related ion-dependent signaling pathways.
 Potassium channels Ce(15) Potassium channels of the double pore category; probably function as pH-dependent channels.
 Innexin-type channels Ce(20) Channels related to the Drosophila Shaking-B protein; might be involved in the formation of gap junctions.
 cNMP-gated channels At(21) Cyclic nucleotide-gated channels containing an intracellular cNMP-binding domain.
 Amino acid transporters At(33) Amino acid transporters of the N-amino acid transporter family.
 Potassium transporters At(17) Belong to the plant tiny root hair family; probably involved in potassium uptake.
 Na-P-transporter-related proteins Ce(26) Probably involved in phosphate uptake by symport.
 Hexose transporters Sc(15) Belong to the 12 TM sugar transporter superfamily.
 ABC transporters Dm(11, 9, 5) Transporters containing two ABC-class ATPase domains.
Small molecule metabolism
 Lipases At(106) A family of phospholipid lipases of the flavodoxin fold; involved in degradation of phosphatidylcholine. Could be involved in metabolizing lipids during germination or in degrading lipid membranes of pathogens.
 2-OG-Fe dioxygenases At(67) Hydroxylases involved in the biosynthesis of numerous plant secondary metabolites, such as gibberellins (Aravind and Koonin 2001).
 NH2 cinnamoyl/benzoyltransferase At(56) Transfers aromatic carboxylic acid groups to diverse targets in the biosynthesis of plant secondary metabolites.
 Small molecule O-methylases At(38, 15) Catalyze the methylation step in the biosynthesis of diverse plant products, such as caffeic acid.
 Glutathione S-transferases At(14) Ce(28) Dm(27) Catalyze the conjugation of electrophilic substrates, particularly xenobiotics, to glutathione as part of their transport and detoxification; additionally have peroxidase and small molecule isomerase activities.
 Predicted secreted small molecule methylases Ce(32) Contain specific disulfide bonds; probably catalyze methylation of extracellular small molecules.
 Integral membrane O-acyltransferases Ce(151) A family of membrane-associated acyltransferases closely related to the bacterial membrane-associated acyltransferases that acylate macrolide antibiotics and cell surface polysaccharides.
 Predicted small molecule kinases Ce(23) Dm(45) Related to aminoglycoside and lipid kinases; probably involved in phosphorylation of small molecules, such as odorants and/or xenobiotics.
Structural/morphological proteins
 Cystine-rich expansions At(35) Plant cell-wall glycoproteins.
 Pectin methylesterases At(89) Involved in the biosynthesis of pectins, major structural components of plants.
 Pectin-associated proteins At(26) Four-cysteine α-helical domains, some fused to pectin esterases.
 Cuticular collagens Ce(34, 32, 26, 11) The principal structural component of the nematode cuticle (Johnstone 2000).
 Major sperm protein family Ce(32, 10) The principal structural component of nematode sperm.
 Insect cuticular proteins Dm(88) The principal structural component of the insect cuticle (Andersen et al. 1995).
 Peritrophin-like proteins Dm(40) Insect-specific extracellular matrix proteins.
 Cell wall glycoproteins Sc(11) Protein component of the yeast cell wall.
 Ecm34p-like proteins Sc(25) Protein component of the yeast cell wall.
  • The members of each LSC are listed in the Supplementary Material section, in which the LSCs can be identified by their names and the number of members.

  • Species abbreviations: (At) Arabidopsis thaliana; (Ce) Caenorhabditis elegans; (Dm) Drosophila melanogaster; (Sc) Saccharomyces cerevisiae; (Sp) Schizosaccharomyces pombe. The number of members in each LSC is indicated in parentheses; commas separate distinct LSCs that belong to the same class of paralogous proteins.

Genome Res. 12: 1048-1059