Prediction of the Archaeal Exosome and Its Connections with the Proteasome and the Translation and Transcription Machineries by a Comparative-Genomic Approach

Table 2.

Clusters of Orthologous Groups of Proteins (COGs) That Include Predicted Archaeal Exosome Subunits and Functionally Connected Proteins

| COG | (Predicted) function | Sequence similarity between archaeal members (E-value range) | Sequence similarity to the eukaryotic orthologs (E-value range) | Closest archaeal paralog and sequence similarity (E-value range) | Comments |
|---|---|---|---|---|---|
| 1097 | RNA-binding protein Rrp4p | e-40–e-25 | e-11–e-05 | COG1096; ∼e-03 | |
| 0689 | 3′-5′ exonuclease, RNase PH homolog | e-80–e-60 | e-28 | COG2123; e-11–e-09 | |
| 2123 | 3′-5′ exonuclease, RNase PH homolog | e-70–e-60 | e-30 | COG0689; e-14–e-10 | |
| 1603 | Protein subunit of RNase P | e-23–0.15 | e-06–0.25 | none | The crenarchaeal and eukaryotic proteins show limited similarity to the euryarchaeal orthologs; however, an iterative PSI-BLAST search retrieves them from the database without false positives and with high statistical significance. |
| 1369 | Protein subunit of RNase P | e-13–e-04 | ∼e-04 | none | |
| 2136 | IMP4, spliceosome subunit in eukaryotes, probably exosome subunit in archaea | e-09–0.004 | ∼e-07 | none | |
| 1382 | Prefoldin, co-translational chaperone | e-26–e-15 | ∼e-05 | COG1730; ∼0.002 | Some spurious similarities to coiled-coil domains were also detected in database searches. |
| 1325 | Uncharacterized conserved protein | e-23–e-09 | none | none | |
| 1500 | Uncharacterized conserved protein | e-72–e-46 | ∼e-20 | none | |
| 2892 | Uncharacterized conserved protein | e-07–e-05 | none | none | A newly identified COG; most of the members have not been previously annotated as proteins (Fig. 2A). |
| 1096 | RNA-binding protein Csl4p | e-20–e-12 | ∼e-04 | COG1097; ∼e-03 | |
| 1487 | Predicted RNA-binding protein, PIN-domain | e-30–0.2 | none | COG1848; >0.1 | A complex COG with several paralogs in each archaeal species. |
| 1753 | Uncharacterized conserved protein | e-04–e-03 | none | none | Very distant similarity was detected between the members of this COG and prefoldins; together with similar size and predicted α-helical structure, this might indicate a genuine evolutionary and functional relationship. |
| 2386 | Uncharacterized conserved protein | e-09–e-04 | none | none | |
  • COGs that include well-characterized proteins such as proteasome subunits, predicted helicases, and methyltransferases are not included.

  • The E-values are for the database of proteins from complete genomes; e-n = 10^-n.
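The E-value shorthand used in the table can be converted to ordinary numbers for sorting or filtering. The following is a minimal sketch, not part of the original article: the function names `parse_evalue` and `parse_range` are hypothetical, and the handling of the "∼" and ">" qualifiers (they are simply dropped) is an assumption made for illustration.

```python
import re


def parse_evalue(token: str) -> float:
    """Convert one E-value in the table's shorthand (e-n = 10^-n) to a float.

    Assumption: the qualifiers '∼' (approximately) and '>' (greater than)
    are discarded, so only the numeric part is returned.
    """
    cleaned = token.strip().lstrip("∼~>")
    match = re.fullmatch(r"e-(\d+)", cleaned)
    if match:
        # e-40 means 10^-40; build the literal so the float is exact.
        return float("1e-" + match.group(1))
    # Plain decimals such as 0.15 or 0.004 are already ordinary numbers.
    return float(cleaned)


def parse_range(cell: str) -> tuple[float, float]:
    """Split a cell such as 'e-40–e-25' on the en dash and parse both ends.

    A single value (for example 'e-28') yields the same number twice.
    """
    parts = [parse_evalue(part) for part in cell.split("–")]
    return (parts[0], parts[-1])


if __name__ == "__main__":
    for cell in ["e-40–e-25", "e-23–0.15", "∼e-03", ">0.1"]:
        print(cell, parse_range(cell))
```

Because the qualifiers are dropped, a cell such as ">0.1" is read as 0.1; a caller that needs to distinguish bounds from exact values would have to keep the qualifier alongside the number.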

This Article

  1. Genome Res. 11: 240-252
