Table 2.

Clusters of Orthologous Groups of Proteins (COGs) That Include Predicted Archaeal Exosome Subunits and Functionally Connected Proteins[i]

| COG | (Predicted) function | Sequence similarity between archaeal members (E-value range)[ii] | Sequence similarity to the eukaryotic orthologs (E-value range) | The closest archaeal paralog and sequence similarity (E-value range) | Comments |
| --- | --- | --- | --- | --- | --- |
| 1097 | RNA-binding protein Rrp4p | e-40–e-25 | e-11–e-05 | COG1096; ∼e-03 | |
| 0689 | 3′-5′ exonuclease, RNase PH homolog | e-80–e-60 | e-28 | COG2123; e-11–e-09 | |
| 2123 | 3′-5′ exonuclease, RNase PH homolog | e-70–e-60 | e-30 | COG0689; e-14–e-10 | |
| 1603 | Protein subunit of RNase P | e-23–0.15 | e-06–0.25 | none | The crenarchaeal and eukaryotic proteins show limited similarity to the euryarchaeal orthologs; however, an iterative PSI-BLAST search retrieves them from the database without false positives and with high statistical significance. |
| 1369 | Protein subunit of RNase P | e-13–e-04 | ∼e-04 | none | |
| 2136 | IMP4, spliceosome subunit in eukaryotes, probably exosome subunit in archaea | e-09–0.004 | ∼e-07 | none | |
| 1382 | Prefoldin, co-translational chaperone | e-26–e-15 | ∼e-05 | COG1730; ∼0.002 | Some spurious similarities to coiled-coil domains were also detected in database searches. |
| 1325 | Uncharacterized conserved protein | e-23–e-09 | none | none | |
| 1500 | Uncharacterized conserved protein | e-72–e-46 | ∼e-20 | none | |
| 2892 | Uncharacterized conserved protein | e-07–e-05 | none | none | A newly identified COG; most of the members have not been previously annotated as proteins (Fig. 2A). |
| 1096 | RNA-binding protein Csl4p | e-20–e-12 | ∼e-04 | COG1097; ∼e-03 | |
| 1487 | Predicted RNA-binding protein, PIN domain | e-30–0.2 | none | COG1848; >0.1 | A complex COG with several paralogs in each archaeal species. |
| 1753 | Uncharacterized conserved protein | e-04–e-03 | none | none | Very distant similarity was detected between the members of this COG and prefoldins; together with the similar size and predicted α-helical structure, this might indicate a genuine evolutionary and functional relationship. |
| 2386 | Uncharacterized conserved protein | e-09–e-04 | none | none | |

[i] COGs that include well-characterized proteins such as proteasome subunits, predicted helicases, and methyltransferases are not included.

[ii] The E-values are for the database of proteins from complete genomes; e-n = 10^(-n).