Table 6.

Gene Clusters Deduced from the X-Matrix, for a Selected Set of Complexes/Functional Units

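A. Percentage of each complex accumulated in each one of the nine clusters

Column headers give the cluster number, with the number of proteins in that cluster in parentheses.

| Complexes | No. of Prot. | 1 (51) | 2 (55) | 3 (58) | 4 (12) | 5 (17) | 6 (6) | 7 (15) | 8 (9) | 9 (54) |
|---|---|---|---|---|---|---|---|---|---|---|
| PSI | 12 | 33.33 | 8.33 | 8.33 | 0 | 0 | 0 | 0 | 8.33 | 41.67 |
| PSII | 18 | 16.67 | 0 | 5.56 | 0 | 5.56 | 0 | 0 | 0 | 72.22 |
| ATPase | 8 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 75 |
| Cytb6f | 6 | 16.67 | 0 | 0 | 0 | 0 | 0 | 16.67 | 0 | 66.67 |
| NADHase | 11 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| Phyb | 9 | 11.11 | 11.11 | 77.78 | 0 | 0 | 0 | 0 | 0 | 0 |
| RibProt | 43 | 46.51 | 4.65 | 2.33 | 0 | 2.33 | 0 | 0 | 9.3 | 34.88 |
| RNApol | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| CellDiv | 5 | 20 | 40 | 0 | 40 | 0 | 0 | 0 | 0 | 0 |
| HypoProt | 73 | 8.22 | 30.14 | 24.66 | 2.74 | 1.37 | 6.85 | 17.81 | 1.37 | 6.85 |
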
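B. Weight (in percentage) of each complex within each of the clusters

Column headers give the cluster number, with the number of proteins in that cluster in parentheses.

| Complexes | No. of Prot. | 1 (51) | 2 (55) | 3 (58) | 4 (12) | 5 (17) | 6 (6) | 7 (15) | 8 (9) | 9 (54) |
|---|---|---|---|---|---|---|---|---|---|---|
| PSI | 12 | 7.84 | 1.82 | 1.72 | 0 | 0 | 0 | 0 | 11.11 | 9.26 |
| PSII | 18 | 5.88 | 0 | 1.72 | 0 | 5.88 | 0 | 0 | 0 | 24.07 |
| ATPase | 8 | 3.92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.11 |
| Cytb6f | 6 | 1.96 | 0 | 0 | 0 | 0 | 0 | 6.67 | 0 | 7.41 |
| NADHase | 11 | 0 | 0 | 0 | 0 | 64.71 | 0 | 0 | 0 | 0 |
| Phyb | 9 | 1.96 | 1.82 | 12.07 | 0 | 0 | 0 | 0 | 0 | 0 |
| RibProt | 43 | 39.22 | 3.64 | 1.72 | 0 | 5.88 | 0 | 0 | 44.44 | 27.78 |
| RNApol | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.41 |
| CellDiv | 5 | 1.96 | 3.64 | 0 | 16.67 | 0 | 0 | 0 | 0 | 0 |
| HypoProt | 73 | 11.76 | 40 | 31.03 | 16.67 | 5.88 | 83.33 | 86.67 | 11.11 | 9.26 |
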
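C. Recovery of original complexes in the clusters and Purity inside the clusters

Organism groups considered: Synecho., Nongreen algae, Red algae, Green algae, Land plants. The last column gives the × marks recorded for each cluster.

| Cluster No. | Complexes | −ln(P-value) (>3) | Recovery % | Purity % | HypoProt % | Organisms best represented in each cluster |
|---|---|---|---|---|---|---|
| 1 | RibProt | 4.09 | 46.51 | 39.22 | 8.22 | ×× |
| 3 | Phyb | 3.12 | 77.78 | 12.07 | 24.66 | × |
| 4 | CellDiv | (2.9) | 40 | 16.67 | 2.74 | × |
| 5 | NADHase | 11.01 | 100 | 64.71 | 1.37 | ×× |
| 9 | PSII | 4.72 | 72.22 | 24.07 | 6.85 | ××××× |
| Total | All clusters | >3 | 73.05 | 36.45 | | |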

[i] Cluster analysis of genes as deduced from the scores matrix. The optimal number of clusters was found to be nine. The tables include data on nine well-known chloroplast complexes (see Methods) and on the hypothetical proteins. (A) Percentage of each complex accumulated in each of the nine clusters obtained. (B) Weight, in percentage, of each complex within each of the clusters. (C) The most relevant functional units as detected by their statistical significance (P-value < 10⁻³). The P-value was derived assuming a background Poisson distribution (J.J. Lozano and A.R. Ortiz, in prep.). %R (Recovery %) is the percentage of recovery of the original complexes in the clusters. %P (Purity %) is the purity inside the clusters. %H (HypoProt %) is the percentage of functionally unknown proteins. Groups of genomes maximally represented in each cluster are marked by ×'s on the right of the table.
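
The relation between the recovery (%R) and purity (%P) figures can be illustrated with a minimal Python sketch. It assumes that recovery is the share of a complex's proteins that fall in a cluster and that purity is the share of the cluster made up of that complex. The function name `recovery_and_purity` is hypothetical, and the shared-protein counts (for example 20 for RibProt in cluster 1) are not given in the table; they are inferred here from the reported percentages and the complex and cluster sizes.

```python
def recovery_and_purity(shared, complex_size, cluster_size):
    """Return (recovery %, purity %) for one complex/cluster pair.

    shared       -- proteins of the complex found in the cluster (inferred, not tabulated)
    complex_size -- total proteins in the complex ("No. of Prot." column)
    cluster_size -- total proteins in the cluster (number next to the cluster no.)
    """
    recovery = 100.0 * shared / complex_size   # share of the complex recovered
    purity = 100.0 * shared / cluster_size     # share of the cluster it occupies
    return round(recovery, 2), round(purity, 2)


# RibProt (43 proteins) in cluster 1 (51 proteins), assuming 20 shared proteins
print(recovery_and_purity(20, 43, 51))   # (46.51, 39.22)

# NADHase (11 proteins) in cluster 5 (17 proteins), assuming all 11 are shared
print(recovery_and_purity(11, 11, 17))   # (100.0, 64.71)
```

Under these assumptions the two calls reproduce the Recovery % and Purity % values listed for clusters 1 and 5 in part C, which is consistent with parts A and B being the same counts normalised by complex size and by cluster size, respectively.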