
Protein domain analysis. The 43,141 SASs were translated using the ESTScan algorithm, and the resulting 40,821 amino acid sequences were used as queries against the Pfam database with the default settings of Pfam 7.0 [“global and local alignments merged” and “Pfam gathering threshold (GA)”]. A total of 12,921 putative SAS proteins produced significant matches to 1,415 protein domain families in the Pfam database. (A) Number of distinct domains found per SAS protein. The number of SAS proteins that contained one, two, three, four, or five distinct domains is shown. (B) Maximum number of repetitions for the top 14 repeated domains: nucleoporin FG (A), LRR (B), HEAT (C), M (D), PPR (E), TPR (F), XYPPX (G), WD40 (H), PC rep (I), ank (J), MORN (K), armadillo seq (L), PUF (M), and AT hook (N). The domains most often repeated within a single protein are shown, along with the maximum number of repeats found for each domain. (C) Range of repetitions found for the LRR, PPR, TPR, WD40, rrm, and EF-hand domains. The domains with the most varied number of occurrences per SAS protein are shown, along with the number of SAS proteins found for each number of repeats.
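The three panel summaries are simple tallies over the list of domain matches. The following is a minimal sketch of how such tallies could be computed, not the procedure actually used for the figure. It assumes the Pfam hits have already been parsed into `(protein_id, domain_name)` pairs, one pair per matched domain instance; the function name `summarize_domain_hits` and the toy identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_domain_hits(hits):
    """Tally domain matches per protein.

    `hits` is an iterable of (protein_id, domain_name) pairs, one pair per
    matched domain instance (assumed input format, not from the source).
    """
    # protein_id -> Counter mapping domain_name to number of instances
    per_protein = defaultdict(Counter)
    for protein_id, domain in hits:
        per_protein[protein_id][domain] += 1

    # Panel A: how many proteins carry k distinct domains
    distinct_hist = Counter(len(counts) for counts in per_protein.values())

    # Panel B: largest number of repeats of each domain in any one protein
    max_repeats = {}
    # Panel C: for each domain, how many proteins carry n repeats of it
    repeat_hist = defaultdict(Counter)
    for counts in per_protein.values():
        for domain, n in counts.items():
            max_repeats[domain] = max(max_repeats.get(domain, 0), n)
            repeat_hist[domain][n] += 1

    return distinct_hist, max_repeats, repeat_hist


# Toy input for illustration only (hypothetical identifiers, not SAS data).
toy_hits = [
    ("p1", "LRR"), ("p1", "LRR"), ("p1", "LRR"), ("p1", "WD40"),
    ("p2", "LRR"),
    ("p3", "TPR"), ("p3", "TPR"),
]
distinct_hist, max_repeats, repeat_hist = summarize_domain_hits(toy_hits)
```

Sorting `max_repeats` by value and keeping the largest entries would give a ranking of the kind shown in panel B, while `repeat_hist[domain]` gives the per-domain distribution of the kind shown in panel C.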











