Small open reading frames: Not so small anymore
Today, nearly 10 years after the publication of the complete sequence of the Saccharomyces cerevisiae genome, the total number of genes in this organism is largely considered resolved and currently stands at 5782 (http://www.yeastgenome.org/cache/genomeSnapshot.html). This gene sequence information has led to the construction of numerous yeast strain and plasmid collections, including the two-hybrid (Ito et al. 2000, 2001; Uetz et al. 2000), viable haploid deletion (Winzeler et al. 1999; Giaever et al. 2002), titratable promoter allele (Mnaimneh et al. 2004), and chromosomally tagged green fluorescent protein and TAP fusion libraries (Ghaemmaghami et al. 2003; Huh et al. 2003). Because each collection relied on the gene annotation available when it was built, each inherits the limitations of that annotation. One major issue for most is that the initial annotation of the S. cerevisiae genome included only open reading frames of at least 100 contiguous codons; small open reading frames (sORFs) encoding functional proteins were therefore largely missed (Goffeau et al. 1996) and were considered only later, after their expression was detected (Olivas et al. 1997; Velculescu et al. 1997; Kumar et al. 2002; Oshiro et al. 2002; Kessler et al. 2003). As a result, the phenotypic consequences of disrupting more than half of these sORFs have not been assessed, and sORFs are underrepresented in genomic libraries and other collections.
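To make the effect of a 100-codon cutoff concrete, here is a minimal Python sketch of a naive ORF scan. It assumes only ATG-initiated ORFs on the forward strand, the standard stop codons, and a codon count that excludes the stop codon. The names `find_orfs` and `split_by_cutoff` are hypothetical; this illustrates a length threshold and is not a reconstruction of the pipeline actually used for the original annotation.

```python
# Hypothetical sketch of a length-thresholded ORF scan (forward strand only).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[tuple[int, int, int]]:
    """Return (start, end, n_codons) for ATG...stop ORFs in the three forward frames.

    n_codons counts codons from the ATG up to, but not including, the stop codon
    (an assumed convention). Internal ATGs are ignored: only the most upstream
    ATG of each ORF is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, (i - start) // 3))
                start = None
    return orfs


def split_by_cutoff(orfs, min_codons: int = 100):
    """Partition ORFs into those kept by the cutoff and those it would miss."""
    annotated = [orf for orf in orfs if orf[2] >= min_codons]
    missed = [orf for orf in orfs if orf[2] < min_codons]
    return annotated, missed
```

For example, on a toy sequence containing a 30-codon ORF followed in frame by a 120-codon ORF, `split_by_cutoff` keeps only the longer ORF; the 30-codon ORF, like the sORFs discussed here, falls below the threshold. A real annotation pipeline would also scan the reverse complement and handle introns.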
In this issue of Genome Research, Kastenmayer et al. (2006) provide the first systematic analysis of the prevalence of sORFs in a eukaryotic genome. Of the 299 currently recognized sORFs in the S. cerevisiae genome, they found that more than half (170) have been annotated since the genome was sequenced. Of these, 12% were identified by homology, and an even greater fraction (74%) was identified by combining homology with empirical evidence, such as detection of a transcribed and translated product. This is extremely significant given the variability in gene models, which makes genome comparisons among species difficult and comparisons between genera even more complex. Comparing sORFs is further complicated because, in a short protein, a single amino acid change can substantially bias estimates of percent conservation. Also, prokaryotic genome sequence comparisons suggest that sORFs may be particularly susceptible to over-annotation (Nielsen and Krogh 2005). The significant apparent conservation of sORFs inspired Kastenmayer et al. to extend their analysis to the entire list of 299 currently annotated sORFs. Using BLAST and the HomoloGene database, they established that 184 S. cerevisiae sORFs have potential orthologs in other organisms and therefore likely encode bona fide proteins with conserved functions. Like other large-scale functional genomic studies, this sORF survey affirms the importance of continued confirmation and refinement of genome annotation (for a recent review, see Dolinski and Botstein 2005).
To begin characterizing the 170 sORFs discovered since the S. cerevisiae genome sequence was completed, Kastenmayer et al. (2006) constructed 140 deletion mutants, bringing the total number of sORF deletion strains to 247 (the remaining 107 were constructed previously by the deletion consortium; Winzeler et al. 1999). Examination of these newly constructed strains under an array of conditions uncovered growth phenotypes for 22 mutants, some of which showed more than one phenotype. For example, three new sORFs are essential, six mutants had a significant growth defect, and several were sensitive to a range of cell stressors, including DNA-damaging agents and suboptimal growth conditions. What other information about sORF function might we expect to extract from experiments with the new sORF deletion collection? The standard yeast deletion mutant collection enabled a paradigm shift in yeast genetics: the collection has been systematically scrutinized for a variety of phenotypes, providing rosters of genes that may contribute to particular biological processes (Scherens and Goffeau 2004). For example, the cell morphology of each mutant strain in the deletion collection has been examined by microscopy, providing the first comprehensive list of morphogenetic defects for any organism (Giaever et al. 2002; Saito et al. 2005). The deletion mutants were also designed with molecular “barcodes” (unique DNA sequences that identify each mutant strain), enabling parallel phenotypic analysis of pooled deletion mutants. The fitness contribution of each strain can then be quantified using a barcode oligonucleotide microarray readout (Giaever et al. 2002). The sORF collection has been similarly barcoded, so these mutant strains can now be included in all future functional profiling experiments.
Twenty-one of the 247 sORFs (8.5%) for which deletion mutants exist are essential for haploid viability (including the three new essential sORFs from the Kastenmayer study). This is much lower than the incidence of essentiality across the entire genome (22%), suggesting either (1) condition-dependent essentiality for these sORFs or (2) extensive genetic redundancy involving sORFs. Because the sORF collection is arrayed and compatible with methods developed for the larger mutant collection, genetic redundancy can be tackled using recently developed methods of automated yeast genetics (Tong et al. 2004). In particular, the Synthetic Genetic Array (SGA) method can be applied to systematically construct sORF double mutants, identifying the genetic backgrounds in which sORFs are required for viability. A synthetic lethal interaction network has already been constructed by screening well over 100 query mutant strains against the ∼5000 strains in the original deletion mutant collection (Tong et al. 2004). This project revealed that each gene has, on average, about 26 synthetic lethal interactions. Using this average as a guide, we might expect sORFs to contribute 7774 (26 × 299) new genetic interactions to the S. cerevisiae genetic interaction network (Tong et al. 2004). Will sORFs have more or fewer than the average of 26 genetic interactions? To date, only 28 sORFs have been reported to participate in a genetic interaction (Saccharomyces Genome Database; http://www.yeastgenome.org/). Even in the extensive genetic interaction survey of the haploid deletion collection, only 16 of the 107 sORFs represented in that version of the collection were found to interact genetically with any of the 132 query genes used in the large genetic network project (Tong et al. 2004).
Since the query mutants were biased toward genes with roles in actin-based cell polarity, cell wall biosynthesis, microtubule-based chromosome segregation, and DNA synthesis and repair, this may suggest that most sORFs function in other processes. Moreover, the number of genetic interactions identified for each of these 16 sORFs varied widely (from 1 to 18), so we expect that only screening individual sORF deletion mutants as queries will reveal the actual number of genetic interactions per sORF.
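The back-of-the-envelope figures in the two preceding paragraphs can be reproduced with a short Python sketch. The constants are the values quoted in the text; the helper names `fraction` and `expected_count` are hypothetical, and the interaction estimate is the same naive linear extrapolation used above, which assumes sORFs behave like the average gene in the Tong et al. (2004) network.

```python
# Values quoted in the text (Kastenmayer et al. 2006; Tong et al. 2004).
SORFS_ANNOTATED = 299          # currently annotated sORFs
SORF_DELETIONS = 247           # sORFs for which deletion strains exist
ESSENTIAL_SORFS = 21           # sORFs essential for haploid viability
GENOME_ESSENTIAL_RATE = 0.22   # genome-wide essentiality quoted in the text
MEAN_SL_INTERACTIONS = 26      # average synthetic lethal interactions per gene
SORFS_IN_SGA_SCREEN = 107      # sORFs in the deletion collection screened by Tong et al.
SORFS_WITH_INTERACTIONS = 16   # of those, sORFs with at least one interaction


def fraction(part: int, whole: int) -> float:
    """Simple proportion, e.g. essential sORFs / sORFs tested."""
    return part / whole


def expected_count(n: int, rate_per_item: float) -> float:
    """Naive linear extrapolation: n items times an average rate per item."""
    return n * rate_per_item


if __name__ == "__main__":
    print(f"Observed sORF essentiality: {fraction(ESSENTIAL_SORFS, SORF_DELETIONS):.1%}")
    # Number expected if sORFs were essential at the genome-wide rate.
    print(f"Expected essential sORFs: {expected_count(SORF_DELETIONS, GENOME_ESSENTIAL_RATE):.0f}")
    # Assumes every annotated sORF has the average number of interactions.
    print(f"Expected new interactions: {expected_count(SORFS_ANNOTATED, MEAN_SL_INTERACTIONS):.0f}")
    print(f"sORFs interacting in the SGA screen: "
          f"{fraction(SORFS_WITH_INTERACTIONS, SORFS_IN_SGA_SCREEN):.1%}")
```

Run as a script, this prints an observed essentiality of 8.5%, about 54 essential sORFs expected at the genome-wide rate (versus the 21 observed), 7774 expected new interactions, and 15.0% of screened sORFs with at least one interaction. The gap between 54 and 21 is what motivates the condition-dependence and redundancy explanations above.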
As with genetic interactions, relatively few physical interactions involving sORFs have been described. According to the Saccharomyces Genome Database, only 75 sORFs are reported to participate in a physical interaction with one or more other proteins. Intuitively, we might expect smaller proteins, which possess fewer protein domains, to have fewer physical interactions. However, there is no correlation between the size of these sORFs and their number of reported physical interactions. Consistent with this, the smallest characterized ORFs in the S. cerevisiae genome encode 25-amino-acid ribosomal proteins that spend their functional lives in large protein complexes (Saccharomyces Genome Database; http://www.yeastgenome.org/). Future efforts to include sORFs in affinity purification and other proteomic experiments should help determine their full spectrum of physical interactions. These data will also greatly aid the assignment of function to sORFs.
Might the characterization of sORFs be plagued by specific problems? We expect a spectrum of obstacles similar to that encountered for larger ORFs, such as variability in RNA and protein isolation. However, our limited examination of sORFs so far suggests that this group of genes is unlikely to exhibit unique group-specific properties with respect to expression levels or protein abundance. For example, quantification of natively expressed TAP-tagged sORFs has provided a view of sORF protein abundance (Ghaemmaghami et al. 2003). This study makes clear that sORF expression levels vary substantially: of the ∼200 C-terminally tagged sORFs, only ∼100 were detectable, and among these, protein abundance ranged from 99 to 1,590,000 molecules per cell, a range comparable to that seen for larger ORFs. As for other genes, functional analysis of each sORF will require its own specific investigation, which underscores the importance of creating the collection of sORF deletion mutants and of including these genes in various libraries for detailed study.
The prevalence of sORFs in eukaryotic genomes is only now becoming apparent. The high degree of conservation of sORFs and their prevalence in the S. cerevisiae genome (5% of all annotated genes) suggest that the small proteins they encode are likely significant contributors to cell biology. Consistent with this, sORFs have established roles in a range of processes, including transport, transcription, protein synthesis, DNA metabolism, and protein folding. Interestingly, Kastenmayer et al. found that higher eukaryotes (including Homo sapiens) appear to have a proportion of sORFs similar to that of S. cerevisiae: ∼5% of annotated ORFs are smaller than 100 amino acids, regardless of genome size. By providing a first glimpse of the prevalence of sORFs in the eukaryotic genome, Kastenmayer et al. (2006) have revealed the extent to which sORFs may have been overlooked in our quest to systematically define gene function. This class of genes may be small in size, but not in biological significance.
Footnotes
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4976706.
3 Corresponding author. E-mail Brenda.andrews{at}utoronto.ca; fax (416) 946-8253.
Cold Spring Harbor Laboratory Press