Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome
- Kourosh Salehi-Ashtiani1,4,5,
- Chenwei Lin1,4,
- Tong Hao1,4,
- Yun Shen1,
- David Szeto1,
- Xinping Yang1,
- Lila Ghamsari1,
- HanJoo Lee1,
- Changyu Fan1,
- Ryan R. Murray1,
- Stuart Milstein1,3,
- Nenad Svrzikapa1,3,
- Michael E. Cusick1,
- Frederick P. Roth2,
- David E. Hill1 and
- Marc Vidal1,5
- 1 Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
- 2 Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
- 4 These authors contributed equally to this work.
Abstract
Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still, a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome, including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.
Footnotes
- 5 Corresponding authors.
E-mail kourosh_salehi-ashtiani{at}dfci.harvard.edu; fax (617) 632-5739.
E-mail marc_vidal{at}dfci.harvard.edu; fax (617) 632-5739.
- [Supplemental material is available online at http://www.genome.org. 5′- and 3′-RACE sequences are available at http://www.wormbase.org and http://worfdb.dfci.harvard.edu/index.php?page=race.]
- Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.098640.109.
- Received July 19, 2009.
- Accepted September 22, 2009.
- Copyright © 2009 by Cold Spring Harbor Laboratory Press











