Gene expression profiling of human breast tissue samples using SAGE-Seq
- Zhenhua Jeremy Wu1,2,
- Clifford A. Meyer1,2,9,
- Sibgat Choudhury3,4,5,9,
- Michail Shipitsin3,4,5,
- Reo Maruyama3,4,5,
- Marina Bessarabova6,
- Tatiana Nikolskaya6,
- Saraswati Sukumar7,
- Armin Schwartzman1,2,
- Jun S. Liu8,10,
- Kornelia Polyak3,4,5,10 and
- X. Shirley Liu1,2
- 1Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA;
- 2Harvard School of Public Health, Boston, Massachusetts 02115, USA;
- 3Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA;
- 4Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA;
- 5Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA;
- 6Vavilov Institute for General Genetics, Russian Academy of Sciences, Moscow 119331, Russia;
- 7Johns Hopkins Oncology Center, Baltimore, Maryland 21231, USA;
- 8Department of Statistics, Harvard University, Science Center 715, Cambridge, Massachusetts 02138, USA
- 9These authors contributed equally to this work.
Abstract
We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth of around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals that SAGE-Seq is more sensitive in detecting less-abundant genes, including those encoding known breast cancer-related transcription factors and G protein–coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.
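The plateau in transcript discovery mentioned in the abstract is the kind of result a saturation (rarefaction) analysis produces: reads are subsampled at increasing depths and the number of distinct tags detected is recorded at each depth. The sketch below illustrates that general idea only; it is not the authors' pipeline. The function name `saturation_curve`, the `toy_library` counts, and the chosen depths are all hypothetical.

```python
import random


def saturation_curve(tag_counts, depths, seed=0):
    """Count distinct tags seen at each subsampling depth.

    tag_counts: dict mapping a tag sequence (or gene ID) to its read count.
    depths: iterable of read depths to evaluate; depths larger than the
            library are clipped to the library size.
    Returns a list of (depth, n_distinct_tags) pairs in increasing depth.
    """
    rng = random.Random(seed)
    # Expand the count table into one entry per read, then shuffle once.
    reads = [tag for tag, n in tag_counts.items() for _ in range(n)]
    rng.shuffle(reads)
    curve = []
    for depth in sorted(depths):
        depth = min(depth, len(reads))
        # A prefix of the shuffled reads is a random subsample drawn
        # without replacement, and successive prefixes are nested.
        curve.append((depth, len(set(reads[:depth]))))
    return curve


# Hypothetical toy library: a few abundant tags and many rare ones.
toy_library = {f"tag{i}": max(1, 1000 // (i + 1)) for i in range(200)}

if __name__ == "__main__":
    for depth, n_tags in saturation_curve(toy_library, [100, 1000, 5000]):
        print(depth, n_tags)
```

In a curve like this, the depth beyond which additional reads add few new tags indicates where discovery levels off. Expanding counts into a per-read list is only practical for small examples; libraries with millions of reads would need a sampling scheme that works on the count table directly.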
Footnotes
- 10Corresponding authors.
E-mail xsliu{at}jimmy.harvard.edu.
E-mail kornelia_polyak{at}dfci.harvard.edu.
- [Supplemental material is available online at http://www.genome.org. The data from this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE24491. Software for SAGE-Seq data analysis is available at http://www.liulab.dfci.harvard.edu/sageExpress/.]
- Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.108217.110.
- Received April 1, 2010.
- Accepted September 24, 2010.
- Copyright © 2010 by Cold Spring Harbor Laboratory Press











