Detecting genetic variation in microarray expression data

  1. Jennifer A. Greenhall (1,2,8),
  2. Matthew A. Zapala (3,4,8),
  3. Mario Cáceres (1,5),
  4. Ondrej Libiger (4),
  5. Carrolee Barlow (1,6),
  6. Nicholas J. Schork (3,4), and
  7. David J. Lockhart (1,7,9)

  1. The Salk Institute for Biological Studies, La Jolla, California 92037, USA;
  2. Neurosciences Graduate Program, School of Medicine, University of California, San Diego, California 92093, USA;
  3. Biomedical Sciences Graduate Program, School of Medicine, University of California, San Diego, California 92093, USA;
  4. Polymorphism Research Laboratory, Department of Psychiatry, University of California, San Diego, California 92093, USA;
  5. Genes and Disease Program, Center for Genomic Regulation (CRG-UPF), Barcelona 08003, Spain;
  6. Brain Cells, Inc., San Diego, California 92121, USA;
  7. Amicus Therapeutics, Cranbury, New Jersey 08512, USA
  8. These authors contributed equally to this work.

Abstract

The use of high-density oligonucleotide arrays to measure the expression levels of thousands of genes in parallel has become commonplace. To take further advantage of the growing body of data, we developed a method, termed “GeSNP,” to mine the detailed hybridization patterns in oligonucleotide array expression data for evidence of genetic variation. To demonstrate the performance of the algorithm, the hybridization patterns in data obtained previously from SAMP8/Ta, SAMP10/Ta, and SAMR1/Ta inbred mice and from humans and chimpanzees were analyzed. Genes with consistent strain-specific and species-specific hybridization pattern differences were identified, and ∼90% of the candidate genes were independently confirmed to harbor sequence differences. Importantly, the quality of gene expression data was also improved by masking the probes of regions with putative sequence differences between species and strains. To illustrate the application to human disease groups, data from an inflammatory bowel disease study were analyzed. GeSNP identified sequence differences in candidate genes previously discovered in independent association and linkage studies and uncovered many promising new candidates. This approach enables the opportunistic extraction of genetic variation information from new or pre-existing gene expression data obtained with high-density oligonucleotide arrays.

Footnotes

  • 9 Corresponding author. E-mail dlockhart{at}amicustherapeutics.com; fax (609) 662-2001.

  • [Supplemental material is available online at www.genome.org. GeSNP can be accessed at http://porifera.ucsd.edu/~cabney/cgi-bin/geSNP.cgi. The Affymetrix CEL files for the mouse studies and the human/chimpanzee array data have been submitted to GEO under accession nos. GSE6238 and GSE7540, respectively.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6307307

    • Received January 23, 2007.
    • Accepted April 23, 2007.