Detection of Deleted Genomic DNA Using a Semiautomated Computational Analysis of GeneChip Data

Hugh Salamon,1,2,3 Midori Kato-Maeda,1 Peter M. Small,1 Jorg Drenkow,2 and Thomas R. Gingeras2

1 Division of Infectious Diseases and Geographic Medicine, Department of Medicine, Stanford University, Stanford, California 94305, USA
2 Affymetrix, Inc., Santa Clara, California 95051, USA

Abstract

Genomic diversity within and between populations arises from single-nucleotide mutations, changes in repetitive DNA systems, recombination mechanisms, and insertion and deletion events. The contribution of these sources to diversity, whether purely genetic or of phenotypic consequence, can be investigated only if we have the means to quantify and characterize diversity in many samples. With complete sequences now available for representative genomes of different species, protocols that screen for genetic polymorphism across entire genomes are actively being developed. The large numbers of measurements such approaches yield demand careful attention to the numerical analysis of the data. In this paper we present a novel application of an Affymetrix GeneChip to perform genome-wide screens for deletion polymorphism. A high-density oligonucleotide array formatted for mRNA expression and targeted at the fully sequenced 4.4-million-base-pair genome of a Mycobacterium tuberculosis standard strain was adapted to compare genomic DNA. Hybridization intensities for 111,000 probe pairs (perfect complement and mismatch complement) were measured for genomic DNA from a clinical strain and from a vaccine organism. Because individual probe-pair hybridization intensities have limited sensitivity and specificity for detecting deletions, we developed a data-analysis method that exploits measurements from multiple probes at tandem locations across the genome. The TSTEP (Tandem Set Terminal Extreme Probability) algorithm, designed specifically to analyze these tandem hybridization measurements, was applied and shown to discover genomic deletions with high sensitivity. TSTEP provides a foundation for similar efforts to characterize deletions from large sets of hybridization measurements in similar-sized and larger genomes. Issues relating to the design of genome-content screening experiments and the implications of these methods for studying population genomics and the evolution of genomes are discussed.
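
The reasoning behind pooling tandem probes can be illustrated with a small sketch. It is not the TSTEP algorithm, whose details are not given in this abstract; it swaps in a plain sliding-window binomial tail test. The function names, the window size, the null rate of low-intensity calls, and the significance cutoff are all hypothetical choices made for illustration. The input is assumed to be a list of per-probe-pair booleans, ordered by genomic position, marking probe pairs whose signal looks absent.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_tandem_windows(low_calls, window=10, null_rate=0.1, alpha=1e-4):
    """Flag runs of tandem probe pairs with improbably many low-intensity calls.

    low_calls: booleans ordered by genomic position (True = signal looks absent).
    window, null_rate, alpha: illustrative assumptions, not values from the paper.
    Returns merged (start, end) index ranges, end exclusive.
    """
    flagged = []
    for start in range(len(low_calls) - window + 1):
        k = sum(low_calls[start:start + window])
        # A single low call is unremarkable; many in one tandem window are not.
        if binomial_tail(window, k, null_rate) < alpha:
            end = start + window
            if flagged and start <= flagged[-1][1]:
                # Merge with the previous flagged range when they touch or overlap.
                flagged[-1] = (flagged[-1][0], end)
            else:
                flagged.append((start, end))
    return flagged
```

Under these assumptions an isolated low-intensity probe pair is never flagged, whereas a long run of them is. The flagged range extends somewhat beyond the run itself, because windows that straddle its boundary can also pass the cutoff.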

Footnotes

  • 2 Present address: Department of Immunology, Berlex Biosciences, 15049 San Pablo Avenue, Richmond, CA 94804, USA

  • 3 Corresponding author.

  • E-mail: Hugh_Salamon@Berlex.com; fax: (510) 669-4244.

  • Article published online before print: Genome Res., 10.1101/gr.152900.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.152900.

    • Received June 19, 2000.
    • Accepted September 18, 2000.