A model-based approach to capture genetic variation for future association studies

  Susana Eyheramendy (1,3,4), Jonathan Marchini (1), Gilean McVean (1), Simon Myers (2), and Peter Donnelly (1)

  1. Department of Statistics, University of Oxford, Oxford, OX1 3TG, United Kingdom
  2. Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02139, USA

    Abstract

    Genome-wide association studies are still constrained by the cost of genotyping. For this reason, the selection of a reduced set of markers, or tags, able to capture a significant proportion of the genetic variation is an important aspect of these studies. Most tagging SNP selection methods have been successful in capturing the genetic variation of the data from which the tags have been chosen. However, when these tags are used in an independent data set, a significant proportion of the remaining SNPs (non-tags) are not captured and, in most cases, there is no information on which SNPs are captured. We propose to use a probabilistic model to predict the non-tags based on a set of tags, as a way to capture genetic variation. An important advantage of this method is that it directly predicts the genotypes of the non-tags, which can then be tested for association with the phenotype and could help to elucidate the location of genes responsible for increasing disease susceptibility. Additionally, this method provides an estimate of the probabilities with which the predictions are made, which reflects the confidence of the probabilistic model. We also propose new methods to select the tagging SNPs. Using HapMap data, we show empirically that our approach is able to capture significantly more genetic variation than methods based solely on a pairwise LD measure.

    Footnotes

    • 3 Present address: Department of Statistics, Ludwig-Maximilians-Universität München, Ludwigstrasse 33, 80539 Munich, Germany

    • 4 Corresponding author. E-mail eyheram{at}stat.uni-muenchen.de; fax 49-89-21805041.

    • [Supplemental material is available online at www.genome.org.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5675406

      • Received June 21, 2006.
      • Accepted August 31, 2006.
