Genotype calling and haplotyping in parent-offspring trios

  1. Gonçalo R. Abecasis (5, 8)
  1. Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15224, USA;
  2. Department of Biostatistics, University of Pittsburgh School of Public Health, Pittsburgh, Pennsylvania 15224, USA;
  3. The Center for Human Genetics Research, Department of Molecular Physiology & Biophysics, and Neurology, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA;
  4. Istituto di Ricerca Genetica e Biomedica, Centro Nazionale di Ricerca (CNR), Monserrato, Cagliari 09042, Italy;
  5. Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48105, USA;
  6. Dipartimento di Scienze Biomediche, Università di Sassari, Sardinia 07100, Italy;
  7. Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA

    Abstract

    Emerging sequencing technologies allow common and rare variants to be systematically assayed across the human genome in many individuals. To improve variant detection and genotype calling, raw sequence data are typically examined across many individuals. Here, we describe a method for genotype calling in settings where sequence data are available for unrelated individuals and parent-offspring trios, and we show that modeling trio information can greatly increase the accuracy of inferred genotypes and haplotypes, especially on low- to modest-depth sequencing data. Our method considers both linkage disequilibrium (LD) patterns and the constraints imposed by family structure when assigning individual genotypes and haplotypes. Using simulations, we show that trios provide higher genotype calling accuracy across the frequency spectrum, both overall and at hard-to-call heterozygous sites. In addition, trios provide greatly improved phasing accuracy, which in turn improves the accuracy of downstream analyses (such as genotype imputation) that rely on phased haplotypes. To further evaluate our approach, we analyzed data on the first 508 individuals sequenced by the SardiNIA sequencing project. Our results show that our method reduces the genotyping error rate by 50% compared with analysis using existing methods that ignore family structure. We anticipate that our method will facilitate genotype calling and haplotype inference for many ongoing sequencing projects.
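
    To make the role of family structure concrete, the sketch below shows a simplified single-site calculation. It is not the method described in this article, which also models LD across sites; it only illustrates how Mendelian transmission can resolve a genotype that the offspring's own reads leave ambiguous. The function names, the Hardy-Weinberg prior, the allele frequency, and all likelihood values are illustrative assumptions, and de novo mutation is ignored.

```python
from itertools import product

# Genotypes are coded as the number of alternate alleles: 0, 1, or 2.
# All names and numbers below are hypothetical, chosen for illustration.

def hwe_prior(alt_freq):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 (an assumed prior)."""
    p, q = 1.0 - alt_freq, alt_freq
    return [p * p, 2 * p * q, q * q]

def mendel(g_child, g_father, g_mother):
    """P(child genotype | parental genotypes), ignoring de novo mutation."""
    a = g_father / 2.0  # chance the father transmits the alternate allele
    b = g_mother / 2.0  # chance the mother transmits the alternate allele
    return [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b][g_child]

def unrelated_posterior(lik, alt_freq):
    """Posterior for one individual called without family information."""
    weights = [p * l for p, l in zip(hwe_prior(alt_freq), lik)]
    total = sum(weights)
    return [w / total for w in weights]

def trio_posteriors(lik_father, lik_mother, lik_child, alt_freq):
    """Marginal genotype posteriors for each trio member, called jointly."""
    prior = hwe_prior(alt_freq)
    post = {"father": [0.0] * 3, "mother": [0.0] * 3, "child": [0.0] * 3}
    total = 0.0
    # Enumerate all 27 joint genotype configurations of the trio.
    for gf, gm, gc in product(range(3), repeat=3):
        w = (prior[gf] * prior[gm] * mendel(gc, gf, gm)
             * lik_father[gf] * lik_mother[gm] * lik_child[gc])
        post["father"][gf] += w
        post["mother"][gm] += w
        post["child"][gc] += w
        total += w
    return {who: [x / total for x in v] for who, v in post.items()}

# Illustrative likelihoods P(reads | genotype): the child's low-depth reads
# cannot separate hom-ref from het, but the parents are well covered.
lik_father = [1e-6, 1e-3, 1.0]  # father looks homozygous alternate
lik_mother = [1.0, 1e-3, 1e-6]  # mother looks homozygous reference
lik_child = [0.5, 0.5, 1e-3]    # child is ambiguous between 0 and 1

solo = unrelated_posterior(lik_child, alt_freq=0.1)
trio = trio_posteriors(lik_father, lik_mother, lik_child, alt_freq=0.1)
print("child called alone:  ", [round(x, 3) for x in solo])
print("child called in trio:", [round(x, 3) for x in trio["child"]])
```

    With these made-up inputs, the child called alone is pulled toward homozygous reference by the population prior, whereas the joint trio calculation favors the heterozygous genotype, because a homozygous-alternate father must transmit the alternate allele.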

    Footnotes

    • 8 Corresponding authors

      E-mail wei.chen{at}chp.edu

      E-mail goncalo{at}umich.edu

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.142455.112.

      Freely available online through the Genome Research Open Access option.

    • Received April 30, 2012.
    • Accepted October 5, 2012.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
