Method

An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data

    • 1 Fudan University;
    • 2 Baylor College of Medicine
Published January 7, 2013. https://doi.org/10.1101/gr.146084.112

Abstract

Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe herein methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (~5X) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) Effective Base Depth (EBD), a non-parametric statistic that enables more accurate statistical modeling of sequencing data; (2) Variance Ratio Scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific Binomial Mixture Modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Lastly, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage-aware design leads to improved computing performance on large sequencing datasets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage dataset and obtain genotyping accuracy comparable to that of SNP microarrays.
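
To illustrate why raw genotype likelihoods at low coverage benefit from refinement, the following is a minimal sketch of a generic binomial genotype likelihood at a single biallelic site. It is not the BBMM algorithm and does not reproduce SNPTools; the function names, the fixed per-read error rate of 0.01, and the flat prior used for normalization are all assumptions made for illustration.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Return P(reads | genotype) for genotypes carrying 0, 1, or 2 alternate alleles.

    Assumption: each read independently shows the alternate allele with a
    probability fixed by the genotype (a plain binomial model, not BBMM).
    """
    n = ref_reads + alt_reads
    # Probability that one read shows the alternate allele under each genotype.
    alt_prob = {0: error_rate, 1: 0.5, 2: 1.0 - error_rate}
    return {
        g: comb(n, alt_reads) * p ** alt_reads * (1.0 - p) ** ref_reads
        for g, p in alt_prob.items()
    }


def normalize(likelihoods):
    """Rescale likelihoods to sum to 1 (equivalent to assuming a flat prior)."""
    total = sum(likelihoods.values())
    return {g: v / total for g, v in likelihoods.items()}


# Five reads at a site, two carrying the alternate allele: heterozygous is favored.
five_reads = normalize(genotype_likelihoods(ref_reads=3, alt_reads=2))

# A single reference read: homozygous reference is favored, but a
# heterozygous genotype remains plausible.
one_read = normalize(genotype_likelihoods(ref_reads=1, alt_reads=0))
```

With only one read covering a site, the heterozygous genotype retains roughly a third of the normalized weight in this toy model, which is the kind of ambiguity that a population-level imputation step is meant to resolve.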
