Research

Estimating genotype error rates from high-coverage next-generation sequence data

    • 1 Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California 94143, USA;
    • 2 Institute for Human Genetics, University of California San Francisco, San Francisco, California 94143, USA;
    • 3 Cardiovascular Research Institute, University of California San Francisco, San Francisco, California 94143, USA;
    • 4 Kaiser Permanente Northern California Division of Research, Oakland, California 94612, USA
Published October 10, 2014. Vol 24 Issue 11, pp. 1734-1739. https://doi.org/10.1101/gr.168393.113

Abstract

Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found that (1) error profiles and rates did not differ between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)–(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods.
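
The idea of bounding error rates from replicates can be illustrated with a minimal sketch. It is not the authors' estimator, which the abstract does not describe; it only shows why discordance between two replicate call sets of the same individual gives a lower bound. Each discordant site implies at least one wrong call, whereas concordant sites may still hide shared errors, so the per-call error rate is at least the number of discordant sites divided by twice the number of sites called in both replicates. The function name `discordance_lower_bound`, the 0/1/2 alternate-allele-count coding, and the use of `None` for a no-call are all assumptions made for this illustration.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # assumed coding: 0, 1, or 2 alternate alleles; None = no-call


def discordance_lower_bound(
    rep_a: Sequence[Genotype], rep_b: Sequence[Genotype]
) -> Tuple[int, int, Optional[float]]:
    """Return (sites compared, discordant sites, lower bound on per-call error rate).

    Hypothetical helper for illustration: both inputs are genotype calls for the
    same individual at the same ordered list of sites, from two replicate runs.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same ordered list of sites")

    compared = 0
    discordant = 0
    for a, b in zip(rep_a, rep_b):
        # Sites missing in either replicate carry no information about agreement.
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            discordant += 1

    if compared == 0:
        return 0, 0, None

    # Each discordant site contains at least one erroneous call among the
    # 2 * compared calls made, so this ratio cannot exceed the true error rate.
    return compared, discordant, discordant / (2 * compared)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    blood = [0, 1, 2, 1, None, 0]
    saliva = [0, 1, 1, 1, 2, 0]
    print(discordance_lower_bound(blood, saliva))
```

Because errors that occur identically in both replicates are invisible to this comparison, the result is a floor rather than an estimate of the true rate, which is consistent with the abstract's framing of its figures as lower bounds.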
