Estimating genotype error rates from high-coverage next-generation sequence data

Jeffrey D. Wall; Ling Fung Tang; Brandon Zerbe; Mark N. Kvale; Pui-Yan Kwok; Catherine Schaefer; Neil Risch

doi:10.1101/gr.168393.113

Research

- ¹Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California 94143, USA;
- ²Institute for Human Genetics, University of California San Francisco, San Francisco, California 94143, USA;
- ³Cardiovascular Research Institute, University of California San Francisco, San Francisco, California 94143, USA;
- ⁴Kaiser Permanente Northern California Division of Research, Oakland, California 94612, USA

Published October 10, 2014. Vol 24 Issue 11, pp. 1734-1739. https://doi.org/10.1101/gr.168393.113

Abstract

Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found that (1) error profiles and rates did not differ between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with increasing genotype quality (GQ) score, but nonlinearly for the Illumina data, likely because GQ scores lose specificity above 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)–(5), suggest that caution is warranted when interpreting the results of next-generation sequencing-based association studies, and even more so when applying this technology clinically without validation by other, more robust sequencing or genotyping methods.
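The replicate-based idea above can be illustrated with a minimal Python sketch. This is not the authors' pipeline, and the function names are hypothetical. It assumes genotypes are encoded as alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt) with one GQ score per call, stored in dicts keyed by site. A site counts as a nonreference comparison when both replicates have a call and at least one call is nonreference. Binning on the lower GQ of the two calls is an illustrative design choice. Halving the discordance to get a per-call rate assumes errors are rare and independent across replicates. An error made identically in both replicates produces no discordance, so the result can only be a lower bound.

```python
from collections import defaultdict


def discordance_by_gq(rep_a, rep_b, bin_width=10):
    """Hypothetical sketch: nonreference discordance between two replicates.

    rep_a, rep_b: dict mapping site id -> (genotype, gq), where genotype is the
    alternate-allele count (0 = hom-ref, 1 = het, 2 = hom-alt).
    Returns {gq_bin_start: (n_discordant, n_compared)}, binned on the lower
    GQ of the two calls (an assumed binning rule, for illustration only).
    """
    counts = defaultdict(lambda: [0, 0])
    for site, (gt_a, gq_a) in rep_a.items():
        if site not in rep_b:
            continue  # no call in the other replicate: nothing to compare
        gt_b, gq_b = rep_b[site]
        if gt_a == 0 and gt_b == 0:
            continue  # both hom-ref: not a nonreference call
        gq_bin = (min(gq_a, gq_b) // bin_width) * bin_width
        counts[gq_bin][1] += 1
        if gt_a != gt_b:
            counts[gq_bin][0] += 1
    return {b: tuple(v) for b, v in sorted(counts.items())}


def per_call_error(n_discordant, n_compared):
    """Approximate per-call error rate from pairwise discordance.

    Assumes errors are rare and independent between replicates, so a
    discordant site usually reflects one wrong call out of two. Errors shared
    by both replicates are invisible here, so this is a lower bound.
    """
    if n_compared == 0:
        return float("nan")
    return n_discordant / (2 * n_compared)


# Toy replicate calls (made-up values, for illustration only).
rep1 = {"chr1:100": (1, 45), "chr1:200": (2, 70), "chr1:300": (0, 30), "chr1:400": (1, 12)}
rep2 = {"chr1:100": (1, 50), "chr1:200": (1, 65), "chr1:300": (0, 40), "chr1:400": (0, 15)}

by_bin = discordance_by_gq(rep1, rep2)
print(by_bin)  # → {10: (1, 1), 40: (0, 1), 60: (1, 1)}

n_disc = sum(d for d, _ in by_bin.values())
n_comp = sum(c for _, c in by_bin.values())
print(round(per_call_error(n_disc, n_comp), 3))  # → 0.333
```

In real data each GQ bin would hold many sites, and plotting the per-bin rate against GQ is one way to see whether error rates fall steadily with GQ or flatten at high scores.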
