Discriminating somatic and germline mutations in tumour DNA samples without matching normals

Andrew Stubbs
Erasmus MC
* Corresponding author; email: s.hiltemann{at}erasmusmc.nl

Abstract

Tumour analyses commonly employ a correction with a matched normal (MN), a sample from healthy tissue of the same individual, to distinguish germline mutations from somatic mutations. Since the majority of variants found in an individual are thought to be common within the population, we constructed a set of 931 samples from healthy, unrelated individuals, originating from two different sequencing platforms, to serve as a virtual normal (VN) when no such matched normal sample is available. Our approach removed over 96% of the germline variants also removed by the matched normal sample, and a large number (2-8%) of additional variants not corrected for by the matched normal. Combining the VN with the matched normal improved the correction for polymorphisms significantly, by up to ~30% compared with the matched normal alone and ~15% compared with the VN alone. We determined that the number of unrelated genomes needed to correct at least as efficiently as the matched normal is ~200 for structural variants (SVs) and ~400 for single-nucleotide variants (SNVs) and indels. In addition, we propose that the removal of common variants with purely position-based methods is inaccurate and incurs additional false positive somatic variants, and that more sophisticated algorithms, capable of leveraging information about the area surrounding variants, are needed for optimal accuracy. Our VN correction method can be used to analyse any list of variants, regardless of the sequencing platform of origin. Somatic variants identified by our method are annotated with a confidence score derived from the ratio of full calls to half-calls and no-calls at the locus across the CG-sequenced VN samples. This VN methodology is available for use on our public Galaxy server.
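To make the idea of a virtual normal concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes variants are represented as (chrom, pos, ref, alt) tuples and compared by exact match, which is precisely the kind of position-based comparison the abstract describes as insufficient on its own. The function names, the `min_samples` parameter, and the scoring formula (fraction of VN samples with a full call at the locus) are all hypothetical illustrations of the description above.

```python
from collections import Counter

# Hypothetical labels for the call state of one VN sample at a locus.
FULL, HALF, NO_CALL = "full", "half", "no-call"


def filter_with_virtual_normal(tumour_variants, vn_samples, min_samples=1):
    """Return tumour variants seen in fewer than `min_samples` VN samples.

    Assumption: a variant is a (chrom, pos, ref, alt) tuple and two variants
    match only if the tuples are identical. A real tool would also need to
    consider the sequence context around each variant.
    """
    seen_in = Counter()
    for sample in vn_samples:
        # Count each variant at most once per VN sample.
        seen_in.update(set(sample))
    return [v for v in tumour_variants if seen_in[v] < min_samples]


def confidence_score(call_states):
    """Fraction of VN samples with a full call at the locus.

    Assumed formula: full / (full + half + no-call). A low score means few
    VN samples could have shown the variant even if they carried it.
    """
    if not call_states:
        return 0.0
    full = sum(1 for state in call_states if state == FULL)
    return full / len(call_states)


tumour = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")]
virtual_normal = [
    [("chr1", 100, "A", "G")],
    [("chr1", 100, "A", "G"), ("chr3", 5, "G", "A")],
]
candidate_somatic = filter_with_virtual_normal(tumour, virtual_normal)
```

In this toy input the chr1 variant is present in both VN samples and is discarded as a likely germline polymorphism, while the chr2 variant is retained as a candidate somatic variant. The score is meant to capture how much evidence the VN actually provides at a locus: a variant that is absent from the VN at a position where most VN samples were only half-called or not called is less convincingly somatic.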

  • Received August 15, 2014.
  • Accepted July 8, 2015.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

ACCEPTED MANUSCRIPT