Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes

Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mutation levels in sperm over time. This process, termed “selfish spermatogonial selection,” explains the high spontaneous birth prevalence and strong paternal age-effect of disorders such as achondroplasia and Apert, Noonan and Costello syndromes, with direct experimental evidence currently available for specific positions of six genes (FGFR2, FGFR3, RET, PTPN11, HRAS, and KRAS). We present a discovery screen to identify novel mutations and genes showing evidence of positive selection in the male germline, by performing massively parallel simplex PCR using RainDance technology to interrogate mutational hotspots in 67 genes (51.5 kb in total) in 276 biopsies of testes from five men (median age, 83 yr). Following ultradeep sequencing (about 16,000×), development of a low-frequency variant prioritization strategy, and targeted validation, we identified 61 distinct variants present at frequencies as low as 0.06%, including 54 variants not previously directly associated with selfish selection. The majority (80%) of variants identified have previously been implicated in developmental disorders and/or oncogenesis and include mutations in six newly associated genes (BRAF, CBL, MAP2K1, MAP2K2, RAF1, and SOS1), all of which encode components of the RAS-MAPK pathway and activate signaling. Our findings extend the link between mutations dysregulating the RAS-MAPK pathway and selfish selection, and show that the aging male germline is a repository for such deleterious mutations.

Moreover, ~10% of our screening panel (Supplemental Table S7) was designed to target regions in 'neutral-test' (negative control) genes that were not anticipated to be subject to selfish selection (i.e. none of these 10 genes are considered to be cancer genes and if known as disease genes, have been associated with recessive and/or familial disorders). As shown on Supplemental Table S3, a total of 9 variants in the 'neutral-test' set were long-listed following our filtering strategy, the majority of which were in Tier 4. The only two Tier 2 variants in the neutral-test set (GAPDH, chr12:6646840A>G and RHO chr3:129251230T>C) were further re-screened by direct PCR amplification and deep-sequencing and shown to be false positive calls.
Hence among the 'neutral-test' gene set and intronic sequences, no variants validated upon re-screening, providing further support that our prioritization pipeline is able to differentiate true positive variants from technical artefacts.

Further comments on the AKT3 (#1), APC (#2), LRP5 (#47) variants
Among the 61 validated variants, 3 variants (AKT3 (#1), APC (#2), LRP5 (#47)) do not belong to the RAS-MAPK gene category: Unlike the 59 other variants, the LRP5 and AKT3 variants encode synonymous substitutions that are likely to be functionally neutral. The LRP5 c.291C>T (p.Ala97Ala) variant was present at relatively high VAF (0.53-1.20%) but as this change was called in four biopsies in Tes4 that were also positive for the driver KRAS c.182A>G variant (p.Gln61Arg -oncogenic) (Supplemental Figs S2 and S4), we suggest that it may represent a passenger mutation tracking the KRAS clone. Finally, the synonymous AKT3 (p.Ser472Ser) variant which occurs at a CpG dinucleotide has previously been reported in multiple populations (gnomAD, MAF = 0.049%, including 1.1% in African population) is likely to be neutral. Hence, like LRP5, this substitution that was identified in a single biopsy which also carried 2 other selfish variants (PTPN11/SHP2 p.Phe71Leu and BRAF p.Gln709Lys) may represent a passenger call tracking a selfish event.
While the APC c.4471T>G (p.Phe1491Val) variant which was identified in a single biopsy at a VAF of 0.47% may be functional (CADD score = 26.2), its significance in the absence of other variants in this gene remains unclear. Hence, although dominant germline mutations in this tumour suppressor, which controls ß-catenin turnover and affect the Wnt pathway, account for ~85% of cases of familial adenomatous polyposis (FAP), a cancer predisposition syndrome, this isolated result will warrant further investigation.  Table S5). Each testis was cut into slices ~3-5 mm thick and either stored frozen at -80°C or formalin-fixed. After thawing slices of frozen testis, extraneous tissue (epididymis or tunica albuginea) was removed and slices were further dissected into 21-36 biopsies (Supplemental Table S5). Biopsies were pulverized using a pestle and DNA extraction was performed using the Qiagen DNeasy Blood & Tissue Kit.

RainDance library preparation and sequencing
Primer pairs (tailed with common RainDance sequences (RD)) targeting 500 genomic regions  Fig S1).
Given that for each sample, the average material input was ~1000 haploid genomes, the detection limit of the assay is anticipated to be ~0. custom sequencing primers generating 14-20 × 10 7 paired-end reads per library. All primer sequences are given in Supplemental Table S6.

Sequence alignment, variant calling and prioritization
Low quality reads with more than 20 bases below Q20, read pairs with one or two short (<50 bp) reads and reads pairs with unmatched or mismatched sequences between the forward and reverse primer pairs expected for each amplicon were removed. Reads passing QC (on Supplemental Methods -28 average 86% of reads) were aligned to the human genome (hg19) using BWA-MEM version 0.7.10 (Li 2013) with default parameter settings. Primer sequences were included in the alignment but ignored during variant quantification. The Python library Pysam (https://github.com/pysam-developers/pysam) was used to fetch reads mapped to each amplicon and mapped bases (indicated as letter "M") were identified from the CIGAR string.
Pileup was then performed for each amplicon independently. Nine amplicons that did not map to the targeted genomic regions were excluded from subsequent analyses (Supplemental Table S2). After trimming, reads with more than 10 non-reference bases were Notably, the majority of these were C>A (=G>T) variant calls (Supplemental Fig S6), which represent a known mutational signature associated with oxidative stress that likely arose during sample preparation (Arbeithuber et al. 2016;Chen et al. 2017 Table S3). Barcoded PCR and smMIP products were purified with AxyPrep magnetic beads (Axygen).
Demultiplexed reads were aligned to the human genome (hg19) using BWA-MEM version 0.7.12 (Li 2013). Summary tables of the calls across the aligned target region for PCR and were generated using SAMtools mpileup. A base call was only considered if its mapping quality was ≥Q20 and phred score ≥Q30. Pileups for smMIP-sequenced samples were obtained using a custom pipeline: UMIs were extracted from reads and grouped using UMI-tools (Smith et al. 2017). Reads were further trimmed to remove primers, assigned to the probe they were amplified from, and aligned to GRCh37 (without alt contigs) using bwa mem version 0.7.12 (Li 2013 observed base call quality was less than Q30 were also removed. This entire pipeline, built using Snakemake version 3.11.2 (Koster and Rahmann 2012), will be available from https://github.com/koelling/amplimap/. Validated variants were annotated according to the

Immunohistochemistry, microdissection and targeted mutation screen
Where mutations had been identified in frozen sections for which an adjacent FFPE tissue block was available, we attempted to visualize the corresponding mutant clone in sections of the FFPE block. Immunohistochemical staining with anti-MAGEA4 antibody (clone 57B, gifted by Prof. Giulio C. Spagnoli) to identify tubules with enhanced spermatogonial MAGEA4 staining, followed by laser capture microdissection and DNA extraction of adjacent FFPE sections, was performed as described (Maher et al. 2016). DNA was subsequently amplified by PCR (40 cycles) using CS-tagged primers (Supplemental Table S6) and barcoded for Illumina MiSeq 300v2 sequencing as described above. DNA samples extracted from the whole tissue section and from adjacent tubules with a normal MAGEA4 staining appearance were used as controls. Reads were aligned to the human genome (hg19) using BWA-MEM version 0.7.12 (Li 2013) and were visualized in IGV.

Analysis of variant enrichment
In order to test for enrichment of variants in the RTK-RAS-MAPK pathway, we performed a Fisher's exact test, categorizing each genomic position tested as to whether it was part of a  gene in the RTK-RAS-MAPK pathway category (or not), and whether we found a validated variant at the genomic position in the final validated dataset or not (Supplemental Table S7).
To further evaluate the contribution of different variables to this observed enrichment, we conducted a logistic regression analysis. As predictors we used whether a variant belongs to a gene in the RAS-MAPK pathway (or not), the individual testis donors identity (ind), the sequencing libraries (lib 1-16), and types of substitutions (Mutability). The response was whether the variant was part of the final validated dataset or not.
The measurements of the individual and sequencing library variables are obtained by summing the non-consensus read counts (NCRs) across individuals and libraries respectively.
The Mutability is obtained by summing the NCRs across the entire data set. We observe significant positive coefficient for the RAS-MAPK pathway variable (P = 6.36 × 10 -15 ), suggesting that the significant enrichment observed previously with the Fisher's exact test is not an artifact of mutability or sequencing libraries/individual effects. The regression coefficients are shown in Supplemental Table S8.