Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers

Throughout their lifetime, cells are subject to extrinsic and intrinsic mutational processes leaving behind characteristic signatures in the genome. DNA mismatch repair (MMR) deficiency leads to hypermutation and is found in different cancer types. Although it is possible to associate mutational signatures extracted from human cancers with possible mutational processes, the exact causation is often unknown. Here, we use C. elegans genome sequencing of pms-2 and mlh-1 knockouts to reveal the mutational patterns linked to C. elegans MMR deficiency and their dependency on endogenous replication errors and errors caused by deletion of the polymerase ε subunit pole-4. Signature extraction from 215 human colorectal and 289 gastric adenocarcinomas revealed three MMR-associated signatures, one of which closely resembles the C. elegans MMR spectrum and strongly discriminates microsatellite stable and unstable tumors (AUC = 98%). A characteristic difference between human and C. elegans MMR deficiency is the lack of elevated levels of NCG > NTG mutations in C. elegans, likely caused by the absence of cytosine (CpG) methylation in worms. The other two human MMR signatures may reflect the interaction between MMR deficiency and other mutagenic processes, but their exact cause remains unknown. In summary, combining information from genetically defined models and cancer samples allows for better aligning mutational signatures to causal mutagenic processes.


INTRODUCTION
Cancer is a genetic disease associated with the accumulation of mutations. A major challenge is to understand mutagenic processes acting in cancer cells. Accurate DNA replication and the repair of DNA damage are important for genome maintenance. The identification of cancer predisposition syndromes caused by defects in DNA repair genes was important to link the etiology of cancer to increased mutagenesis. One of the first DNA repair pathways associated with cancer predisposition was DNA mismatch repair (MMR). MMR corrects mistakes that arise during DNA replication. Mutations in MMR genes are associated with hereditary non-polyposis colorectal cancer (HNPCC), also referred to as Lynch Syndrome (Fishel et al. 1993;Bronner et al. 1994;Nicolaides et al. 1994;Papadopoulos et al. 1994;Miyaki et al. 1997).
DNA mismatch repair is initiated by the recognition of replication errors by MutS proteins, initially defined in bacteria. In S. cerevisiae and mammalian cells, two MutS complexes termed MutSα and MutSβ comprised of MSH2/MSH6 and MSH2/MSH3 heterodimers, respectively, are required for DNA damage recognition albeit with differing substrate specificity (Drummond et al. 1995;Habraken et al. 1996;Genschel et al. 1998). Binding of MutS to the DNA lesion facilitates subsequent recruitment of the MutL complex. MutL enhances mismatch recognition and promotes a conformational change in MutS through ATP hydrolysis to allow for the sliding of the MutL/MutS complex away from mismatched DNA (Allen et al. 1997;Gradia et al. 1999). DNA repair is initiated in most systems by a single-stranded nick generated by MutL (MutH in E. coli) on the nascent DNA strand at some distance to the lesion (Kadyrov et al. 2006;Kadyrov et al. 2007). Exonucleolytic activities in part conferred by Exo1 contribute to the removal of the DNA stretch containing the mismatch followed by gap filling via lagging strand DNA synthesis (Goellner et al. 2015). The most prominent MutL activity in human cells is provided by the MutLα heterodimer MLH1/PMS2 (Prolla et al. 1998;Cannavo et al. 2005). Moreover, human MLH1 is found in heterodimers with PMS1 and MLH3, called MutLβ and MutLγ. Of these only MutLγ is thought to have a minor role in MMR (Cannavo et al. 2005). The C. elegans genome does not encode obvious MutLβ and γ subunits (PMS1 and MLH3 homologues, respectively), while the MutLα subunits MLH-1 and PMS-2 can be readily identified using homology searches (Supplemental Table S1).
Analysis of mutations in microsatellite loci of MLH1-deficient colorectal cancer cell lines suggested rates of repeat expansion or contraction between 8.4 x 10^-3 and 3.8 x 10^-2 per locus and generation (Bhattacharyya et al. 1994;Hanford et al. 1998). Estimates using S. cerevisiae revealed a 100- to 700-fold increase in DNA repeat tract instability in pms2, mlh1 and msh2 mutants (Strand et al. 1993) and a ~5-fold increase in base substitution rates (Yang et al. 1999). C. elegans assays using reporter systems or selected, PCR-amplified regions revealed a more than 30-fold increased frequency of single base substitutions in msh-6, a 500-fold increase in mutations in A/T homopolymer runs and a 100-fold increase in mutations in dinucleotide repeats (Degtyareva et al. 2002;Tijsterman et al. 2002;Denver et al. 2005), akin to the frequencies observed in yeast and mammalian cells (Strand et al. 1993;Hanford et al. 1998). Recently, whole genome sequencing approaches using diploid S. cerevisiae started to provide a genome-wide view of MMR deficiency. S. cerevisiae lines carrying an msh2 deletion alone or in conjunction with point mutations in one of the three replicative polymerases, Polα/primase, Polδ, and Polε, were propagated over multiple generations to determine the individual contribution of replicative polymerases and MMR to replication fidelity (Lang et al. 2013;Lujan et al. 2014;Lujan et al. 2015). These analyses estimated an average base substitution rate of 1.6 x 10^-8 per base pair per generation in msh2 mutants and a further increased rate in double mutants of msh2 and any of the replicative polymerases (Lujan et al. 2014;Lujan et al. 2015). A synergistic increase in mutagenesis was also recently observed in childhood tumors in which MMR deficiency co-occurred with mutations in the replicative polymerases ε and δ, required for leading and lagging strand DNA synthesis, respectively (Shlien et al. 2015).
In human cancer samples 30 mutational signatures (referred to as COSMIC signatures from here on) have been uncovered by mathematical modeling across a large number of cancer genomes representing more than 30 tumor types (http://cancer.sanger.ac.uk/cosmic/signatures) (Alexandrov et al. 2013a;Alexandrov et al. 2013b). These signatures are largely defined by the relative frequency of the six possible base substitutions (C>A, C>G, C>T, T>A, T>C, T>G) in the sequence context of their adjacent 5' and 3' base. Of these, COSMIC signatures 6, 15, 20, 21 and 26, have been associated with MMR deficiency with several MMR signatures being present in the same tumor sample (Alexandrov et al. 2013a;Alexandrov et al. 2013b). It is not clear if these MMR signatures are conserved across evolution and how they reflect MMR defects. Therefore, MMR signatures deduced from defined monogenic MMR defective backgrounds (which we will herein refer to as mutational patterns) could contribute to the refinement of computationally derived mutational signatures extracted from cancer genomes.
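The 96 substitution classes underlying these signatures follow directly from the definition above: six base changes, each flanked by one of four 5' bases and one of four 3' bases. The following is a minimal sketch of that enumeration; the label format 'A[C>T]G' and the function name are illustrative choices, not notation taken from this study.

```python
from itertools import product

BASES = "ACGT"
# The six substitution classes, written from the pyrimidine of the mutated base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

def substitution_channels():
    """Return the 96 labels of the form 5'base[ref>alt]3'base, e.g. 'A[C>T]G'."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]

channels = substitution_channels()
print(len(channels))  # 6 substitutions x 4 x 4 flanking bases = 96
```

An NCG > NTG mutation, for instance, corresponds to the four labels 'A[C>T]G', 'C[C>T]G', 'G[C>T]G' and 'T[C>T]G' in this scheme.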
Here we investigate the genome-wide mutational impact of the loss of the MutL mismatch repair genes mlh-1 and pms-2 in the nematode C. elegans. Furthermore, we address the contribution of a deletion of pole-4, a non-essential accessory subunit of the leading-strand DNA polymerase Polε, to mutation profiles and hypermutation.

Mutation rates and profiles of mlh-1, pms-2 and pole-4 single mutants grown over 20 generations
We previously established C. elegans mutation accumulation assays and demonstrated that defects in major DNA damage response and DNA repair pathways, including nucleotide excision repair, base excision repair, DNA crosslink repair, DNA end-joining and apoptosis did not lead to overtly increased mutation rates when lines were propagated for 20 generations (Meier et al. 2014). The experimental setup takes advantage of the 3-4 day life cycle of C. elegans and its hermaphroditic reproduction by self-fertilization. This allows for the propagation of clonal C. elegans lines, which in each generation pass through a single-cell bottleneck provided by the zygote. We now extend these studies to MMR deficiency conferred by the MutLα mutations mlh-1 and pms-2. Given that null alleles of the human and C. elegans leading strand polymerase Polε catalytic subunit, POLE and pole-1, respectively, cause lethality, we focused our analysis on a non-essential C. elegans Polε subunit, termed POLE-4. Dpb3p, the S. cerevisiae POLE-4 ortholog, has been implicated in stabilizing POLE interaction with the primer-template DNA complex (Aksenova et al. 2010).
We detected an average of 4 base substitutions and 2 insertions or deletions in wild-type C. elegans lines propagated for 20 generations (Fig. 1A, 1B). In contrast, mlh-1 and pms-2 mismatch repair single mutants carried an average of 1174 and 1191 unique mutations, respectively, of which 288 and 309 were base substitutions (Fig. 1A) and 886 and 882 indels, defined as small insertions and deletions of less than 400 base pairs (Fig.  1B). The nature of single nucleotide changes and the overall mutation burden were congruent across independent lines of the same genotype and mutation numbers linearly increased from F10 to F20 generation lines (Fig. 1). In contrast to mlh-1 and pms-2, pole-4 mutants exhibited mutation numbers and profiles not significantly different from wild-type (Fig. 1, Supplemental Table S2).

Mutation rates and patterns in pole-4; pms-2 double mutants
To further investigate the role of pole-4 and the genetic interaction with MMR deficiency, we generated pole-4; pms-2 double mutants. pms-2 mutants carried an average of 145 base substitutions and 527 indels over 10 generations, roughly half the number we observed in the F20 generation (Fig. 1C and 1D, Supplemental Table S2). In comparison, the number of single base substitutions and indels was increased ~4.4-fold and ~1.4-fold in pole-4; pms-2 double mutants to an average of 637 and 723, respectively (Supplemental Table S2, Fig. 1C,D). We did not identify any structural variants (SVs) in the genotypes analyzed except for pole-4, where a single SV was observed in three F10 mutation accumulation lines (Supplemental Table S2). We could not readily propagate pole-4; pms-2 beyond the F10 generation, suggesting that a mutation burden higher than ~500-700 single base substitutions (Fig. 1C) in conjunction with 700-750 indels (Fig. 1D) might be incompatible with organismal reproduction. The increased mutation burden of pole-4; pms-2 double mutants compared to that of pms-2, together with the wild-type-like mutation rate of pole-4 single mutants, suggests that replication errors occur at increased frequency in the absence of pole-4 but are effectively repaired by MMR.
The genome-wide mutation rates observed in the absence of C. elegans MutLα proteins MLH-1 and PMS-2 are in line with mutation rates previously determined for C. elegans MutS and S. cerevisiae MMR mutants (Strand et al. 1993;Yang et al. 1999;Degtyareva et al. 2002;Tijsterman et al. 2002;Denver et al. 2005). However, unlike in mammalian cells (Yao et al. 1999;Baross-Francis et al. 2001), C. elegans mlh-1 and pms-2 mutants exhibited almost identical mutation rates and profiles, suggesting that the inactivation of the MutLα heterodimer is sufficient to induce a fully penetrant MMR phenotype consistent with the absence of PMS1 MutLβ and MLH3 MutLγ homologs in C. elegans.
Our finding that pole-4 mutants do not show increased mutation rates is surprising given that the deletion of the budding yeast POLE-4 homolog Dpb3 leads to mutation rates comparable to the proofreading-deficient pol2-4 allele of the Polε catalytic subunit (Aksenova et al. 2010;Lujan et al. 2012). Increased mutation rates have also been reported for proofreading mutants of the Polε catalytic subunit in mice and humans, and in humans such mutations are associated with an increased predisposition to colorectal cancer (Albertson et al. 2009;Lujan et al. 2012;Palles et al. 2013).

Distribution and sequence context of base substitutions
We next wished to determine the mutational patterns associated with DNA mismatch repair defects alone and combined with pole-4 deficiency. T>C and C>T transitions were present more frequently than T>A, T>G, C>A and C>G transversions in mlh-1 and pms-2 single and pole-4; pms-2 double mutants (Fig. 1A,C, Supplemental Table S2). A similar preponderance of T>C and C>T transitions was previously observed in S. cerevisiae msh2 mutants and in MMR defective human cancer lines (Alexandrov et al. 2013a;Lujan et al. 2014;Supek and Lehner 2015). Analyzing all base substitutions within their 5' and 3' sequence context, we found no prominent enrichment of distinct 5' and 3' bases associated with T>C transitions in mlh-1 and pms-2 single mutants. In contrast, T>A transversions occurred with increased frequency in an ATT context, C>T transitions in a GCN context and C>A transversions in a NCT context (Fig. 1E, Fig. 4A).
Interestingly, > 90% of T>A transversions in an ATT context occurred in homopolymer runs; the majority (> 75%) in the context of two adjoining A and T homopolymers (Supplemental Fig. S1A). An increased frequency of base substitution at the junction of adjacent repeats has also been reported in S. cerevisiae MMR mutants, giving rise to the speculation that such base substitutions may be generated by double slippage events (Lang et al. 2013). Moreover, we observed several examples in which one or several base substitutions had occurred that converted a repeat sequence such that it became identical to flanking repeats consistent with polymerase slippage across an entire repeat (Supplemental Fig. S1B-D). Such mechanisms could lead to the equalization of microsatellite repeats, a phenomenon referred to as microsatellite purification (Harr et al. 2000).
While we could not define mutational patterns specifically associated with pole-4 loss due to the low number of mutations, the profile of pole-4; pms-2 double mutants differed from MMR single mutants. Most strikingly, in addition to C>T transitions in a GCN context, T>C transitions were generated with higher frequency accounting for >50% of all base changes (Fig. 1C). Among these, T>C substitutions in the context of a flanking 5' cytosine were underrepresented (Fig. 1E). Interestingly, a higher proportion of T>C changes, not embedded in a defined sequence context, has been reported for MMR-deficient tumor samples containing mutations in the lagging strand polymerase Polδ (Shlien et al. 2015), but not in S. cerevisiae and human tumors with a combined MMR and Polε deficiency (Lujan et al. 2014;Shlien et al. 2015). No obvious chromosomal clustering of base substitutions was observed in pms-2 and pole-4; pms-2 grown for 10 generations (Supplemental Fig. S2A).

Sequence context of insertions and deletions associated with MMR deficiency
The majority of mutations observed in mlh-1 and pms-2 single and pole-4; pms-2 double mutants were small insertions/deletions (indels) (Supplemental Table S2) and affected dinucleotide repeat sequences (Fig. 1F) and homopolymer runs at similar frequency, as recently also reported for MMR mutants in S. cerevisiae (Lujan et al. 2015).
Trinucleotide repeat instability is associated with a number of neurodegenerative disorders, such as fragile X syndrome, Huntington's disease and spinocerebellar ataxias (Brouwer et al. 2009). Based on our analysis, trinucleotide repeat sequences are present in the C. elegans genome at a > 400-fold lower frequency than homopolymer runs (Supplemental Material). We observed between 3 and 7 trinucleotide indels per 10 generations in mlh-1 and pms-2 mutants (Supplemental Table S2). However, these occurred predominantly in homopolymer sequences, precluding an estimation of mutation rates for trinucleotide repeats.
Clustering of 1 bp indels was not evident for pms-2 and pole-4; pms-2 F10 lines beyond a somewhat reduced occurrence in the center of C. elegans autosomes which correlates with reduced homopolymer frequency in these regions (Supplemental Fig. S2B).

Dependency of 1 bp indel frequency on homopolymer length
Given the high number of indels present in homopolymer repeats we aimed to investigate the correlation between indel frequency and homopolymer length. Overall, we identified 3,433,785 homopolymers in the C. elegans genome, ranging in length from 4 to 35 nucleotides (Fig. 2A, Material and Methods). Poly-A and poly-T repeats each accounted for 47% of all homopolymers, their frequencies decreasing with increasing length (Fig. 2A). C and G each comprised 3% of all homopolymers with frequencies decreasing up to a length of 8 bp, plateauing between 8 and 17 bp, and further decreasing for longer homopolymer tracts (Fig. 2A). These findings are consistent with a previous report on >7 bp long homopolymers (Denver et al. 2004). In C. elegans MMR mutant backgrounds the frequency of 1 bp indels increased with homopolymer length of up to 9-10 base pairs and trailed off in longer homopolymers (Fig. 2B). Given that the frequency of homopolymer tracts decreases with length (Fig. 2A) we normalized for homopolymer number. To assess the variability of the frequency estimation, we applied an additive model (Material and Methods), which supported a rapid increase in indel frequency in homopolymers up to a repeat length of 10 bases followed by a drop or plateau in indel frequency for longer homopolymers with decreasing confidence (Fig. 2C). Firm conclusions about indel frequencies in homopolymers >13 bp are precluded by the lack of statistical power due to the low numbers of both long homopolymers in the genome and associated indels observed (Fig. 2B). In summary, our data suggest that replicative polymerase slippage occurs more frequently with increasing homopolymer length, with a peak for homopolymers of 10-11 nucleotides, followed by reduced slippage frequency in longer homopolymers. These results are consistent with observations in budding yeast (Lang et al. 2013) and a recent study using human MLH1 knockout organoids (Drost et al. 2017).
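The normalization for homopolymer number described above amounts to dividing, for each tract length, the number of observed 1 bp indels by the number of homopolymers of that length in the genome. A minimal sketch is given below; the function name and all counts are invented for illustration and are not data from this study, and the additive model used to assess variability is not reproduced here.

```python
def indels_per_homopolymer(indel_counts, homopolymer_counts):
    """Divide the number of 1 bp indels observed at each homopolymer length
    by the number of homopolymers of that length in the genome."""
    rates = {}
    for length, n_tracts in homopolymer_counts.items():
        if n_tracts > 0:  # skip lengths without any tract in the genome
            rates[length] = indel_counts.get(length, 0) / n_tracts
    return rates

# Invented toy counts, keyed by homopolymer length in bp.
toy_indels = {4: 10, 8: 40, 10: 30}
toy_tracts = {4: 100000, 8: 8000, 10: 1500, 12: 0}
toy_rates = indels_per_homopolymer(toy_indels, toy_tracts)
print(toy_rates)
```

Because long tracts are rare, the denominators for long homopolymers are small, which is one way to see why estimates for long tracts carry wide uncertainty.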

Comparison of C. elegans MMR patterns to MMR signatures derived from human colorectal and gastric adenocarcinoma samples
To assess how our findings relate to mutation profiles occurring in human cancer we Having observed high 1 bp indel frequencies associated with homopolymer repeats in C. elegans pms-2 and mlh-1 mutants (Fig. 1B,D, Fig. 2B), we also considered indels in our analysis of human mutational signatures. Comparing these to existing COSMIC signatures by calculating the similarity score between their base substitution profiles showed that many had a counterpart in the COSMIC database with high similarity (Table 1, Supplemental Table S3). Interestingly, we observed an increased number of mutations assigned to the Clock-1 signature in human MMR-deficient samples (Fig. 3A,E). The Clock-1/COSMIC signature 1 is thought to reflect spontaneous 5meC deamination and its conversion to thymine, a mutational process that is thought to be active in all tissues and which correlates with the age at the time of cancer diagnosis (Alexandrov et al. 2013a). Our data suggest a role of MMR in the repair of 5meC deamination-induced mismatches (Bellacosa 2001;Tricarico et al. 2015;Grin and Ishchenko 2016). Notably, the most frequent MMR signature, Signature 6, shows high rates of C>T mutations in an NCG context, possibly reflecting imperfect delineation of the underlying mutational processes.
A second sample cluster is represented by six tumors (Fig. 3A, bottom right) with most of their mutations falling within the POLE signature (brown). Consistently, these samples also carry pathogenic POLE mutations (Supplemental Table S4). Another cluster is formed by a subset of stomach cancer samples carrying a 17-like signature (Fig. 3A, bottom left). Four tumor samples outside of these clusters and dispersed over the similarity map are MSI. These tumors may have acquired MMR deficiency very late in their development.
To compare human and C. elegans MMR footprints we first determined mutational patterns from mlh-1 and pms-2 single mutants as well as from the pole-4; pms-2 double mutant (Material and Methods). Mutational patterns of mlh-1 and pms-2 mutants were nearly identical with a cosine similarity of 0.97 (Fig. 4A top panels). In contrast, the pole-4; pms-2 mutational pattern showed a different relative contribution of C>T and T>C mutations (Fig. 4A top panels) and displayed a cosine similarity to mlh-1 and pms-2 below 0.71 (Supplemental Fig. S6C). We next adjusted for the difference in trinucleotide frequencies in the C. elegans genome and the human exome (Fig. 4A bottom panels, Fig. 4B). Comparison of C. elegans MMR patterns with known cancer signatures showed the highest similarity of 0.77 to COSMIC signature 20 (Table 1, Supplemental Fig. S7). Of the three human MMR-associated de novo signatures, only MMR-1 displayed similarity to C. elegans MMR substitution patterns with a cosine similarity of 0.84 to pms-2 and of 0.81 to mlh-1 (Table 1, Fig. 4C). Notable differences in the C. elegans pms-2 and mlh-1 patterns compared to the MMR-1 signature are a reduced level of C>T mutations in NCG contexts (Fig. 4C, stars) and a high frequency of T>A mutations in an ATT context. The former is likely due to the lack of spontaneous deamination of 5-methyl-C, a base modification that is absent in C. elegans (Greer et al. 2015), the latter likely due to a higher relative frequency of poly-A and poly-T homopolymers in the C. elegans genome versus the human exome ( Fig

DISCUSSION
MMR-deficient tumors have among the highest mutation rates across cancer types. In line with this observation, we observed an ~70-fold increase in the number of base substitutions in C. elegans mlh-1 and pms-2 mutants. This mutation rate is only surpassed by that of the pole-4; pms-2 double mutant in which mutation rates are further increased 2-3-fold. Genome maintenance is highly efficient as evidenced by a wild-type C. elegans mutation rate on the order of 8 x 10^-10 per base and cell division. It thus appears that DNA repair pathways act highly redundantly, and that it may require the combined deficiency of multiple DNA repair pathways to trigger excessive mutagenesis.
Equally a latent defect in DNA replication integrity might only become apparent in conjunction with a DNA repair deficiency. Indeed the increased mutation burden detected in the pole-4; pms-2 double mutant while no increased mutation rate is observed in pole-4 alone uncovers a latent role of pole-4. It appears that replication errors occur at increased frequency in the absence of C. elegans pole-4 but are effectively repaired by MMR.
Out of the signatures associated with MMR deficiency in cancer cells, only MMR-1 is related to the mutational pattern found in C. elegans mlh-1 and pms-2 mutants. Considering the controlled nature of the C. elegans experiment we postulate that MMR-1 reflects a conserved mutational process of DNA replication repaired by MMR. Consistent with this we find that MMR-1 activity is closely linked to MSI status, an established indicator for mismatch repair deficiency. In cases of hypermutation we suggest that akin to the pole-4; pms-2 double mutant, mutational footprints can be attributed to the failed repair of lesions originating from mutations in DNA repair or DNA replication genes. For instance in MMR defective lines also carrying POLE catalytic subunit mutations the mutational landscape is overwhelmed by the POLE signature (Shlien et al. 2015). Likewise it appears possible that the MMR-2 and MMR-3 signatures could be attributed to other mutational processes, which are repaired by MMR and lead to hypermutation under MMR deficiency. Overall, MMR-1 seemingly reflects a 'basal' mutational process in both humans and C. elegans. In addition, human MMR deficiency also includes an element of failing to repair lesions arising from CpG deamination and leading to C>T mutations, a process absent in C. elegans due to the lack of cytosine methylation. The associated human signature, "Clock-1", together with MMR-1 explains the majority of mutations occurring in MMR defective cancers not apparently affected by hypermutation.
Matching mutational signatures to DNA repair deficiency has tremendous potential for stratifying cancer therapies tailored to DNA repair deficiency. This approach appears advantageous over genotyping marker genes, as mutational signatures provide a read-out for cellular repair deficiency associated with either genetic or epigenetic defects.
Following on from our study we expect that analyzing DNA repair defective model organisms and human cell lines, alone or in conjunction with defined genotoxic agents, will contribute to a more precise definition of mutational signatures occurring in cancer genomes and to establishing the etiology of these signatures.

DNA sequencing, variant calling and post-processing.
Illumina sequencing, variant calling and post-processing filters were performed as described (Meier et al. 2014) with the following adjustments.

Estimating mutation rates.
Mutation rates were calculated using maximum likelihood methods, assuming 15 cell divisions per generation (Meier et al. 2014), and considering that mutations have a 25% chance to be lost, a 50% chance to be transmitted as heterozygous, and a 25% chance to become homozygous, thus becoming fixed in the line during each round of C. elegans self-fertilization. Wild-type, pole-4 and pms-2 mutation rates were calculated from mutations observed across F10 and F20 generations.
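The segregation probabilities stated above can be iterated over successive rounds of self-fertilization to follow the fate of a newly arisen heterozygous mutation. The sketch below illustrates only this transition step under the 25%/50%/25% assumption; it is not the maximum likelihood procedure itself, and the function name is a hypothetical choice.

```python
def fate_after_selfing(generations):
    """Probabilities (lost, heterozygous, fixed) for a mutation that starts
    heterozygous and passes through the given number of selfing rounds."""
    lost, het, fixed = 0.0, 1.0, 0.0
    for _ in range(generations):
        lost += 0.25 * het   # a heterozygous mutation is lost with 25% chance
        fixed += 0.25 * het  # or becomes homozygous (fixed) with 25% chance
        het *= 0.5           # or stays heterozygous with 50% chance
    return lost, het, fixed

for n in (1, 2, 10):
    print(n, fate_after_selfing(n))
```

Under these assumptions the heterozygous fraction halves every generation, so a mutation is eventually either lost or fixed with equal probability.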

Analysis of homopolymer sequences in C. elegans and human cancer samples
Homopolymers, di- and trinucleotide runs encoded in the C. elegans genome, defined here as repetitive DNA regions with a consecutive number of identical bases or repeated sequence of n≥4, were identified from the reference genome WBcel235.74 using an in-house script (Supplemental Data Analysis, https://gerstung-lab.github.com/MMR) based on R packages Biostrings and GenomicRanges (Lawrence et al. 2013;Pagès et al. 2016). Overall, we identified 976,390 homopolymers in the human exome, which ranged from 4 to 35 base pairs in length (Supplemental Fig. S4A). A more recent genome build, GRCh38, does not differ in the composition of coding regions. Therefore the analysis of homopolymer frequencies is valid using both assembly versions.
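The definition of a homopolymer used here (at least four consecutive identical bases) can be illustrated with a short scan over a sequence string. This is a Python sketch using a regular expression, not the R/Biostrings script referenced above; the function names and the toy sequence are invented for illustration.

```python
import re
from collections import Counter

# A run of one base repeated at least 4 times (n >= 4, as defined in the text).
HOMOPOLYMER = re.compile(r"A{4,}|C{4,}|G{4,}|T{4,}")

def find_homopolymers(sequence):
    """Return (start, base, length) for every homopolymer run of length >= 4."""
    return [
        (m.start(), m.group()[0], len(m.group()))
        for m in HOMOPOLYMER.finditer(sequence.upper())
    ]

def length_distribution(sequence):
    """Count homopolymers by (base, length)."""
    return Counter((base, length) for _, base, length in find_homopolymers(sequence))

toy = "GATTTTTCAAAAGGGCCCCCCAT"  # invented sequence, not genomic data
print(find_homopolymers(toy))
print(length_distribution(toy))
```

In the toy sequence the GGG run is not reported because it is shorter than four bases. Start positions are 0-based here, which differs from the 1-based coordinates commonly used for genomic ranges.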

De novo signature extraction from human cancer samples.
Variant calls for whole-exome sequencing data from the colorectal adenocarcinoma
Similarity between signatures was calculated via cosine similarity:

$$\mathrm{sim}(S_1, S_2) = \frac{\langle S_1, S_2 \rangle}{\lVert S_1 \rVert \, \lVert S_2 \rVert},$$

where $\langle S_1, S_2 \rangle$ is a scalar product of signature vectors. When compared to 96-long substitution signatures, indels were omitted from 104-long de novo signatures.
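As an illustration of the similarity measure, the cosine similarity of two signature vectors is their scalar product divided by the product of their Euclidean norms. The sketch below assumes, purely for illustration, that a 104-long de novo signature stores its 96 substitution entries first and its indel entries last; that ordering and both function names are assumptions rather than details given in the text.

```python
import math

def cosine_similarity(s1, s2):
    """Scalar product of two equally long vectors divided by the product of their norms."""
    if len(s1) != len(s2):
        raise ValueError("signatures must have the same length")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm1 * norm2)

def drop_indels(signature, n_substitutions=96):
    """Keep only the substitution part (assumed to be the first 96 entries)."""
    return signature[:n_substitutions]

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

Because the measure is scale-invariant, it does not matter whether the vectors hold counts or probabilities that sum to 1.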

A t-distributed stochastic neighbor embedding (t-SNE) representation (van der Maaten and Hinton 2008) was obtained using the R package "tsne" (Donaldson 2016) with the cosine similarity as distance measure between mutational profiles. In order to confirm the link between signatures MMR-1-3 and MMR deficiency, we defined MMR-deficient samples as those annotated as MSI-H (microsatellite instability-high) in TCGA Clinical Explorer (Lee et al. 2015). Relative contributions of every signature to the samples from the combined dataset were tested for association with MSI/MSS status using a one-tailed Wilcoxon rank sum test. All p-values were adjusted for multiple testing using the Bonferroni procedure.
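The quantities involved in this association test can be sketched in a few lines. The rank sum (Mann-Whitney U) statistic counts, over all MSI/MSS sample pairs, how often the MSI sample has the higher signature contribution; dividing U by the number of pairs gives the area under the ROC curve. The Bonferroni procedure multiplies each p-value by the number of tests, capped at 1. The code below is an illustrative sketch with invented values and hypothetical function names; it does not compute the p-value of the test.

```python
def rank_sum_auc(msi_values, mss_values):
    """Return the U statistic of the first group and U / (n1 * n2),
    which equals the area under the ROC curve."""
    u = 0.0
    for x in msi_values:
        for y in mss_values:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5  # ties count as half
    return u, u / (len(msi_values) * len(mss_values))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Invented relative signature contributions, not data from this study.
u, auc = rank_sum_auc([0.8, 0.9, 0.4], [0.1, 0.2, 0.5])
print(u, auc)
print(bonferroni([0.01, 0.04, 0.5]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred tumors; a rank-based implementation would be preferable for larger cohorts.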

Comparison of C. elegans mismatch repair mutation patterns to cancer signatures.
To extract the signatures of individual factors from the respective C. elegans samples, we used an additive Poisson model with multiple factors for every trinucleotide context and indel type and calculated maximum likelihood estimates for every signature (Supplemental Material). For comparison of C. elegans and human mutational signatures, signatures acquired in C. elegans were adjusted by multiplying the probability for 96 base substitutions by the ratio of respective trinucleotide counts observed in the human exome (GRCh37, counts pre-calculated in Rosenthal et al. 2016) to those in the C. elegans reference genome (WBcel235). Indels were not included in the comparative analysis as they required adjustment for both base and homopolymer content. COSMIC signatures were also adjusted to exome nucleotide counts as they were mostly derived from whole exomes (Alexandrov et al. 2013a;Alexandrov et al. 2013b) and the comparison of de novo signatures to COSMIC is more valid in exome space. All signatures were further normalized so that the vector of probabilities sums up to 1 (Supplemental Table S5). For mutational signature comparison a cosine similarity of 0.80 was considered a threshold for "high" similarity (Supplemental Material, Figure S6A,B).
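The adjustment for trinucleotide content described above can be sketched as a reweighting followed by renormalization: each substitution probability is multiplied by the ratio of the trinucleotide count in the target sequence space (human exome) to that in the source genome (C. elegans), and the result is rescaled to sum to 1. The label format 'A[C>T]G', the function names and all counts below are invented for illustration; the real counts come from the cited references.

```python
def trinucleotide(channel):
    """Map a label such as 'A[C>T]G' to its reference trinucleotide 'ACG'."""
    return channel[0] + channel[2] + channel[6]

def adjust_signature(signature, source_counts, target_counts):
    """Reweight per-context probabilities by target/source trinucleotide
    counts and renormalize so that the probabilities sum to 1."""
    adjusted = {}
    for channel, prob in signature.items():
        context = trinucleotide(channel)
        adjusted[channel] = prob * target_counts[context] / source_counts[context]
    total = sum(adjusted.values())
    return {channel: value / total for channel, value in adjusted.items()}

# Invented two-channel toy signature and trinucleotide counts.
toy_signature = {"A[C>T]G": 0.5, "A[T>A]T": 0.5}
worm_counts = {"ACG": 100, "ATT": 400}
human_counts = {"ACG": 100, "ATT": 100}
toy_adjusted = adjust_signature(toy_signature, worm_counts, human_counts)
print(toy_adjusted)
```

In this toy case the ATT context is four times more abundant in the source than in the target, so the relative weight of the T>A channel in an ATT context drops after adjustment.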