A comparative analysis of whole genome sequencing of esophageal adenocarcinoma pre- and post-chemotherapy

The scientific community has avoided using tissue samples from patients that have been exposed to systemic chemotherapy to infer the genomic landscape of a given cancer. Esophageal adenocarcinoma is a heterogeneous, chemoresistant tumor for which the availability and size of pretreatment endoscopic samples are limiting. This study compares whole-genome sequencing data obtained from chemo-naive and chemo-treated samples. The quality of whole-genomic sequencing data is comparable across all samples regardless of chemotherapy status. Inclusion of samples collected post-chemotherapy increased the proportion of late-stage tumors. When comparing matched pre- and post-chemotherapy samples from 10 cases, the mutational signatures, copy number, and SNV mutational profiles reflect the expected heterogeneity in this disease. Analysis of SNVs in relation to allele-specific copy-number changes pinpoints the common ancestor to a point prior to chemotherapy. For cases in which pre- and post-chemotherapy samples do show substantial differences, the timing of the divergence is near-synchronous with endoreduplication. Comparison across a large prospective cohort (62 treatment-naive, 58 chemotherapy-treated samples) reveals no significant differences in the overall mutation rate, mutation signatures, specific recurrent point mutations, or copy-number events in respect to chemotherapy status. In conclusion, whole-genome sequencing of samples obtained following neoadjuvant chemotherapy is representative of the genomic landscape of esophageal adenocarcinoma. Excluding these samples reduces the material available for cataloging and introduces a bias toward the earlier stages of cancer.

The scientific community has avoided using tissue samples from patients that have been exposed to systemic chemotherapy to infer the genomic landscape of a given cancer. Esophageal adenocarcinoma is a heterogeneous, chemoresistant tumor for which the availability and size of pretreatment endoscopic samples are limiting. This study compares whole-genome sequencing data obtained from chemo-naive and chemo-treated samples. The quality of whole-genomic sequencing data is comparable across all samples regardless of chemotherapy status. Inclusion of samples collected post-chemotherapy increased the proportion of late-stage tumors. When comparing matched pre-and post-chemotherapy samples from 10 cases, the mutational signatures, copy number, and SNV mutational profiles reflect the expected heterogeneity in this disease. Analysis of SNVs in relation to allele-specific copy-number changes pinpoints the common ancestor to a point prior to chemotherapy. For cases in which pre-and post-chemotherapy samples do show substantial differences, the timing of the divergence is near-synchronous with endoreduplication. Comparison across a large prospective cohort (62 treatment-naive, 58 chemotherapy-treated samples) reveals no significant differences in the overall mutation rate, mutation signatures, specific recurrent point mutations, or copy-number events in respect to chemotherapy status. In conclusion, whole-genome sequencing of samples obtained following neoadjuvant chemotherapy is representative of the genomic landscape of esophageal adenocarcinoma. Excluding these samples reduces the material available for cataloging and introduces a bias toward the earlier stages of cancer.
[Supplemental material is available for this article.] The incidence of esophageal adenocarcinoma (EAC) has increased sixfold in the last 30 yr (Lepage et al. 2013). The majority of patients present with advanced disease, and the overall survival is <15% despite advances in multimodal therapy (Jemal et al. 2011;Masclee et al. 2014). Patients who do not have distant nodal or organ metastases are considered suitable for treatment with curative intent. This generally comprises systemic chemotherapy followed by surgical excision. Chemotherapy has been shown to improve survival to >30% for those entering a curative pathway and is now an integral part of standard care either alone or in combination with radiotherapy, although the benefits of radiotherapy are greater in esophageal squamous cell carcinoma (Medical Research Council Oesophageal Cancer Working Group 2002;Cunningham et al. 2006). Complete pathological response after neoadjuvant chemotherapy is rare and constitutes <15% of all cases, highlighting that residual cancer cells often remain after this treatment (Sjoquist et al. 2011;Orditura et al. 2014).
Chemotherapeutic agents exert their effect by directly or indirectly inducing DNA damage and cell death. In EAC, three distinct classes of drugs are mainly used in combination: an intercalating agent, a platinum-derivative, and an anti-metabolite (Allum et al. 2011). Drugs such as epirubicine intercalate directly with the DNA strand and thereby disrupt further replication in rapidly dividing cells. Platinum drugs directly modify DNA through coordinate-covalent bonds between DNA and the platinum moiety, and the gross DNA damage is repaired via the nucleotide excision repair pathway (NER) if intact. 5 ′ -Fluorouracil and derivatives target DNA metabolism and result in DNA adducts, strand breaks, or stalled/collapsed DNA replication forks. In addition, many of these drugs result in an increase of reactive oxygen species (ROS), which can in turn induce DNA damage, including single-strand DNA breaks (Woods and Turchi 2013). Hence, one might expect to see direct effects of chemotherapeutic agents on the DNA sequence, and the extent might depend on the tumor responsiveness to treatment (Rebucci and Michiels 2013).
The mutation burden in EAC is high, with 8.0 mutations/Mb (range 1.53-34.56/Mb) per haploid genome (Alexandrov et al. 2013). The genomic landscape appears to be complex and heterogeneous with a large number of point mutations occurring at very low frequency apart from TP53 mutations, which are present in 70%-80% cases (Dulak et al. 2013;Weaver et al. 2014). Whole-genome sequencing studies and SNP arrays are providing more detail on large-scale chromosomal rearrangements that are common with evidence of catastrophic events such as chromothripsis and breakage-bridge-fusion (BFB) occuring in around one-third of patients (Dulak et al. 2013;Nones et al. 2014).
In the current study, we performed whole-genome sequencing in highly clinically annotated samples of EAC that included chemo-naive and chemo-treated samples as part of the International Cancer Genome Consortium (ICGC). We took the opportunity to critically evaluate the impact of chemotherapy on the genomic landscape. It has recently been reported from exome data that chemotherapy imposes a bottleneck on tumor evolution (Findlay et al. 2016). We therefore first sought to establish the genetic relationship between 10 matched pre-and postchemotherapy samples and the point at which the samples diverged. After perfoming this initial analysis, we examined the single-nucleotide variant (SNV) spectrum, mutational/trinucleotide context, and copy-number aberrations in a larger cohort of 58 chemotherapy-treated and 62 chemotherapy-naive samples.

Results
Whole-genome sequencing of paired samples pre-and postchemotherapy Whole-genome sequencing data were first analysed for 10 cases from which samples were taken pre-and post-chemotherapy. The clinical details of this cohort are shown in Supplemental  Table S1. Of these 10 cases, eight had a single sample taken before and after neoadjuvant chemotherapy, and three had multiple samples taken before and after treatment.
Overall, the matched samples showed the expected range of estimated tumor cellularity, overall ploidy, mutational signature composition, SNV burden, and copy-number variation, including losses of heterozygosity (LOH), as well as focal amplifications and deletions (Supplemental Table S3). Regions of LOH, amplifications, and deletions are mostly the same pre-and post-chemotherapy (with LOH always observed on the same allele for paired samples). Paired samples range from being almost identical (patient 001: 97% of the genome in the same copy-number state, 95% of SNVs called in both samples) to very altered (patient 008: 27% of the genome in the same copy-number state, 23% of SNVs called in both samples).
For each patient, we observe copy-number features present in all tumor cells prechemotherapy that are not present post-chemotherapy, and vice versa. The key question is whether these differences are a consequence of the chemotherapy or simply a reflection of heterogeneity. In seven out of nine cases (the cellularity in one case is too low to call), we identify regions that have lost heterozygosity in the prechemotherapy samples but have retained heterozygosity in the post-chemotherapy samples. This implies that the post-chemotherapy sample cannot have evolved from the prechemotherapy sample but rather they have a shared antecedent.
It is informative to discuss the two extreme cases indicated above in more detail (Figs. 1,2). In patient 008, a minority of the genome has the same copy-number state pre-and post-chemotherapy ( Fig. 1A-C), and in addition, a minority of SNVs are observed both pre-and post-chemotherapy (Fig. 1D). Events that are known to be early, e.g., mutations of TP53 and LOH of key genes (Supplemental Fig. S1), are seen to be shared, and indeed, the majority of the genome that does exhibit LOH is common to both samples (Supplemental Table S3) and always occurs on the same allele when it is common. While different mutations are observed in the pre-and post-chemotherapy samples, the same mutational processes appear to be present (Figs 1E, 3), and the AAB copy-number state is the most common (Supplemental Table S3). Unlike most of the pre-and post-samples in this cohort, the majority of focal amplifications are not shared, but convergent amplification of the FGF region is observed in both samples (Fig. 1H,I). One can therefore infer that clonal divergence occurred shortly after endoreduplication ( Fig. 1F,G,J), and hence, the differences between the two samples are attributable to events that predate chemotherapy.
Patient 001 is a very different case, with virtually no differences pre-and post-chemotherapy ( Fig Table S3). We can see some "clonal" differences between the two samples ( Fig. 2F). "Subclonal" behavior prechemotherapy appears to be a recent change from a clonal state that matched the post-chemotherapy sample, indicating that although the two samples have diverged only recently ( Fig. 2H; Supplemental Table S2), the differences such as they are cannot be attributed to the chemotherapy regime.
In general, when the pre-and post-chemotherapy samples show substantial differences, the timing of the divergence of the samples can be traced to being near-synchronous to endoreduplication. For the samples that show little difference, we can still trace their common ancestry to a point prior to the chemotherapy and see no evidence in the mutational signatures (Fig. 3), copy numbers (Supplemental Table S3), or key genes (Supplemental Fig. S1) to suggest that we are seeing anything other than the heterogeneity.
It is clear that the mutational signatures change over time  and that the more recent mutations are disproportionately affected by factors affecting the power to detect SNVs (including sequencing depth, genomic complexity, and cellularity). Caution must therefore be taken in concluding that chemotherapy has had an effect on the observed mutational signatures, and we do not draw such a conclusion.
Our data suggest that differences seen pre-and post-chemotherapy are reflective of tumor heterogeneity and that either sample could be considered equally representative of the case. However, from these data on a small patient cohort, we cannot rule out the possibility of a subtle selective pressure, and in order to address this, we require larger cohorts of pre-and post-chemotherapy samples, which thus form the second part of this analysis.
Systematic comparison of whole-genome sequencing data on a large cohort of chemotherapy-naive or chemotherapy treated samples Our large cohort (n = 120), shown in Supplemental Fig. S2a, comprised 314 patients from whom there were 138 chemotherapy-naive samples taken at endoscopic diagnosis, prior to any treatment, or at the time of surgical resection if no neoadjuvant systemic chemotherapy was given, and 176 samples taken at surgery following systemic chemotherapy. For the patients receiving chemotherapy, samples were not available both before and after treatment in this cohort. A further breakdown of the samples selected for the large cohort analysis is shown in Supplemental Fig. S2b. The chemotherapy given at all centers was in line with the United Kingdom recommendation, comprising a platinum compound as a backbone generally combined with epirubicin and a 5-fluoruracil derivative. Patients receiving radiotherapy were excluded in order to maintain consistency across the cohorts. The details of the study cohort are shown in Table 1.
Patients for which chemotherapy-treated samples were sequenced were significantly younger (P = 0.002) and, as expected, presented at a more advanced stage of disease (P = 0.024) since, for those patients going down curative pathways, chemotherapy is not required for early stage tumors and patients have to be fit enough to endure toxic therapy. Thus, 25 patients (40%) with chemotherapy-naive samples went straight for surgery without neoadjuvant systemic treatment and were of an earlier stage. Histological response to neoadjuvant chemotherapy as assessed by the Mandard regression score was documented in 78 of the 95 patients (Supplemental Table S7). Of these, 16 (21%) had Mandard scores of one to three, indicating some degree of histological response. A score of four to five (present in the remaining 79%) indicates poor response to neoadjuvant treatment, as expected for this particular cancer. Although the chemotherapy-treated group showed higher recurrence rates, this was not statistically significant (P = 0.139). At the time of analysis, 50% of the patients were alive, with no significant difference between the treated and the chemotherapy-naive group (P = 0.361). Please note that these statistics reflect the earlier stage cases in the chemotherapy-naive group and are thus not reflective of the known benefit of chemotherapy shown in randomized trials for this disease.
All cases underwent a stringent pathological review of a frozen H&E section from the same sample that would be submitted for sequencing to confirm the diagnosis and ensure that the histopathological estimate of tumor cellularity exceeded 70%. Of the total cohort of n = 314 samples (n = 176 chemotherapy-treated, n = 138 treatment-naive), significantly more samples that were exposed to chemotherapy failed this pathology review and were therefore excluded (n = 98, 55.7% vs. n = 35, 25.4%; P < 0.001) (Supplemental Fig. S2b). Of the treatment-naive samples, a higher Of the shared SNVs, approximately 6000 lie within copy-number states AA, AAB, AABB, AAA, or AAAA and can confidently be categorized as early or late (relative to their copy-number changes). The contributions for these subsets are shown also. (F ) For regions that in the prechemotherapy sample have copynumber status AAA, we see that no SNVs unique to this sample have three copies. (G) For regions that in the post-chemotherapy sample have copy-number status AAA, we see that no SNVs unique to this sample that have three copies. (H) Illustrated are allele-specific copy numbers for a region of Chromosome 11 in the prechemotherapy sample. (I ) Illustrated are allele-specific copy numbers for a region of Chromosome 11 in the post-chemotherapy sample. (J) A sketched likely timeline for this sample, although inherent to this type of data, the timings of losses are supposition.
proportion of endoscopic biopsies failed the pathology review compared with surgical specimens as would be expected from their small size (n = 33, 30.6% vs. n = 2, 6.7%; P = 0.01).

Genomic metrics of chemo-naive and chemo-treated samples of the large cohort
The group of treatment-naive samples contained a median of 24,449 SNVs and indels (combined), with a median absolute deviation (MAD) of 16,355, while the chemo-treated group had a median of 20,071 SNVs and indels (MAD = 12,223). The mutation rates in the chemo-naive and treated groups were similar, with a mean of 8.7 mutations/Mb for the former and 7.5 mutations/Mb for the latter (Wilcoxon rank-sum test P-value = 0.4).
Some genes were only recurrently mutated in the chemotherapy treated samples, e.g., PTGES3L-AARSD1, RN7SL332P, AC011893.3, OR4D12P, TSPAN10, PPFIA3, C15orf39, SLC27A4, and NAA30, in at least 15% of this group. However, the top recur-rently mutated genes that have been previously characterized for EAC, which are more likely drivers in this cancer, were generally mutated in a similar proportion of cases across the two cohorts ( Fig. 4; Dulak et al. 2013;Weaver et al. 2014).
The tissue samples in the two groups displayed similar proportions of amplifications, deletions, and LOH regions (Wilcoxon rank-sum test P-values >0.05) ( Fig. 5; Supplemental Tables S4, S8; Supplemental Fig. S3). Furthermore, in each group, the defined genomic characteristics were similar regardless of patient age, disease stage, resection margin status (positive or negative for tumor cells), or sample source (biopsy or resection specimen), (Supplemental Table S5).
The effect of chemotherapy on mutational spectrum analysis in the large cohort EAC mutational signatures were extracted using the method presented by Alexandrov et al. (2013). A total of six mutational Of the shared SNVs, approximately 14,000 lie within copy-number states AA, AAB, AABB, AAAA, or AAAAB and can confidently be categorized as early or late (relative to their copy-number changes). The contributions for these subsets are shown also. (F ) Illustrating SNVs for a region that exhibits different copy-number states pre-(AABB) and post-(AAB) chemotherapy. The centers of predicted clusters for these states are indicated. Within each sample, the copy-number state appears to be consistent in 100% of tumor cells. (G) Illustrated are SNVs for a region that demonstrates subclonal copy-number behavior prechemotherapy. The two sets of expected cluster centers for clonal AABB (red) and AAABB (blue) solutions prechemotherapy, against AAABB post-chemotherapy, are illustrated. The lack of a fourth SNV cluster is a strong indicator that this is a subclonal loss of one copy from a previously clonal AAABB state. (H) A sketched likely timeline for this sample, although the timing of losses is supposition.
signatures were identified, of which five have been previously identified in EAC and other cancer types (Dulak et al. 2013;Weaver et al. 2014). None of these five signatures have been previously associated with exposure to chemotherapy. We therefore compared the number of mutations generated by each signature within the two cohorts and did not observe any clear difference (Fig. 6). A comparison of the mutational signatures for the 10 pairs of samples pre-and post-chemotherapy again showed similar nonsignificant differences (Supplemental Fig. S6).
Since there was no significant overrepresentation of a particular trinucleotide in either group (Wilcoxon rank-sum test adjusted P-values >0.05) (Supplemental Fig. S4), this prompted us to repeat the analysis with a focus on C>A substitution mutations occurring at CpC dinucleotides that have been previously reported to be associated with systemic treatment with cisplatin (Meier et al. 2014). There was a significant enrichment for the cisplatin-induced mutational signature in the chemotherapy-treated cohort (Fisher's exact test P-value <0.0001) (Supplemental Fig. S5), in agreement with observations by Murugaesu et al. (2015).

Discussion
In this study, we have used whole-genome sequencing, incorporating a comprehensive analysis of copy number, SNVs, and mutational signatures from prospectively collected samples with stringent pathology QC but without imposing any restriction on including samples collected from patients who had already been treated with chemotherapy.
The first aim was to ensure that inclusion of chemotherapyexposed tissues did not result in poor quality samples in terms of low cellularity, DNA quality, or sequencing metrics, and we have demonstrated that the quality metrics were generally favorable and resulted in the inclusion of a greater proportion of late-stage tumors that would otherwise have been excluded. We then examined a small cohort of patients with samples collected pre-and post-chemotherapy (n = 10) and observed a range in the degree of genomic concordance given the extent of heterogeneity expected in this disease (Dulak et al. 2013;Nones et al. 2014;Weaver et al. 2014). For the samples within a case that showed a high degree of similarity (e.g., IDs 002 and 001), we can trace their common ancestry to a point prior to the chemotherapy. On the other hand, when the pre-and post-chemotherapy samples show substantial differences (e.g., IDs 007, 008, and 005), the timing of the divergence of the samples can be traced to being near-synchronous with endoreduplication. When investigating the effect of chemotherapy on a larger scale in a cohort of 120 patients, we observe that the genome of EAC is remarkably resistant to the effect of neoadjuvant chemotherapy. Indeed, there was a striking similarity noted between chemotherapy-naive samples and those treated with neoadjuvant chemotherapy at the level of copy-number aberrations, SNVs, and mutational spectra. This study was not designed to examine the genetic predictors of response to chemotherapy, which requires a different experimental approach given that it is generally a chemoresistant disease. Only 20% of patients in our study showed a histopathological response (Mandard score 1-3, based on the degree of fibrosis and proportion of tumor cells remaining) to chemotherapy, which is consistent with the treatment response expected from the literature (Cunningham et al. 2006;Alderson et al. 2015).
To date, most large-scale sequencing efforts, including The Cancer Genome Atlas (TCGA) and other ICGC projects, have been confined to patients who are naive to systemic treatment. Hence, for cancers treated with chemotherapy prior to surgical resection (e.g., cancers of the stomach, esophagus, breast, bladder, cervix, and lung), this has restricted the samples available for analysis to pretreatment diagnostic biopsies that are generally obtained via endoscopy or laparoscopy and are challenging to work with due to their small size. The main reason for exclusion of samples in our cohort was low cellularity (<70%) as determined by expert pathology review (three independent pathologists) of a frozen section taken from the samples used for DNA extraction. The proportion of chemotherapy-treated samples excluded at this stage was more than twice as high as the proportion of treatment-naive samples, and so this will potentially bias selection away from those who show a good histopathological response to systemic neoadjuvant treatment. However, apart from cellularity there was no further difference in the quality or quantity of DNA, library or sequence obtained. In the future as technology improves, sequencing of single cells  in cases that are highly responsive to neoadjuvant therapy may be informative. Our observation that the majority of EAC genomes remained rather stable following chemotherapy is consistent with breast cancer studies when considering those patients with chemoresistant disease. For example, in a candidate gene study of 47 breast cancer patients, Almendro et al. (2014) found that intratumor genetic diversity was indicative of the tumor subtype and remained stable in patients with only partial or no response to treatment. Yates et al. (2015) interrogated the subclonal architecture of breast cancer in 50 patients, of which 18 had samples taken before and after neoadjuvant chemotherapy. In five of these patients, new clones were seen in the post-chemotherapy samples with potential driver events such as amplifications in MYC and FGFR2 and deletions in RUNX1. Detailed phylogenetic reconstruction of these five cases suggested that the treatment-resistant clones they observed were likely to have been missed at the time of prechemotherapy sampling, and were unlikely to be the result of new subclones arising during treatment.
In the context of EAC, Murugaesu et al. (2015) performed exome sequencing on samples from eight cases taken before and after chemotherapy. The extensive multiregion sampling was a strength of this small study, and they found a positive correlation between the degree of intra-tumoral heterogeneity and a poor response to neoadjuvant chemotherapy, which in turn correlated with a worse survival. Our study was performed as part of the ICGC, which is designed to examine the landscape by virtue of examining a large number of tumor:normal pairs, and hence, we were generally unable to perform multiregional sampling. Findlay et al. (2016) recently reported results from their exome anlaysis of 30 preand post-chemotherapy EAC samples, and in this study, they purposefully selected cases showing a range of responses to chemotherapy. They associated good clinical response, as determined by the histopathological Mandard score generated from the post-chemotherapy surgical resection specimen, with evidence for genomic bottlenecking as a result of chemotherapy. This is at odds with our interpretation.
We cannot, from such a limited number of cases with pre-and post-chemotherapy samples, in such a diverse disease, separate the potential sources of heterogeneity arising from spatial sampling, temporal sampling, and chemotherapy unless we can make some inference about the timing of events. It has been reported previously that some point mutations, LOHs, and genome duplication events occur early in the cancer progression and that genomic catastrophes and the accumulation of clonal diversity may play a role. Our paired cases support these prior observations (Nones et al. 2014).
On average, we noted that approximately a quarter of the genome had undergone LOH both pre-and post-chemotherapy in our samples, and in all cases, with paired samples the same allele was lost pre-and post-chemotherapy. Therefore, we infer that LOH and then genome doubling occur early in the life history of the cancers. The high point mutation rate associated with EAC allows us to say something about the timing of genomic catastrophes and the establishment of clonal diversity. If large-scale genomic rearrangements predate clonal diversity, then we expect to see SNVs that occur after the copy-number changes but that are shared pre-and-post chemotherapy. If the clonal diversity occurs before the copy-number changes, then we would expect to see SNVs that are unique to one sample but that predate local copy-number changes. We see neither of these, strongly suggesting that the establishment of clonal diversity and the copy-number changes are roughly concurrent. This suggests that it is not just localized catastrophes but genome-wide changes that seem to occur near-simultaneously. Therefore, the divergence of the clones observed pre-and post-chemotherapy must have occurred substantially before treatment was administered, and thus, chemotherapy cannot be responsible for the divergence. An alternative explantion would be selective pressures for one clone out of those available, but the larger cohorts discussed above revealed little evidence of systematic selection of this kind.
Regarding the mutational signature analysis, we used the methods of Alexandrov et al. (2013), which identified six main SNV signatures in our data, five of which have been previously described in EAC data sets (Dulak et al. 2013;Nones et al. 2014). When comparing samples taken pre-and post-chemotherapy, we observe that the signature patterns are often different between those occurring before the copy number changes and those timed as occurring after, but the more recent signature is consistent between both the pre-and post-chemotherapy samples. Thus, any apparent differences in the mutational signatures pre-and post-chemotherapy are likely attributable to cellularity-induced differences in the power to detect the recent SNVs that, by definition, have lower allele fractions.
While our study was not designed to determine the prognostic value of genomic response to chemotherapy, we acknowledge that some of the samples for which pre-and post-chemotherapy profiles differ the most (e.g., 007 and 008) are some of those with the best survival. However, we also note they are two of the cases with the best pathological TNM staging. Any approach to prognosticate based on genomic factors (e.g., perhaps following the results of Findlay et al. 2016) should at most temper established prognostic factors such as these fundamental phenotypic characteristics. Moreover, as discussed, some mutations were found to be more recurrent following chemotherapy, and this is an area ripe for further research as the appropriate cohorts become available.
In conclusion, the overall genomic profile of EAC remains similar before and after chemotherapy. The poor survival in EAC would support our findings that this cancer is resistant to chemotherapy with remarkable consistency in the genome of the primary tumor over time. Based on our findings, we would suggest that inclusion of neoadjuvant treated samples for large-scale sequencing efforts should be considered by the sequencing community. Such an approach will avoid biasing cohorts toward the earlier stages of the disease and increase the number of samples available for analysis particularly in tumor types with neoadjuvant therapy regimens. With the increasing recognition of the extent of epithelial tumor heterogeneity, large-scale efforts are essential to maximize the power of uncovering the full spectrum of mechanisms driving tumorigenesis.

Sample collection and processing
EAC patients were recruited prospectively from 11 sites across the UK as part of the OCCAMS (Oesophageal Clinical and Molecular Stratification) Consortium. Patients on a palliative treatment pathway, as well as those treated with radiotherapy, were excluded. The study was approved by the institutional ethics committees (REC Ns 07/H0305/52, 10/H0305/1), and all patients gave written informed consent.
Samples were obtained during either the diagnostic esophagogastroduodenoscopy or endoscpic ultrasound procedure used for staging and/or from the surgical resection specimen (Fig. 1). For each patient, blood or normal squamous esophageal samples, at least 5 cm distant from the tumor, were used as a germline reference. In 10 cases, tumor samples were taken from multiple spatially distinct sites at surgery and, in two cases, also at EGD.
All tissue samples were snap-frozen in liquid nitrogen immediately after collection and stored at −80°C. H&E-stained sections from cancer samples were reviewed independently by two expert hisptopathologists, and DNA was extracted and sequenced if tumor cellularity was ≥70%. DNA was extracted from frozen Figure 6. Mutational spectra analysis in chemotherapy-treated and chemotherapy-naive patient groups. The most relevant signatures for each group have been identified according to the method published by Alexandrov et al. (2013). Each bar shows the proportion of calls for the relevant signature per sample; the bars on the far right, the cumulative proportion for each group. esophageal tissue using the AllPrep DNA/RNA mini kit (Qiagen) and from blood samples using the QIAamp DNA blood maxi kit (Qiagen) according to manufacturer's instructions.

Whole-genome sequencing
As part of the ICGC, 100-to 125-bp paired-end sequencing was performed under contract by Illumina to a typical depth of at least 50×, with 94% of the known genome being sequenced to at least 8× coverage while achieving a PHRED quality of at least 30 for at least 80% of mapping bases. QC metrics were computed on a per lane basis using FastQC (http://www.bioinformatics. babraham.ac.uk/projects/fastqc) and in-house tools, enabling the identification of sequence reads that required trimming. Technical details of the sequencing metrics are given in Supplemental Table S2.

Mutation calling
Sequence reads were aligned to the human reference genome (GRCh37 from Ensembl release 71) (Yates et al. 2016) using BWA 0.5.9 (Li and Durbin 2009). Aligned reads were then sorted into genome coordinate order and duplicate reads marked using Picard 1.115 (FixMateInformation and MarkDuplicates tools, respectively; http://broadinstitute.github.io/picard). Somatic SNVs and indels were detected using Strelka 1.0.13 (Saunders et al. 2012). To increase accuracy, additional filters were applied to high-confidence calls (those passing Stelka's filters); details are given in Supplemental

Copy-number calling
For the large cohort, absolute copy-number alterations, cellularities, and ploidies for each sample were estimated using ASCAT-NGS v.2.1 using read counts at germline heterozygous positions estimated by GATK 3.2-2 (Van Loo et al. 2010;Nik-Zainal et al. 2012). Segments were considered amplified if the ratio of absolute copy number to ploidy exceeded two and deleted if the ratio was less than 0.5. LOH regions were defined as regions in the genome where the minor copy number was zero.

Mutational signature analysis
Mutational signatures were identified using the methodology described by Alexandrov et al. (2013). Before running the software, common variants in the 1000 Genomes database (The 1000 Genomes Project Consortium 2015) appearing in at least 0.5% of the population were removed. The optimal number of signatures in the data set was chosen to balance the signature stability against the Frobenius reconstruction error. The cisplatin signature enrichment analysis was performed as described by Murugaesu et al. (2015).

Multiple sample analysis
A GATK walker was used to identify a set of germline-heterozygous loci for each trio. The search was restricted to the autosomes, sites with no more than 20 germline reads were filtered by GATK (McKenna et al. 2010), sites with germline coverage between 16 and 90 with at least four copies of each allele present, sites where the strand bias lies between 0.1 and 0.9, and sites that are not in obvious regions of germline copy-number variation, identified with fastseg (Klambauer et al. 2012). This results in approximately 2 million such loci per trio. The depths of coverage and allele fractions for these loci were recorded for all samples in the trio.
To aid segmentation, a running median was applied to the depth and allele fraction data. A single segmentation of these values was created for each patient by combining, for each tumor sample, a sliding analysis-of-variance procedure and careful manual review of the genome. We erred on the side of oversegmentation as there is little to no penalty for this in the analyses that follow. The cellularity and baseline copy number for each sample was identified using the Crambled tool (Lynch 2015), and depth and allele-fraction values for clonal copy-number states were predicted. Segments were assigned to these copy-number states, or subclonal combinations of those states, based on the mean values for the segments. Where solutions for a segment appeared to be subclonal or differed between the multiple samples for a patient, they were reviewed for possible technical explanations such as missegmentation. Neighboring "segments" assigned the same copy-number state in both samples were merged. Segments were compared across samples to confirm the consistency of allele assignment (e.g., if both samples show two copies of one allele and one copy of the other, is the same allele duplicated in both cases) and corrected if not.
SNVs were called with Strelka and annotated with VEP as described elsewhere. SNVs were mapped to a copy-number state preand post-chemotherapy. SNVs with the same copy-number combination pre-and post-chemotherapy were partitioned into early (coming before a copy-number change) and late mutations where copy-number states and power allowed. Vectors of trinucleotide mutation counts were deconstructed into the 30 COSMIC signatures (http://cancer.sanger.ac.uk/cosmic/signatures) using a quadratic programming approach (Lynch et al. 2016).