Genome-scale analysis of DNA methylation in lung adenocarcinoma and integration with mRNA expression
- Suhaida A. Selamat1,
- Brian S. Chung1,
- Luc Girard2,
- Wei Zhang2,
- Ying Zhang3,
- Mihaela Campan1,
- Kimberly D. Siegmund3,
- Michael N. Koss4,
- Jeffrey A. Hagen5,
- Wan L. Lam6,
- Stephen Lam6,
- Adi F. Gazdar2 and
- Ite A. Laird-Offringa1,7
- 1Department of Surgery, Department of Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California 90089-9176, USA;
- 2The Hamon Center for Therapeutic Oncology Research and Department of Pathology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA;
- 3Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California 90089-9176, USA;
- 4Department of Pathology, Keck School of Medicine, University of Southern California, Los Angeles, California 90089-9176, USA;
- 5Department of Surgery, Keck School of Medicine, University of Southern California, Los Angeles, California 90089-9176, USA;
- 6BC Cancer Research Center, BC Cancer Agency, Vancouver, BC V521L3, Canada
Abstract
Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma is its most common histological subtype. Clinical and molecular evidence indicates that lung adenocarcinoma is a heterogeneous disease, which has important implications for treatment. Here we performed genome-scale DNA methylation profiling using the Illumina Infinium HumanMethylation27 platform on 59 matched lung adenocarcinoma/non-tumor lung pairs, with genome-scale verification on an independent set of tissues. We identified 766 genes showing altered DNA methylation between tumors and non-tumor lung. By integrating DNA methylation and mRNA expression data, we identified 164 hypermethylated genes showing concurrent down-regulation, and 57 hypomethylated genes showing increased expression. Integrated pathways analysis indicates that these genes are involved in cell differentiation, epithelial to mesenchymal transition, RAS and WNT signaling pathways, and cell cycle regulation, among others. Comparison of DNA methylation profiles between lung adenocarcinomas of current and never-smokers showed modest differences, identifying only LGALS4 as significantly hypermethylated and down-regulated in smokers. LGALS4, encoding a galactoside-binding protein involved in cell–cell and cell–matrix interactions, was recently shown to be a tumor suppressor in colorectal cancer. Unsupervised analysis of the DNA methylation data identified two tumor subgroups, one of which showed increased DNA methylation and was significantly associated with KRAS mutation and to a lesser extent, with smoking. Our analysis lays the groundwork for further molecular studies of lung adenocarcinoma by identifying novel epigenetically deregulated genes potentially involved in lung adenocarcinoma development/progression, and by describing an epigenetic subgroup of lung adenocarcinoma associated with characteristic molecular alterations.
Lung cancer is the leading cause of cancer-related death worldwide (Jemal et al. 2011). In many countries, adenocarcinoma has surpassed squamous carcinoma as the most common histological subtype of lung cancer, and it is also the most common histological subtype in women, Asians, and never-smokers (Toh et al. 2006). Lung adenocarcinoma is increasingly recognized as a clinically and molecularly heterogeneous disease. This is exemplified by recent reclassifications based on pathology and patient survival (Travis et al. 2011), the increasing number of clinical trials demonstrating targeted treatments that specifically benefit patients defined by molecular subtypes such as EGFR, KRAS, BRAF, and ERBB2 mutations and EML4-ALK fusions (Pao et al. 2004, 2005a,b; Pao and Girard 2011), as well as observed prognostic gene expression signature profiles (Bhattacharjee et al. 2001; Beer et al. 2002; Larsen et al. 2007). DNA methylation-based profiling has also confirmed the existence of epigenetic subtypes in several cancers (Issa 2004; Li et al. 2010; Noushmehr et al. 2010; Hinoue et al. 2012). Promoter DNA methylation, which is associated with gene silencing, can regulate gene expression in a myriad of biological and pathological processes, including lung cancer (Jones 2002; Belinsky 2004; Kerr et al. 2007; Brock et al. 2008; Risch and Plass 2008). Unlike genetic mutations, DNA methylation is an inherently reversible change, and therefore is of great interest as an active target of drug development (Esteller 2003; Rodriguez-Paredes and Esteller 2011).
While previous studies have profiled DNA methylation in lung adenocarcinoma (Shiraishi et al. 2002; Divine et al. 2005; Tsou et al. 2005, 2007; Toyooka et al. 2006; Tessema and Belinsky 2008; Christensen et al. 2009; Goto et al. 2009; Kubo et al. 2009), they have either been limited in the number of samples or genes assayed, focused on a mix of lung cancer histologies, thereby limiting the ability to identify subtypes, or lacked expression studies that allow the potential function of DNA methylation alterations to be determined. To address these issues, here we analyzed 59 lung adenocarcinoma tumors and matched adjacent non-tumor lung (NTL) tissues. Because adenocarcinoma is the most common lung cancer subtype found in never-smokers, it was important to ensure that cancers from smokers and never-smokers would both be included. Thus, we chose the cases so that approximately half of the tumors were from patients who were never-smokers. Using the Illumina Infinium HumanMethylation27 platform, we interrogated the DNA methylation status of 27,578 CpG dinucleotides spanning 14,475 genes. Focusing on genes differentially methylated in tumor vs. non-tumor lung, we integrated mRNA expression data to identify DNA methylation events with potential functional significance. We verified our findings using an independent set of 28 lung adenocarcinomas and matched adjacent NTL, as well as validated select DNA methylation data using an alternative assay, MethyLight. Lastly, we used both supervised and unsupervised analyses of the DNA methylation data to identify subgroups within the tumors.
Results
Genome-scale DNA methylation profiles were obtained for 59 lung adenocarcinomas and matched adjacent NTL tissue (Table 1). Thirty tumors were from never-smokers (defined here as less than 100 cigarettes in a lifetime), while 29 were from current smokers. Before any statistical tests were conducted, we inspected the data for the presence of substantial confounding batch effects due to the separate plates or chips (Leek et al. 2010). We did not observe any such effects (see Supplemental Fig. 1; Methods). One NTL sample was eliminated for quality-control reasons (see Methods); 117 samples were thus further analyzed (as outlined in Supplemental Fig. 2).
Characteristics of subjects and tumors
Identification of differentially methylated regions in lung adenocarcinoma
We first performed an exploratory two-dimensional (2D) hierarchical clustering of the top 5000 probes that varied most across the 117 samples (Fig. 1A). The DNA methylation profiles of tumors and NTL resulted in separate clusters, with the exception of one NTL sample (3022_N), indicating a substantial difference in DNA methylation profiles between the tumor and non-tumor samples. We next performed a locus-by-locus differential DNA methylation analysis of tumors vs. NTL to identify differentially methylated probes. Using our criteria of Q < 0.05 and a minimum median β-value difference of 20%, we identified 681 probes (520 genes) that were significantly hypermethylated in tumors, and 275 probes (247 genes) that were significantly hypomethylated (Fig. 1B; Supplemental Table 1). Some of our most hypermethylated loci include HOX genes, specifically HOXB4, HOXA9, and HOXA7. Seventeen different HOX genes passed our strict cutoffs, many passing with multiple probes, supporting previous observations of widespread DNA methylation of the Polycomb complex–targeted HOX genes (Shiraishi et al. 2002; Rauch et al. 2007). Some of our most hypomethylated loci include CASP8 and TNFRSF10A, both involved in TNF-receptor-mediated apoptosis (Boldin et al. 1996; Wang and El-Deiry 2003). To investigate the categories of genes exhibiting altered DNA methylation, we performed a DAVID functional annotation analysis (Huang da et al. 2009). The differentially hypermethylated set of genes was significantly enriched in GO biological processes including regulation of transcription, embryonic morphogenesis, cell–cell signaling, and cell surface receptor–linked signal transduction, while the differentially hypomethylated set was significantly enriched in processes including epidermal cell differentiation, epithelial cell differentiation, and defense response (Benjamini-Hochberg [BH] adjusted p < 0.05).
Identification of DNA methylation differences between lung adenocarcinoma and NTL. (A) Two-dimensional hierarchical clustering was performed using the 5000 most variable Infinium DNA methylation probes across all samples (n = 117). Probes are in rows; samples are in columns. Note that both hypermethylation and hypomethylation in tumors compared with NTL is seen, and that hypomethylation largely occurs outside of CpG islands. (B) Volcano plot of the differential DNA methylation analysis. (X-axis) Median β-value difference (median tumor-median NTL); (y-axis) Q-values for each probe (−1 × log10 scale). (Vertical dotted lines) 20% change in β-values; (horizontal dotted line) the significance cutoff. One gene, CDH13, showed both significant hypermethylation and hypomethylation (see text and Supplemental Figs. 3 and 4). (C) Proportions of probes from genes with associated CpG islands (CGI) and probe locations, categorized as promoter (±1 kb from TSS) or nonpromoter regions. (D) Overlap of significant unique gene lists using an independent sample set (see also Supplemental Figs. 2 and 5).
For the differentially methylated probes, we also investigated whether or not they corresponded to genes with associated CpG islands and were located in promoters (Fig. 1C). Sixty-four percent of probes hypermethylated in tumors corresponded to genes with associated CpG islands, vs. only 7% of probes hypomethylated in tumors (Fisher p < 2.2 × 10−16). In addition, 77% of probes hypermethylated in tumors were located in promoter regions (defined as the region 1 kb upstream or downstream of the nearest transcription start site), vs. only 68% of hypomethylated probes (Fisher p < 0.005). Our findings support previous observations showing significant differences between the characteristics of genes that gain DNA methylation during tumorigenesis, vs. those that lose DNA methylation (Ohm et al. 2007; Schlesinger et al. 2007). One gene, CDH13—known to be repressed in lung cancer (Sato et al. 1998; Toyooka et al. 2003; Selamat et al. 2011)—was represented by nine probes on this platform and was present in both the hypermethylated and hypomethylated lists (Supplemental Figs. 3, 4). Three CDH13 probes were included in our set of the 5000 most variable probes. Of these, two were statistically significantly hypermethylated in tumors, while one was hypomethylated. The former were both located in the promoter region of the gene, and four additional probes in this region also showed hypermethylation in tumors, although they did not meet the strict criteria for significance. The hypomethylated probe was not located in or near a CpG island and resided in intron 1 of the gene, >10 kb from the transcription start site. The two additional probes in this region showed trends of hypomethylation, although they did not reach statistical significance in the discovery analysis. Thus, the CDH13 gene appears to be characterized by widespread differential and bidirectional changes in DNA methylation in lung adenocarcinoma vs. the adjacent NTL. This exemplifies the recent observation that methylation gain in the promoter can be coupled with methylation loss in the gene body (Berman et al. 2012).
To ensure that our findings were not dependent on the specific population analyzed, we performed an additional differential DNA methylation analysis on an independent sample set of 28 lung adenocarcinomas and 27 NTL (Supplemental Table 2), using the same Infinium platform. By applying identical statistical criteria, we identified 313 significantly hypermethylated genes and 85 significantly hypomethylated genes in this verification set (Supplemental Fig. 5A,B). While the smaller number of genes that passed our criteria is expected due to the smaller sample size of the verification set, we found that 95% of the hypermethylated genes and 80% of hypomethylated genes in the verification set had been identified in our discovery analysis (Fig. 1D), supporting our initial findings. To provide technical validation of our observations, we assessed the DNA methylation levels of 12 genes using the real-time PCR-based MethyLight technique on DNA from 26 tumor/NTL pairs, of which 10 tumors and 13 NTL samples were also part of the Infinium verification data set (Supplemental Fig. 5C). These 12 genes were chosen based on significance, functional relevance, and assay design compatibility. Differences in the observed absolute levels of DNA methylation might be expected due to the more stringent nature of the MethyLight assay, which requires multiple fully methylated CpG dinucleotides in the region covered by the primers and probe. In spite of this, all genes tested except one (SOCS2) that were found to be hypermethylated in the Infinium study were confirmed to be significantly hypermethylated in the verification set tumors. In addition, both hypomethylated genes tested (Supplemental Fig. 5C, right panels, FAM83A and SFN) were confirmed to be statistically significantly hypomethylated in tumors.
Identification of potentially functionally relevant DNA methylation changes in lung adenocarcinoma
To identify those DNA methylation changes with concomitant changes in gene expression, we integrated the gene expression profiles and DNA methylation profiles of the EDRN tumor and NTL tissue samples. We were able to examine gene expression levels for 709 out of 766 of the differentially methylated genes. An exploratory hierarchical clustering of the expression levels of just these 709 genes completely separated out tumors vs. NTL (Fig. 2A). Using a BH-adjusted P-value cutoff of 0.05, 349 genes were differentially expressed. Of these, 164 genes were statistically significantly hypermethylated and down-regulated (23%), while 57 genes (8%) were significantly hypomethylated and up-regulated (Fig. 2B; Supplemental Table 3), suggesting that abnormal DNA methylation might have functional consequences in approximately one-third of the genes showing differential DNA methylation between tumor and NTL. We used Ingenuity Pathways Analysis to investigate which gene networks might be affected by the aberrant DNA methylation of these 221 genes. The top two gene networks identified involved cell differentiation, on the one hand, and MAPK signaling/cell cycle control, on the other (Fig. 2C). Prominent in the first network were phosphoinositide-3-kinase (PIK3) complex members, JUN transcription factors of the AP1 family, transforming growth factor β, and WNT signaling pathway members. Epigenetic interactions with histones H3 and H4 were also seen in the first network. In the second network, genes of the MAPK and FGF families, and CCNA1 and 2 (cyclin A1 and 2) were central. Analysis of the top functional categories of deregulated genes pointed to cellular movement and development, tissue development, and cellular growth and proliferation (Fig. 2D). We then used the NextBio database (http://www.nextbio.com) (Kupershmidt et al. 2010) to identify biosets (uploaded data sets) that were significantly associated with our list of 221 genes for which DNA methylation changes were significantly inversely correlated with changes in expression. We found highly statistically significant overlaps (all p < 1.0 × 10−20) with several previously published gene expression studies comparing lung adenocarcinoma to non-tumor lung (Fig. 2E; Supplemental Fig. 6A–C; Wrage et al. 2009; Hou et al. 2010; Lu et al. 2010). One hundred and seven of the 221 genes were found in all of the top three most correlated biosets (48%), while 184 of the 221 genes (83%) were found in at least one of the top three most correlated biosets. Additionally, our 221 genes were highly correlated with two colorectal cancer DNA methylation studies (Supplemental Fig. 6D,E; Hinoue et al. 2012; YH Kim and YS Kim, unpubl.).
Identification of genes showing coordinately changed DNA methylation and gene expression. (A) Two-dimensional hierarchical clustering with 1061 probes corresponding to 709 genes across all tumors (n = 58) and NTL (n = 58). Rows represent probes; columns are samples. (B) Starburst plot integrating differential DNA methylation and gene expression analyses. (X-axis) DNA methylation Q-values (−1 × log10 scale); (y-axis) BH adjusted P-values (−1 × log10 scale). Indicated are genes that are hypermethylated and down-regulated in tumors (red); hypomethylated and up-regulated in tumors (green); hypermethylated and up-regulated in tumors (blue); or hypomethylated and down-regulated in tumors (orange). (C) Top gene networks identified through integrative pathways analysis of significant DNA methylation changes associated with significant inverse gene expression changes. Indicated are genes that are hypomethylated and up-regulated in tumors (green) or hypermethylated and down-regulated in tumors (red). (Solid lines) Direct interaction; (dashed lines) indirect interaction. (D) The most significantly enriched biological process categories within genes showing significant DNA methylation changes associated with significant inverse gene expression changes. (E) Venn diagram of NextBio analysis showing the overlap of our bioset (genes showing significant DNA methylation changes in conjunction with significant inverse gene expression changes) with the three most highly correlated NextBio biosets (see also Supplemental Fig. 6).
To identify the top changing genes, we applied a twofold cutoff to the average change in gene expression (Fig. 3A), finding 45 genes that were coordinately hypermethylated and down-regulated in tumors, and 16 genes that were coordinately hypomethylated and up-regulated (Tables 2A, 2B). Thus, ∼9% of the genes identified as hypermethylated in lung adenocarcinoma are also down-regulated on average more than twofold in the same tissues, a percentage similar to that found in colorectal cancer (Hinoue et al. 2012). In a more global integration analysis using all genes in common between the DNA methylation and gene expression platforms, we identified these same genes as our top candidates for showing DNA methylation-based deregulation (data not shown). For many of the genes we identified, little to nothing is known about DNA methylation-based deregulation in cancer. In addition to those genes showing inverse relationships between DNA methylation and gene expression changes, five genes were found to be hypermethylated but up-regulated in tumors, while 10 genes were found to be hypomethylated and down-regulated (Supplemental Table 4). We attempted to characterize the different groups of genes by examining whether or not they were associated with CpG islands and/or promoter regions. We found a statistically significant association between group membership and CpG island status (Fisher p < 0.001) (Fig. 3B); genes for which DNA methylation increased were significantly associated with CpG islands, regardless of the direction of the gene expression difference. We found no statistically significant difference between group membership and whether or not probes were located in the promoter region (Fisher p < 0.44) (Fig. 3B). Scatterplots illustrate the negative correlation between DNA methylation and gene expression for select genes as well as the distinct distribution of tumor and NTL sample values (Fig. 3C).
Genes showing the most significant changes in DNA methylation and gene expression. (A) Three-dimensional starburst plot of 709 genes, integrating significant changes in DNA methylation (x-axis) and gene expression (z-axis), with a mean twofold or greater change in gene expression (y-axis). Colors are as in Figure 2B. (B) Presence of CpG islands and probe locations for genes exhibiting hyper- or hypomethylation and up- or down-regulation. (C) Correlation plots of DNA methylation vs. gene expression in tumors and normal tissues for select genes.
Top hypermethylated and down-regulated genes in lung adenocarcinoma
Top hypomethylated and up-regulated genes in lung adenocarcinoma
Examination of the clinical characteristics of the tumors (Table 1) yielded the expected statistically significant correlations of smoking with KRAS mutations and never-smoking with EGFR mutations (Sun et al. 2007). To examine the overall difference in DNA methylation between tissues from smokers and never smokers, we performed a correlation analysis, which showed very similar DNA methylation profiles in both groups, although the correlation for NTL was highest (Fig. 4A,D). We then performed a locus-by-locus differential DNA methylation analysis of smoker vs. never-smoker tissue (tumors as well as NTL) to identify differentially methylated probes. While NTL showed no significant differential DNA methylation between smokers and never-smokers, six genes were statistically significantly different between the tumor tissues of smokers and never-smokers (Fig. 4B,E). IRF8, IHH, LGALS4, IL18BP, and VTN were hypermethylated in current smoker tumors, while KLF11 was hypomethylated (Supplemental Table 5). Only LGALS4 showed a statistically significant corresponding down-regulation in gene expression in current smoker tumors (BH p < 0.0069). A scatterplot of DNA methylation vs. gene expression for LGALS4, which encodes a galactoside-binding protein involved in cell–cell and cell–matrix interactions, demonstrates the dramatic hypermethylation and down-regulation of the gene in current smoker tumors (Fig. 4C).
Identification of DNA methylation differences between lung adenocarcinoma tumors and NTL with regard to smoking. (A) Correlation matrix of median β-values of tumors from current smokers vs. never-smokers, with the Spearman rho correlation given in the top left corner. (B) Volcano plot of the differential DNA methylation analysis between tumors from smokers and never-smokers. (Vertical dotted lines) 20% change in β-values; (horizontal dotted line) the significance cutoff. (C) Correlation plot of DNA methylation vs. expression for LGALS4. The Spearman rho correlation coefficient is provided on top. (D) Correlation matrix of median β-values of NTL from current smokers vs. never-smokers, with the Spearman rho correlation given in the top left corner. (E) Volcano plot of the differential DNA methylation analysis between NTL from smokers and never-smokers. (Vertical dotted lines) 20% change in β-values; (horizontal dotted line) the significance cutoff. No significant DNA methylation differences are seen between NTL from smokers and never-smokers.
One problem with the studied sample collection was a bias in the ethnic composition (Table 1); only one of the 29 smokers was Asian, whereas 21/30 never-smokers were Asian. To address this bias, we performed a multiple linear regression analysis and found that even with adjustment for race, all five genes that were significantly hypermethylated in smokers remained significant. KLF11 hypomethylation, however, did not. We examined LGALS4, the only gene showing significantly altered gene expression, more closely. When we performed a linear regression analysis with adjustment for race as well as KRAS and EGFR status, the correlation between LGALS4 and smoking remained statistically significant. In addition, a comparison of all Asian vs. all Caucasian tumors did not identify LGALS4 as significantly differentially methylated between the two populations. Lastly, in an analysis limited to only Caucasian tumors (nine never-smokers vs. 28 current-smokers), LGALS4 was found to be differentially methylated (Wilcoxon p < 0.009, median β-value difference = 0.22). These observations support an association between LGALS4 hypermethylation in lung adenocarcinoma and smoking.
The limited number of significant differences between current and never-smoker lung adenocarcinomas and their highly similar global DNA methylation profiles (Fig. 4A) prompted further investigation. We examined 30 genes that had previously been reported to show differences between smokers and never-smokers. Of these, 14 genes showed statistically significant differences in DNA methylation, but only one gene, CDKN2A, showed a median β-value difference of >20% between current smokers and never-smokers (Supplemental Table 6). Our results therefore suggest that smoking status did not greatly influence the DNA methylation profiles of the tumors in our collection.
Class discovery: Identification of DNA methylation subgroups in lung adenocarcinoma
We performed an unsupervised analysis of the entire 59-tumor set to identify any intrinsic DNA methylation-based subclasses that could then be investigated in relation to known clinical features. We used either the top 5000 most variable probes within the tumors (Fig. 5A) or the 766 differentially methylated genes (Supplemental Fig. 7A,C). Both hierarchical clustering analyses identified two distinct clusters, with only two tumors changing memberships between the two clustering approaches. Comparison of DNA methylation of the 5000 most variable probes among the tumor tissues showed considerable hypermethylation: 962 probes (753 genes) were significantly hypermethylated in Cluster 1 vs. Cluster 2, and only one gene was significantly more methylated in Cluster 2 vs. Cluster 1 (RUNX1) (Fig. 5B; Supplemental Table 7). We investigated whether this hypermethylation was tumor-specific and how it related to the probes previously identified as differentially methylated between all tumors vs. all NTL (Fig. 5C,D). Two hundred and sixty-two probes overlapped between the two groups, representing CpGs that were differentially methylated between all tumors and NTL, and that showed significant additional hypermethylation in the tumors of Cluster 1. Notably, 694 probes that had been identified as differentially methylated between all tumors and NTL did not overlap with probes differentially methylated between the two clusters. These probes comprised ones hypomethylated in the tumors as well as probes that were hypermethylated in tumors, but not additionally hypermethylated in Cluster 1 (Fig. 5D, blue dots). This suggests that the hypermethylation seen in Cluster 1 is not a global phenomenon but appears to occur in a particular group of genes. Lastly, 701 probes (red dots) were differentially methylated in Cluster 1 vs. 2 but were not identified as differentially methylated between all tumors vs. NTL. These probes were hypermethylated in tumor vs. NTL samples from Cluster 1, but not from Cluster 2 (Fig. 5E). To determine whether the differential DNA methylation of these latter probes might be of functional relevance, we examined the expression levels of these probes in Cluster 1 tumors vs. Cluster 1 NTL, identifying 13 genes that were concordantly down-regulated and hypermethylated, and three genes that were hypermethylated and up-regulated (Supplemental Table 8). Of interest was the hypermethylation and down-regulation of two genes encoding cell adhesion molecules, junctional adhesion molecule 3 (JAM3) and claudin 11 (CLDN11). Hypermethylation-based silencing of CLDN11 has been implicated in gastric cancer metastasis (Agarwal et al. 2009). Also notable was the hypermethylated and down-regulated gene GFRA1, encoding a membrane protein that interacts with the RET tyrosine kinase (Leppanen et al. 2004). Interestingly, chromosomal translocations that activate RET were recently described in lung adenocarcinoma (Ju et al. 2012; Lipson et al. 2012). The significance of GFRA1 deregulation and other Cluster 1–specific gene deregulatory events is worthy of further investigation, because it might point to patient subgroup-specific targets for epigenetic therapy.
Hierarchical clustering of tumors identifies two distinct DNA-methylation based clusters. (A) Two-dimensional hierarchical clustering of the top 5000 most variant probes among 59 tumors. Rows are probes; columns are samples. Fisher P-values for different sample parameters are shown on the left; parameters are indicated at right (the listed characteristic is marked as a black tick mark, except as indicated in the key beside the heatmap). The two main clusters are marked in color at the top of the heatmap. (B) Volcano plot showing statistically significant DNA methylation alterations between the two clusters. (Vertical dotted lines) 20% change in β-values; (horizontal dotted line) the significance cutoff. (C) Overlap of probes hypermethylated in all tumors vs. NTL and probes hypermethylated in Cluster 1 vs. Cluster 2. (D) Correlation matrix of median β-values of tumors vs. NTL for the three color groups indicated in C. Note that probes that were differentially methylated in Cluster 1 but had not been identified as differentially methylated in the all tumor vs. all NTL comparison (red dots) fell near the diagonal. (E) Correlation matrix of median β-values of tumors vs. NTL for probes differentially methylated between clusters but not between all tumors vs. NTL (red dots in C). The distribution seen when these probes are examined in either Cluster 1 (green dots) or Cluster 2 (purple dots) samples. Note that within Cluster 1 samples, these probes are hypermethylated in tumors. (F) Associations of cluster membership with KRAS mutation status and (G) smoking status. (H) Volcano plot showing statistically significant gene expression differences (unrelated to DNA methylation) between the two clusters. (Vertical dotted lines) A twofold change in expression; (horizontal dotted line) the significance cutoff.
Investigating the relationship of the two clusters to their clinical and genetic properties, we found no significant differences in clinical stage, age, gender, race, STK11 or EGFR mutation status, or survival between the two clusters (all p > 0.05) (Fig. 5A; Supplemental Fig. 7B). However, we did observe statistically significant positive associations between Cluster 1 membership and KRAS mutation and smoking status (Fig. 5F,G). These results held true with both hierarchical clustering approaches. We determined that within KRAS mutants, there was no association between cluster membership and smoking (Fisher p = 0.21), whereas within current smokers, there was a significant association between KRAS mutation and Cluster 1 (Fisher p = 0.02), indicating that KRAS rather than smoking is associated with the more heavily methylated cluster. We found no significant association between the types of KRAS mutations and cluster membership (Fisher's p < 0.49), although this analysis was limited by the modest number of samples in this study. To further investigate the relationship between KRAS mutational status and cluster membership, we subdivided Cluster 1 and 2 tumors by KRAS mutational status and compared the median DNA methylation β-values among these groups. The median β-values for the KRAS mutant vs. wild-type tumors was well-correlated in both clusters (Supplemental Fig. 8A,B). In addition, Cluster 1 tumors showed higher median β-values in numerous CpGs irrespective of the KRAS mutational status (Supplemental Fig. 8C,D). These observations suggest that while KRAS mutational status is positively associated with Cluster 1 membership, it is not the driver behind Cluster 1–specific DNA methylation.
We next investigated whether the differential DNA methylation between the two clusters could be due to differential expression of proteins known to influence DNA methylation levels. We found no significant differences in expression levels of DNMT1, DNMT3A, DNMT3B, DNMT3L, TET1, TET2, and TET3 between the two clusters (Supplemental Fig. 9). However, when we examined global gene expression differences between the two clusters, we identified 36 genes that were statistically significantly differentially expressed (unrelated to DNA methylation), with seven genes meeting a twofold cutoff (Fig. 5H; Supplemental Table 9). Of these, three were cytokines (CXCL9, CXCL10, and CXCL14), while another gene, PHLDA1, encodes a regulator of IGF1-mediated apoptosis (Toyoshima et al. 2004).
Discussion
Using genome-level interrogation of DNA methylation, we identified 766 genes showing significant differential DNA methylation in cancer tissues. These genes may be useful to develop biomarkers for diagnostic or prognostic purposes. For example, the comparison of lung adenocarcinoma DNA methylation profiles with those of white blood cells and other types of cancers (e.g., through The Cancer Genome Atlas; http://cancergenome.nih.gov/), could lead to the development of improved DNA methylation-based blood biomarkers specific for lung adenocarcinoma (Esteller et al. 1999; Usadel et al. 2002). To distinguish DNA methylation events of potential functional significance (“driver events”) from those that do not biologically contribute to tumorigenesis (“passenger events”), we integrated the DNA methylation data with gene expression profiles of the same tumors. One hundred and sixty-four genes were concordantly hypermethylated and down-regulated, and 57 genes were concordantly hypomethylated and up-regulated. Ingenuity Pathways Analysis identified two top deregulated networks: one involved in epithelial-to-mesenchymal transition, the other centered on growth factor signal transduction and cell cycle control. More stringent filtering of genes requiring a minimal mean twofold change in expression yielded 45 hypermethylated and down-regulated genes, including novel methylated genes such as ABCA3, an ATP-binding cassette transporter protein with a critical role in lung development and surfactant metabolism in humans (Shulenin et al. 2004) and mice (Cheong et al. 2007). Also of interest are SOX17, a canonical WNT antagonist previously shown to be functionally hypermethylated in breast and colorectal cancers (Zhang et al. 2008; Fu et al. 2010), and TMEM204, a transmembrane protein that plays a role in cell adhesion and is hypermethylated and down-regulated in pancreatic cancer (Shimizu et al. 2011). To our knowledge, this is the first report of the epigenetic deregulation of these and numerous other genes in lung cancer.
The top hypomethylated and up-regulated gene, FAM83A, has been demonstrated previously to be specifically up-regulated in lung cancer, especially in lung adenocarcinomas (Li et al. 2005). Expression of FAM83A has been used to detect circulating cancer cells in the peripheral blood of lung cancer patients (Liu et al. 2008). FAM83A was also shown to be epigenetically regulated in an in vitro model of arsenic-mediated malignant transformation (Jensen et al. 2008). Also of interest are AGR2, KRT8, and SFN. AGR2 is a known proto-oncogene recently confirmed to be overexpressed in lung adenocarcinoma (Pizzi et al. 2012), and has been shown to promote cell proliferation and migration in several different cancers (Ramachandran et al. 2008; Vanderlaag et al. 2010; Park et al. 2011). KRT8, encoding keratin 8, was previously reported to be up-regulated in lung adenocarcinoma (Wikman et al. 2002); and SFN, encoding stratifin, has been reported to be overexpressed in early invasive lung adenocarcinoma (Shiba-Ishii et al. 2011). Investigating the link between loss of methylation and increased expression of these genes will be important, given the increasing use of epigenetic therapies in cancer treatment.
In addition to genes showing an inverse correlation between DNA methylation and expression, we also identified several genes that were coordinately hypermethylated and up-regulated, or hypomethylated and down-regulated. While these two groups of genes do not fit into the classical paradigm of DNA methylation regulation, increasing evidence from recent deep sequencing studies show that DNA methylation regulation may be more complex. The Illumina Infinium HumanMethylation27 platform is based on probe hybridization and single nucleotide extension, and therefore levels of DNA methylation observed with a particular probe are highly dependent on whether the probe is located within or outside of a CpG island and its proximity to the transcription start site of a gene (Brenet et al. 2011; van Vlodrop et al. 2011). Additionally, the identification of long-range DNA methylation, or spreading of DNA methylation (Clark 2007), as well as roles for DNA methylation on the border or just outside CpG islands, termed “shores” and “shelves” (Irizarry et al. 2009), DNA methylation-regulated alternate transcripts (Maunakea et al. 2010), the role of DNA methylation in chromatin arrangement and organization of the genome (Berman et al. 2012), miRNA regulation (Lopez-Serra and Esteller 2012), and even gene silencing by non-CpG island DNA methylation (Han et al. 2011) add increasing intricacy to the relationship between DNA methylation and gene regulation. The new Illumina Infinium HumanMethylation450 bead array, whole-genome bisulfite-sequencing, and RNA-seq will be able to shed more light onto these possibilities (Berman et al. 2012).
Distinct mutational and gene expression differences between lung adenocarcinomas of smokers and never-smokers have been frequently noted (e.g., Belinsky et al. 2002; Toyooka et al. 2003; Pao et al. 2004, 2005b; Sun et al. 2007; Landi et al. 2008). Previous candidate-gene studies (Belinsky et al. 2002; Pulling et al. 2003) were recapitulated in our study albeit with smaller differences. It should be considered that the second most common form of lung cancer, squamous cell carcinoma, occurs predominantly in smokers, while adenocarcinoma is the most common lung cancer histology in never-smokers. Our study focuses exclusively on lung adenocarcinoma, and the modest differences we detect may therefore be due to different histological compositions as well as to differences in methodology. Two recent studies suggest no simple relationship between tobacco smoke carcinogens and DNA methylation. In a study comparing peripheral blood DNA methylation profiles of smokers and nonsmokers, only one differentially methylated locus was found (Breitling et al. 2011). An in vitro study of human lung cells chronically exposed to a tobacco carcinogen also showed little to no effect on DNA methylation profiles in treated vs. untreated cells (Tommasi et al. 2010). In our genome-scale supervised approach, we noted only six significantly differentially methylated genes between smokers and never-smokers, and of these only LGALS4 showed a corresponding down-regulation in gene expression in current smoker tumors. LGALS4 has been implicated in several cancers, including gastric, colon, and sinusoidal adenocarcinomas (Sakakura et al. 2005; Tripodi et al. 2009; Watanabe et al. 2011). Recently, a mechanism for the involvement of LGALS4 as a tumor suppressor in colorectal cancer was proposed involving the WNT signaling pathway (Satelli et al. 2011). This is especially interesting given the well-established role that WNT signaling plays in the development of lung cancer (Mazieres et al. 2005; Nguyen et al. 2009). Importantly, previous studies have suggested that the WNT pathway is involved in cigarette smoke–induced tumorigenesis (Lemjabbar-Alaoui et al. 2006; Hussain et al. 2009). To our knowledge, this is the first report of differential expression of LGALS4 between smoker and never-smoker lung tumors and of DNA methylation as a potential regulator of LGALS4 expression. Functional validation of LGALS4 regulation in lung adenocarcinoma is a highly interesting avenue of future investigation.
Genome-wide DNA methylation profiling has led to increased knowledge of epigenetic subtypes of colorectal cancer, glioblastoma, and multiple myeloma, among others (Noushmehr et al. 2010; Walker et al. 2011; Hinoue et al. 2012). The best-established DNA methylation-based subgroup is that of CpG island methylator phenotype (CIMP), first identified in colorectal cancer (Toyota et al. 1999). CIMP tumors possess high frequency and levels of cancer-specific DNA methylation at loci that show little or no methylation in non-CIMP tumors. CIMP sometimes shows differences in patient survival and is closely associated with BRAF activating mutations, but this does not appear to drive CIMP (Teodoridis et al. 2008). The existence of CIMP has been suggested in NSCLC (Marsit et al. 2006; Suzuki et al. 2006) using a very limited number of genes. However, another study did not support this conclusion (Vaissiere et al. 2009). Our use of 27,578 probes enabled a more thorough examination of DNA methylation, and we find no evidence for classic CIMP in lung cancer. An additional epigenotype, termed CIMP-low, has been reported and confirmed in several independent populations of colorectal tumors using different methodologies (Ogino et al. 2006; Shen et al. 2007; Yagi et al. 2010; Hinoue et al. 2012). CIMP-low exhibits moderately high levels of DNA hypermethylation at a subset of CIMP-associated loci, and in each study was found to be associated with KRAS mutation. Although the current lung adenocarcinoma study is limited by a modest sample size, we too observe an epigenetic subtype of lung adenocarcinoma with higher DNA methylation levels that is associated with KRAS mutation. In 2006, a higher methylation index in KRAS mutant tumors in comparison to KRAS wild-type tumors was described (Toyooka et al. 2006). However, like CIMP-low in colorectal cancer, KRAS mutations appear not to drive this epigenetic subgroup, and a more complex molecular mechanism may cause the observed epigenetic heterogeneity (Hinoue et al. 2012). The hypermethylated cluster was also associated (albeit more weakly) with smoking status, which is not surprising given the fact that KRAS mutations are most common in smokers (Ahrendt et al. 2001).
To further characterize the CIMP-low cluster, we identified genes that show cluster and tumor-specific DNA methylation-based down-regulation, including GFRA1. These genes, which might constitute new DNA-methylation-based deregulated driver genes characteristic for a subset of lung adenocarcinomas, merit further exploration. In addition, we identified seven genes showing statistically significant expression changes of at least twofold, unrelated to DNA methylation. Three out of the seven genes were cytokines recently shown to play a role in tumorigenesis (Andersson et al. 2011), while PHLDA1, a regulator of IGF1-mediated apoptosis (Toyoshima et al. 2004), has recently been suggested to be a putative epithelial stem cell marker in the human intestine (Sakthianandeswaren et al. 2011). The role of cytokines in tumorigenesis is of increasing interest (Dranoff 2004; Li et al. 2011), and the connection between DNA methylation, cytokines, and cancer is an intriguing avenue of investigation (Yoshikawa et al. 2001; Galm et al. 2003; Niwa et al. 2005; Sunaga et al. 2012). While we do not find a survival difference between the two DNA methylation-based clusters, further characterization of the identified molecular differences may lead to the development of new treatment strategies targeted to this subgroup of tumors.
In summary, our DNA methylation profiling of 59 lung adenocarcinomas and matched adjacent non-tumor lung tissue accomplished three goals: (1) the identification of numerous new cancer-specific DNA methylation changes that can be pursued as potential lung adenocarcinoma biomarkers; (2) the identification of potentially functional DNA methylation changes that may constitute driver alterations in lung adenocarcinoma; and (3) the identification of an epigenetic subgroup of lung adenocarcinoma with higher levels of DNA methylation that is reminiscent of CIMP-low in colorectal cancer, is correlated with KRAS mutation, and shows specific gene expression alterations. Our observations lay the groundwork for further diagnostic and mechanistic studies of lung adenocarcinoma that could lead to improvements in detection, patient classification, and therapy.
Methods
Study samples
The Early Detection Research Network (EDRN)/Canary Foundation tissue collection consisted of 60 lung adenocarcinoma tumors and matched adjacent histologically confirmed non-tumor lung (NTL), collected after surgery. Forty-five adenocarcinoma/NTL pairs were obtained from the Vancouver General Hospital (Vancouver, Canada) and 15 adenocarcinoma/NTL pairs from the British Columbia Cancer Agency Tumor Tissue Repository (Vancouver, Canada, BCCA Research Ethics Board #H09-00008). Thirty subjects were never-smokers (defined as less than 100 lifetime cigarettes), and 30 were current smokers (average 53 pack-years, range 11–120 pack-years). One tumor sample was excluded after pathology review later revealed it to be a large cell carcinoma. Subject characteristics for the remaining 59 subjects are detailed in Table 1. For verification of DNA methylation profiling, an independent sample set of 28 lung adenocarcinomas and 27 NTL was used. Subject characteristics for the validation population are detailed in Supplemental Table 2. Of these, 21 tumor and 20 NTL de-identified samples were purchased from the Ontario Tumor Bank (OTB, Ontario, Canada), while seven tumors and seven NTL were collected at the University Hospital at the University of Southern California (USC IRB protocol #HS-06-00447). For MethyLight verification of selected probes, OTB samples were used (26 tumors and 26 NTL), of which 25 pairs were matched. Ten of the tumors and 13 of the NTL samples examined by MethyLight were the same tissues used for the genome-wide verification. EDRN/Canary samples were assessed by an experienced pathologist (A.F.G.), while the verification samples were assessed by a separate expert lung pathologist (M.N.K.). All sample collections were performed conforming to protocols approved by the appropriate local Institutional Review Boards and were acquired with informed consent. The identities of the subjects were not made available to the laboratory investigators.
DNA methylation data production
DNA was extracted by proteinase K digestion following manual microdissection from slides prepared from fresh frozen tissue blocks. The DNA was then bisulfite-converted using the EZ DNA Methylation kit (Zymo Research) with a modification to the manufacturer's protocol in which samples were cycled 16 times for 30 sec at 90°C and 1 h at 50°C. The Illumina Infinium HumanMethylation27 BeadChip assays were performed by the USC Epigenome Center according to the manufacturer's protocols (Illumina). This assay generates DNA methylation data for 27,578 CpG dinucleotides covering 14,473 unique genes. DNA methylation levels are reported as β-values, calculated from mean methylated (M) and unmethylated (U) signal intensities for each locus for each sample using the formula [β = M/(M + U)]. Probes with detection P-values of >0.05 were deemed not significantly different from background noise and were labeled “NA.” Data for all samples are publicly available at the EDRN Public Portal (http://www.cancer.gov/edrn). MethyLight experiments were performed as previously described (Campan et al. 2009). MethyLight measurements are represented as percentage methylated reference (PMR), defined by the GENE:ALU ratio of a sample, wherein ALU refers to a reference primer/probe combination that lacks CpGs and is designed to bind to a subset of ALU repeat sequences (Weisenberger et al. 2006), divided by the GENE:ALU ratio of M.SssI-treated reference DNA. This results in a PMR range of 0–100, where a PMR of 0 indicates no detectable DNA methylation and a PMR of 100 represents fully methylated molecules. Occasionally, a PMR of >100 can be observed, which may result when the reference DNA was not fully methylated at a particular site. To minimize this, the same batch of reference DNA that has been exhaustively enzymatically methylated is used throughout the experiments (Selamat et al. 2011). Previously published MethyLight primer/probe sets were used for the SFN and MAL locus and the ALU reference (Weisenberger et al. 2006; Noushmehr et al. 2010). Other MethyLight primer/probe sequences are detailed in Supplemental Table 10.
DNA methylation data analysis
Data analyses were performed using R (R Development Core Team 2011) and Bioconductor (Gentleman et al. 2004). The analyses of 120 tissue samples necessitated conducting the experiment with the samples randomized and spread over two bisulfite treatment plates and 16 Infinium BeadChips. Batch effect investigations were performed as recommended (Leek et al. 2010) and are illustrated in Supplemental Figure 1. Three tissue samples were excluded from analyses: one tumor/NTL tissue pair (07L36_T/N) found to be a large cell carcinoma instead of a lung adenocarcinoma and one NTL sample (3023_N) for which correlation analyses suggested this was neither a lung adenocarcinoma nor NTL tissue. Probes targeting the X and Y chromosomes were excluded, as were probes containing a known single-nucleotide polymorphism, probes that contain repeat sequences of ≥10 bp, and probes that were found to be non-unique in the genome (Noushmehr et al. 2010). Hierarchical clustering was performed using Ward linkage with Euclidean distance for samples and Pearson correlation coefficients for probes. For each comparison analysis, the top 5000 most variable probes across all samples included in the comparison as measured by SD/SDMAX were retained (Cancer Genome Atlas Research Network 2011). Locus-by-locus analyses were conducted using the nonparametric Wilcoxon rank-sum test, and multiple comparisons correction was performed using Q-values from the qvalue package in R (Storey and Tibshirani 2003). Probes were considered statistically significantly different between the tested groups with a Q < 0.05. We also included an additional filter requiring the median β-value difference between groups to be ≥0.2, or a minimum 20% difference for all group comparisons. MethyLight data were analyzed using Wilcoxon signed-rank tests on GraphPad Prism version 5.00 (GraphPad Software).
Functional classification/gene network analyses
Differentially methylated genes were analyzed for Gene Ontology enrichments in comparison to all genes available on the Illumina Infinium HumanMethylation27 platform using the DAVID Functional Annotation Tool (Huang da et al. 2009). Genes for which expression levels change in concordance with DNA methylation changes were analyzed for gene network and biological processes enrichment using IPA (Ingenuity Systems; http://www.ingenuity.com). Meta-analyses to identify correlated biosets and overlapping genes in publicly available data sets were performed using the NextBio online database (NextBio, Cupertino, CA; http://www.nextbio.com). Accession numbers are GSE10799, GSE17648, GSE19188, GSE19804, and GSE25062.
Integration of gene expression analysis
Gene expression profiles were generated using RNA obtained by TRIzol extraction (Invitrogen) from microdissected alternate sections of the same 59 EDRN lung AD/NTL frozen tissue pairs used for the DNA methylation analysis. Expression data were obtained using the Illumina Human WG-6 v3.0 Expression BeadChips (Illumina) at the Genomics Core at UT Southwestern. Bead-summarized data were obtained using the Illumina BeadStudio software, expression values were log2-transformed, and Robust Spline Normalization (RSN) was performed with the lumi package in R (Du et al. 2008). The ReMOAT annotation of gene expression data was used to include only “perfect” and “good” annotated probes (Barbosa-Morais et al. 2010). Exploratory quality-control analyses revealed no strong batch effects (data not shown), although one tumor sample was excluded (3035_T) due to quality concerns. Out of 766 differentially methylated genes identified (Q < 0.05, median β-value difference ≥0.2), we were able to examine gene expression levels for 709 genes after the probe quality filtering detailed above. For genes with multiple probes, we selected the probe with the highest variance and analyzed differential expression using t-tests and a Benjamini-Hochberg (BH) multiple comparisons correction. Statistical significance was called at BH-adjusted p < 0.05. An additional stringent filter of mean twofold change was used to identify top changing genes. Correlation between gene expression and DNA methylation for each gene was measured using the Spearman correlation coefficient. For genes with multiple probes measuring DNA methylation, we selected the probe with the highest SD/SDmax value for DNA methylation.
Data access
The DNA methylation and expression data generated for the study have been submitted to the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE32867. All genome-scale data generated in the study are also publicly available at the EDRN Public Portal (http://edrn.nci.nih.gov/).
Acknowledgments
This work was supported by the Canary and Labrecque Foundations in partnership with the Early Detection Research Network (EDRN) of the National Cancer Institute/National Institutes of Health (U01 CA086402), with additional support from NIH/NCI grants R01 CA119029 and CA120869 to I.A.L.O., and the Norris Comprehensive Cancer Center core grant NCI CCSG 5P30 CA014089. We thank all of the members of the Canary/EDRN lung team for fruitful discussions; members of the Laird-Offringa laboratory for helpful criticism; Crystal Marconett for comments on the manuscript; Jessica Nishiguchi for manuscript comments and help researching the aberrantly methylated genes; Daniel Weisenberger from the USC Epigenome Center for help with sample processing; and Toshinori Hinoue, Tim Triche Jr., Houtan Noushmehr, and Hui Shen for data analysis advice.
Footnotes
-
↵7 Corresponding author
E-mail ilaird{at}usc.edu
-
[Supplemental material is available for this article.]
-
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.132662.111.
- Received September 30, 2011.
- Accepted April 9, 2012.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
















