Principled multi-omic analysis reveals gene regulatory mechanisms of phenotype variation

Recent studies have analyzed large-scale data sets of gene expression to identify genes associated with interindividual variation in phenotypes ranging from cancer subtypes to drug sensitivity, promising new avenues of research in personalized medicine. However, gene expression data alone is limited in its ability to reveal cis-regulatory mechanisms underlying phenotypic differences. In this study, we develop a new probabilistic model, called pGENMi, that integrates multi-omic data to investigate the transcriptional regulatory mechanisms underlying interindividual variation of a specific phenotype—that of cell line response to cytotoxic treatment. In particular, pGENMi simultaneously analyzes genotype, DNA methylation, gene expression, and transcription factor (TF)-DNA binding data, along with phenotypic measurements, to identify TFs regulating the phenotype. It does so by combining statistical information about expression quantitative trait loci (eQTLs) and expression-correlated methylation marks (eQTMs) located within TF binding sites, as well as observed correlations between gene expression and phenotype variation. Application of pGENMi to data from a panel of lymphoblastoid cell lines treated with 24 drugs, in conjunction with ENCODE TF ChIP data, yielded a number of known as well as novel (TF, Drug) associations. Experimental validations by TF knockdown confirmed 41% of the predicted and tested associations, compared to a 12% confirmation rate of tested nonassociations (controls). An extensive literature survey also corroborated 62% of the predicted associations above a stringent threshold. Moreover, associations predicted only when combining eQTL and eQTM data showed higher precision compared to an eQTL-only or eQTM-only analysis using pGENMi, further demonstrating the value of multi-omic integrative analysis.

eQTL+M with an LLR >= 4.5 ranked by strength of evidence and LLR score.
For each (TF, Drug) association, we report several fields. The first is the "Validation Status" field, which indicates whether we consider this association to be direct validation (exhibited usually by TF knockdown) or indirect (exhibited usually by TF mRNA differential expression or binding in response to the drug treatment). The second field, "Evidence", summarizes the evidence we found for the association. The next field, "LLR", contains the pGENMi eQTL+M LLR for the association. The following "High Scoring Analyses" field contains the experiments that yielded a significant LLR; for eQTL and eQTM, the LLR threshold was 1.74. Since we only examine significant eQTL+M pairs in this analysis, every association should at least have eQTL+M in this field. The last field is "Observation" which is a paragraph describing literature evidence we found corroborating the (TF, Drug) association or at least providing some evidence as to its validity. and showed resistance to temozolomide treatment. Knockdown of FOXM1 via siRNA assays inhibited RAD51 expression and sensitized recurrent GBM to temozolomide. The regulatory relationship between FOXM1 and RAD51 was further corroborated by ChIP analysis showing a preponderance of FOXM1 binding in the RAD51 promoter ). Observation: Yeast 2 hybrid experiments showed an interaction between MEF2C and HABP4; further investigation showed that HABP4 inhibits the DNA binding potential of MEF2C. This interaction was confirmed in-vitro using GST-pull down assays and in-vivo rat heart cells by ChIP (Kobarg et al. 2005). A separate study on hepatotoxicity demonstrated that acetaminophen treatment affected HABP4 expression (Beyer et al. 2007). Given the interaction between MEF2C and HABP4 and HABP4's sensitivity to acetaminophen, we concluded that though there may be an association between MEF2C and NAPQI, we did not have sufficient experimental evidence to call the relationship direct or indirect.

Validation Status: Indirect
Evidence: Differential Expression of TF mRNA LLR: 7.08, 5.67, and 4.80 respectively High Scoring Analyses: eQTL+M for all, eQTL for NFIC, and eQTM for CCNT2 and UBTF Observation: A multi-center study of the effect of acetaminophen (APAP) toxicity to liver cells showed differential expression of the TFs NFIC and UBTF to acetaminophen treatment (Beyer et al. 2007). In a separate gene expression analysis of human liver slices, treatment of acetaminophen also induced differential expression NFIC and CCNT2 (Elferink et al. 2011). The association between NFIC and APAP toxicity was also established in mouse liver samples (Moffit et al. 2007). Finally, a transcriptomic, proteomic, and metabolomics profile of hepatoma cells with and without APAP demonstrated the differential CCNT2 expression (Prot et al. 2012).

Validation Status: Direct
Evidence: TF siRNA and Differential TF Concentration LLR: 6.95

High Scoring Analyses: eQTL+M, eQTL
Observation: In a study exploring the immune response of oxaliplatin treatment, researchers showed that oxaliplatin decreased TLR-induced STAT1 and STAT3 expression in human T cells via Western Blotting (Tel et al. 2012). In mouse models of metastatic colorectal cancer (HCT116), siRNA silencing of STAT3 combined with oxaliplatin therapy reduced tumor size by 96%, better than either treatment separately (77% and 57% respectively) (Shahzad et al. 2011). This interaction between oxaliplatin and STAT3 was articulated in another study of HCT116 cells, where in vitro treatment of oxaliplatin was accompanied with IL-6 mediated activation of STAT3 as well as phosphorylation of Raf kinase inhibitor protein (RKIP) (Cross-Knorr et al. 2013). Work by Hua et al. in SKOV3 ovarian cancer cell lines recapitulated this relationship in a different cell line, with the conclusion being that oxaliplatin treatment upregulated STAT3β (Sheng et al. 2013).

Validation Status: Direct
Evidence: TF siRNA and Differential Expression of TF mRNA Expression LLR: 6.52

High Scoring Analyses: eQTL+M, eQTL
Observation: A previous study demonstrated that the DNA -methylating drug temozolomide (temozolomide) activates the positive modulator of NF -κB, AKT, in a mismatch repair (MMR) system dependent manner. Mismatch repair systems are either proficient in the repair of DNA double strands breaks (DSBs), or deficient, conferring chemosensitivity to the former and chemoresistance to the latter. A subsequent investigation into whether NF -κB is activated by temozolomide and whether AKT is involved in the molecular biology of this event revealed several interactions between the NF -B family member, RELA, and temozolomide. Treatment of temozolomide to proficient MMR systems enhanced NF -B transcriptional activity, activated AKT, and induced RELA nuclear translocation in only MMRproficient cells. Upregulation of NF -κB transcription and RELA translocation were impaired in KD12 cells treated with temozolomide and transfected with siRNA targeting AKT.
Additionally, RELA silencing in deficient MMR systems increased temozolomide-induced growth suppression (Caporali et al. 2012). An examination of temozolomide inhibition of NF -B activity revealed that O6-methylguanine inhibits RELA DNA binding; Another study showed differential RELA expression in response to temozolomide treatment in glioma cells (Yamini et al. 2007).

High Scoring Analyses: eQTL+M and eQTL for both
Observation: Our previous work demonstrated the link between the anthracyclines, epirubicin and doxorubicin, with HNF4G by using siRNA knockdown of HNF4G in two triple negative breast cancer cell lines: MDA-MB-231 and BTF459. The cytotoxicity curve of cell population survivability versus the log concentration of epirubicin or doxorubicin treated was significantly altered in both MDA-MB-231 and BTF459 when transfected with HNF4G siRNA (Hanson et al. 2015) and compared to DMSO control experiments.

Validation Status: Indirect
Evidence: Differential TF DNA Binding LLR: 5.78 High Scoring Analyses: eQTL+M, eQTM Observation: Research on the involvement of phosphoinosite 3-kinases (PI3K) in cellular differentiation was investigated in the context of friend murine erythroleukaemia cells. The early hours of dimethyl sulfoxide (DMSO) or hexamethylenebisacetamide (HMBA) exposure to these cells commits them to a cessation of growth and differentiation. Treatment of these inducers to friend erythroleukaemia cells increased DNA binding of GATA1, an important transcription factor for erythroid specific genes. When treated with the S6-kinase inhibitor, rapamycin, HMBA cells induced at 18 hours showed markedly lower binding of GATA1 to the DNA (Bavelloni et al. 2000). Together, this indicated that rapamycin may inhibit binding of GATA1 to DNA via the PI3K dependent AKT/p70 S6-kinase pathway. Another study concluded that mammalian target of rapamycin (mTOR), which is related to the aforementioned AKT pathway, tightly regulates GATA1 protein expression at the posttranscriptional level . Silencing BATF shifted both the cytotoxicity curves of paclitaxel and docetaxel in MDA-MB-231 significantly with respect to DMSO controls, but not in BTF549 (Hanson et al. 2015).  (James et al. 2015). NANOG protein concentration was also shown to be decreased in colorectal cancer stem cells (CRSCs) when treated with thiostrepton, which acts synergistically with oxaliplatin in killing CRSCs (Ju et al. 2015). Additionally, research has shown that NANOG amplifies STAT3 activation, a regulator shown to be associated with oxaliplatin cellular response (Stuart et al. 2014) (Cross-Knorr et al. 2013). Finally, the Notch signaling pathway, which targets NANOG (Capaccione and Pine 2013), was shown to be an important mediator of CRSC self-renewal and proliferation (http://ecommons.luc.edu/luc_diss/87).

Validation Status: Indirect
Evidence: Differential Expression of TF mRNA LLR: 5.00

High Scoring Analyses: eQTL+M, eQTL
Observation: While no literature evidence was found that corroborates this finding, evidence does exist for the association between gemcitabine, which, like Ara-C, is a deoxycytidine analog. The link between gemcitabine and ZNF274 was reported by eQTL+M analysis with an LLR of 3.11. A gene expression profile of gemcitabine treatment to breast cancer cells revealed that gemcitabine upregulates ZNF274 expression (Hernandez-Vargas et al. 2007). Further analysis revealed EZH2 overexpression in these cells. EZH2 is a known transcriptional silencer via H3K27me3 and so the investigators knocked down EZH2 via shRNA to assess its impact on FHIT. Knockdown of EZH2 increased FHIT expression, decreased H3K27me3 in the FHIT promoter by 2-fold, increased H3K4me3 in the FHIT promoter by 2-fold, and reduced FHIT promoter methylation by 10%. An inhibitor of EZH2 H3K27me3, GSK343, resulted in increased expression of FHIT; combination therapy with the DNA methyltransferase inhibitor (DNMT) 5-Aza demonstrated the greatest increase of FHIT expression among all epigenetic silencers (Lin et al. 2015). In a study of glioblastoma (GBM) -derived tumorigenic stem-like cells (GSCs), Kim et al. looked at the role that the catalytic subunit of Polycomb repressive complex 2, EZH2, and the MELK-FOXM1 complex play in radiosensitivity. It was shown that not only are both EZH2 and MELK co-expressed in GBM and upregulated following radiation treatment, but that MELK mediated EZH2 signaling is required for GSC resistance to radiation. They further show that this function is evolutionarily conserved in Caenorhabditis elegans (C. elegans). The researchers further detailed the mechanisms by which EZH2, MELK, and FOXM1 may be interacting. Luciferase assays in overexpression and knockout experiments showed increased EZH2 promoter activity in the presence of MELK; this finding was corroborated by flow cytometry experiments. The conclusion of these findings was that MELK transcriptionally regulated EZH2, at least in GBM spheres. Given the lack of a MELK DNA binding domain, the researchers searched for a cofactor that MELK teamed with to regulate the expression of EZH2. The paper concluded that EZH2 is a direct target of the MELK/FOXM1 transcriptional complex (Kim et al. 2015).  (Du et al. 2012).

Supplemental Note S2:
Here, we list and discuss the top (TF, Drug) associations predicted by eQTL+M with an LLR >= 3, decomposed by treatment, and ranked by LLR. For each treatment, only up to seven TFs were examined. In parenthesis is the fraction of pairs validated for each drug, where validation is either direct (knockdown or overexpression) or indirect (differential expression, differential binding, etc…). Observations are also provided for pairs where we observed some evidence, but not enough to be considered either direct or indirect validation.
For each (TF, Drug) association, we report several fields. The first is the "Validation Status" field, which indicates whether we consider this association to be direct validation (exhibited usually by TF knockdown) or indirect (exhibited usually by TF mRNA differential expression or binding in response to the treatment). The second field, "Evidence", summarizes the evidence we found for the association. The next field, "LLR", contains the pGENMi eQTL+M LLR for the association. The following "High Scoring Analyses" field contains the experiments that yielded a significant LLR; for eQTL and eQTM, the LLR threshold was 1.74. Since we only examine significant eQTL+M pairs in this analysis, every association should at least have eQTL+M in this field. The last field is "Observation" which is a paragraph describing literature evidence we found corroborating the (TF, Drug) association or at least providing some evidence as to its validity. Observation: Though we failed to corroborate this finding in the literature, we have found NANOG to be associated with at least 4 other drugs. Knockdown of NANOG has been shown to increase sensitivity in cancer cells to cytotoxic agents like cisplatin (Du et al. 2012) and reduce malignancy potential (Kawamura et al. 2015). Overexpression of NANOG has also been shown to increase resistance to docetaxel (Jeter et al. 2011). Thus, while we failed to corroborate this association, NANOG appears to be important in cancer and drug resistance more broadly.

High Scoring Analyses: eQTL+M
Observation: TCF7L2 is known to be involved in drug resistance (Nishimoto et al. 2014). In investigating the effect of NR4A3 variant (coding SNP rs12686676) in insulin gene regulation and insulin secretion in -cells, researchers found a gene-gene interaction between the NR4A3 allele and a variant of TCF7L2 (SNP rs7903146). The NR4A3 allele is known to increase insulin secretion, while the TCF7L2 allele decreases secretion. Haplotypes with wild type TCF7L2 and variant NRFA3 do not exhibit significant increase or decrease in insulin secretion compared to haplotypes that are wildtype for both genes. However, haplotypes with variant TCF7L2 and variant NR4A3 show much higher secretion of insulin compared to haplotypes with wildtype NR4A3 and variant TCF7L2, indicating that the NR4A3 can restore insulin secretion in comprised systems with variant TCF7L2. It is widely known that the NR4A subgroup can be activated by 6-MP via the AF-1 domain (Ordelheide et al. 2013). Given the interaction between TCF7L2 and NR4A3 as well as NR4A3's activation by 6-MP, we conclude that there is some evidence that TCF7L2 is associated with 6-MP, but not enough to warrant indirect or direct validation status. In a separate study examining the effect of azathioprine, a purine anti-metabolite akin to 6-MP, on carcinogenesis in mice found frameshift mutations in TCF7L2 and seven other genes involved in carcinogenesis when at least one copy of the DNA repair protein MSH2 in the mice was absent (Chalastanis et al. 2010

High Scoring Analyses: eQTL+M
Observation: In our study, siRNA knockdown of RAD21 in Jurkat cell lines shifted the cytotoxicity curve of Cytarabine significantly compared to DMSO controls. Additionally, somatic mutations in RAD21 have been identified as a key driver in acute myeloid leukemia (AML) development; treatment of cytarabine and indarubicin has been shown to achieve remission in patients with cohesin mutations in STAG2, SMC3, and RAD21 (Thota et al. 2014).

Validation Status: Indirect
Evidence: Differential Phosphorylation of TF by Gemcitabine LLR: 3.44

High Scoring Analyses: eQTL+M, eQTL
Observation: TRIM28 is an ataxia telangiectasia mutated (ATM) substrate activated by DNA double strand breaks. In exposing multiple myeloma cells with a combination of gemcitabine (a pyrimidine analog akin to cytarabine) and a purine analog, clofarabine, ATM kinase substrates such as histone 2AX, TRIM28, and p53 were phosphorylated (Valdez et al. 2013), indicating that the ATM pathway (including TRIM28) is activated by this combination. Another study demonstrated that depletion of ATM in HeLa and A549 cells sensitized them to gemcitabine therapy -further supporting the importance of this pathway in gemcitabine treatment (Karnitz et al. 2005).  phosphorylation (Chen et al. 2013a). With respect to cell cycle arrest generally, EZH2 has also been implicated as a significant player in SWI/SNF, a family of proteins that consume ATP to remodel nucleosomes, deficient cells; this was shown via inhibition of EZH2 in SMARCB1 (a member of the SWI/SNF family) deficient rhabdoid cells, which led to alterations of H3K27 trimethylation and cytotoxicity (Masliah-Planchon et al. 2015) Carboplatin: (1/1)

Validation Status: Direct
Evidence: ChIP and TF Overexpression LLR: 3.85

High Scoring Analyses: eQTL+M, eQTL
Observation: It has been established via chromatin immunoprecipitation (ChIP) assays that, in tumor copper deficient cells, SP1 regulates SLC31A1 by binding to the SLC31A1 promoter.
The SLC31A1 protein is a membrane protein that transports copper into the cell, and, more importantly, platinum drugs such as cisplatin and carboplatin. Low expressing individuals of SLC31A1 are typically resistant to the chemotherapy of platinum drugs. It has been shown that SLC31A1 can be induced in copper deficient environments through upregulation of SP1, thereby leading to greater chemosensitivity to platinum agents. Such experiments have already been performed with cisplatin and preliminary studies are encouraging for carboplatin therapy (Chen and Kuo 2013). Additionally, a separate study showed that inhibition of SP1 binding to the promoter of BIRC5 (survivin), an antiapoptotic protein highly expressed in cancer and linked to drug resistance, decreased expression of BIRC5 (Chun et al. 2007), implicating SP1 more broadly in chemotherapy.
Cisplatin: (7/7)  (Campbell et al. 2006). Another study examining the interaction of RELA and cisplatin focused on the differential phosphorylation of the T505 residue of RELA by CHEK1 in response to cisplatin treatment in mouse embryonic fibroblasts. The researchers found that T505 phosphorylation induced a proapoptotic form of RELA that could facilitate cell death through transcriptional repression of antiapoptotic target genes such as BCL2L1 (Msaki et al. 2011). Observation: Examining the role of RUNX3 in gastric cancer chemotherapy, siRNA knockdown of RUNX3 sensitized immortalized stomach mucosal cells (GES-1) and gastric cancer cells (SGC7901) to cisplatin (Guo et al. 2005). Another study in hepatocellular carcinoma (HCC) looking at the role of miR-130a/RUNX3/WNT signaling in cisplatin treatment demonstrated through siRNA knockdown of miR-130a and RUNX3 that miR-130a inhibits RUNX3 which activates WNT signaling and results in cellular resistance to cisplatin treatment . Further studies have shown differential expression of RUNX3 in cells treated with cisplatin (Biswal et al. 2012) (Dadarkar et al. 2010 Observation: In an investigation of ZRANB3, siRNA knockout in U2OS cells resulted in dramatic sensitization to camptothecin (CPT) and moderate sensitization to cisplatin. The study also elucidated cooperativity between ZRNAB3 and WRNIP1 (Ciccia et al. 2012). In a study of chicken DT40 cells, knockdowns of WRNIP1 did not desensitize the cell to cisplatin, but moderately to CPT; however, negative cells of RAD18, which interacts with WRNIP1, did exhibit high sensitivity to both cisplatin and CPT (Yoshimura et al. 2006).  (Low et al. 2010). However, in a transcriptomic profile of human cellular response to cytotoxic agents, NR3C1 was found to be downregulated by cisplatin treatment (Limonciel et al. 2015). This contradiction may be due to the differing regulatory architectures between the species, as it is likely cisplatin does not interact directly with NR3C1 Additionally, NR3C1 is a glucocorticoid receptor (GR) and glucocorticoids (GC) have been shown to be enhance cytotoxicity (Lu et al. 2006 Observation: In analysis of human urothelial carcinoma cell line NTUB1, researchers examined the following cell sublines resistant to particular drugs to analyze the differences in CEBPD expression: cisplatin (NTUB1/P(14)), gemcitabine (NTUB1/G(1.5)), arsenic trioxide (NTUB1/As(0.5)), and paclitaxel (NTUB1/T(0.017)). CEBPD was only expressed in the cisplatin resistant cell subline, NTUB1/P(14), and in none of the parental cell lines. When treating NTUB1 cells with cisplatin, CEBPD protein and mRNA expression levels were unilaterally elevated across the cell lines. In determining its role in chemoresistance, CEBPD was overexpressed in NTUB1 cells treated with increasing amounts of cisplatin. Compared to controls, CEBPD overexpressed cells were significantly more resistant to cisplatin-induced apoptosis. The paper further articulates that CEBPD reduces cisplatin-induced reactive oxygen species (ROS) production by inducing the expression of Cu/Zn-superoxide dismutase (SOD1) (Hour et al. 2010). In a study of ototoxicity in mouse cells, CEBPD expression was also affected by administration of cisplatin (Low et al. 2010). Another study in human found that cisplatin upregulated CEBPD (Tardito et al. 2009), which is consistent with Hour et al.  In a study profiling gene expression in 30 different cell lines treated with 11 different drugs, HMGN3 expression level was predictive of doxorubicin sensitivity across cells (Gyorffy et al. 2006 (Bernard et al. 1998). While doxorubicin is not epirubicin, the two are very similar in structure, belonging to the same drug class (anthracyclines). Additionally, the eQTL+M (TAL1, Doxorubicin) association yielded an LLR score of 2.81, which barely missed our threshold of 3. Therefore, we feel confident generalizing TAL1's association with doxorubicin to epirubicin.

Association: ZNF143 with Gemcitabine
Validation Status: Not Validated (Some Evidence) Evidence: Differential Expression of TF mRNA and TF siRNA (with Cisplatin) LLR: 3.37

High Scoring Analyses: eQTL+M, eQTL
Observations: In looking at respiratory deficient mitochondrial cells, it was shown that cell lines with respiratory dysfunction were more resistant to death via gemcitabine treatment than their normal counterparts. It was also shown that such cells had higher ZNF143 mRNA levels compared to normal respiratory cells. While gemcitabine treatment with ZNF143 knockdown was not reported, dysfunctional cells showed greater sensitivity to cisplatin after ZNF143 was knocked down (Lu et al. 2012). Thus, some evidence exists that ZNF143 can elicit chemoresistance under certain circumstances.

Association: USF1 with Gemcitabine
Validation Status: Not Validated (Some Evidence) Evidence: Differential Expression of Drug Target and Regulatory Information LLR: 3.24

High Scoring Analyses: eQTL+M
Observation: Both gemcitabine and ara-C rely on deoxycytidine kinase (DCK) in the first rate limiting steps of the activation of these nucleoside agents in solid tumors and leukemia, respectively (Ge et al. 2005). In vivo ChIP assays of HepG2 cells showed the presence of USF1/2 and SP1/2 bound factors to the DCK promoter. Co-transfections in HepG2 showed activation properties of USF1/2 binding and repressive properties of SP1 binding to a DCKluciferase reporter construct (Ge et al. 2003). A separate study showed that DCK expression was predictive of ara-C IC50 in acute myeloid leukemia (AML) cell lines; among the aforementioned regulators of DCK, only USF1 expression was variable and correlated with DCK (Ge et al. 2005). Given that USF1 has been shown to regulate a gene important in the pharmacokinetics of both gemcitabine and cytarabine and is predictive of cytarabine IC50 in AML cell lines, we conclude that there is some evidence to suggest USF1 is associated with gemcitabine. Observation: PRDM1 is one of two major transcription factors critical for XBP1 expression; it does so by repressing PAX5, itself a repressor of XBP1, thereby enhancing XBP1 expression (He et al. 2010). As it turns out, XBP1 is essential for hypoxia survival and is required for tumor growth (Romero-Ramirez et al. 2004). Though not TF siRNA evidence, we consider regulation of a gene essential for treatment survival to be direct evidence of PRDM1's association with Hypoxia. In multiple myeloma (MM) cells, researchers found that hypoxia induces the downregulation of plasma specific TFs and upregulated stem-cell associated TFs.

Association: ZNF274 with Gemcitabine
Among those TFs downregulated in hypoxic MM cells compared to normoxic MM cells was PRMD1 (Kawano et al. 2013). Paradoxically, transcriptomic profiling in other cells in hypoxic and normoxic conditions revealed upregulation of PRDM1 mRNA in response to oxygen deprivation (Limonciel et al. 2015) (Fiedler et al. 2015). This suggests PRDM1 is affected differently depending on the pathways induced by hypoxia. Observation: In a gene expression analysis of precision cut human liver slices, treatment of APAP-induced upregulation of BCLAF1 (Elferink et al. 2011). BCLAF1 has also been shown to promote cell death generally (Kasof et al. 1999 (Hendriks et al. 2012) and mRNA production (when used in concert with piroxicam) (Baldi et al. 2011) of TRIM28.

NAPQI: (4/7)
One prominent study on three non-small cell lung cancer (NSLC) cell lines transformed into tumor-initiating cells (TICs) via stem cell media found impaired phosphorylation of TRIM28 due to irradiation and cisplatin treatment; the researchers hypothesized that the inhibition of TRIM28 phosphorylation might provide a mechanistic explanation for the observed reduction in DNA damage-induced cell cycle arrest and apoptosis in NSCLC TICs (Lundholm et al. 2013). This speaks more broadly to the phosphorylation of TRIM28 desensitizing DNA damaged cells to death rather than to its association with a specific drug. Another very relevant study elucidating the connection between ataxia telangiectasia and Rad3-related (ATR) protein inhibitors and cisplatin demonstrated increased phosphorylation due to mediation by ATR; the same cells sensitized by the ATR inhibitor combined with cisplatin were also sensitized by the ATR inhibitor and oxaliplatin, though to a lesser degree (Hall et al. 2014). However, S-phase cell arrest via ATR appeared to be sensitive to cisplatin, and not oxaliplatin (Lewis et al. 2009), limiting the generalizability of the association of discussion to oxaliplatin. Mechanistically, though, it is still plausible for there to be an association. TRIM28 promotes re-sectioning of DNA double strand breaks not protected by -H2AX (Tubbs et al. 2014) in murine G1-phase lymphocytes and oxaliplatin has been shown to induce -H2AX (Chiu et al. 2009). In terms of chemotherapeutic drugs more generally, a study found that SKOV3 cells overexpressed with TRIM28 showed increased resistance to cisplatin and paclitaxel (Hu et al. 2015), despite the notable differences in mechanism of action between platinum agents and taxane therapies. Thus, while there is sufficient evidence between a TRIM28 and cisplatin association, there are mechanistic reasons to believe it can be generalized to oxaliplatin, a drug of the same family as cisplatin. As for why TRIM28 was not reported as an association with cisplatin, the best score (2.44) didn't make the cutoff threshold of 3 in the eQTL+M analysis.

Validation Status: Indirect
Evidence: Differential Expression of TF mRNA and Network Analysis LLR: 3.47

High Scoring Analyses: eQTL+M, eQTL
Observation: An experiment involving the co-treatment of oxaliplatin with topotecan in bone marrow from rats showed evidence of upregulation of FOSL1 when treated with this cocktail (Davis et al. 2015). In addition, FOS signaling has been shown to be activated by oxaliplatin treatment in a variety of cancers (Alian et al. 2012). In our analysis, FOS association with oxaliplatin was also noteworthy with a score of 3.61, but it was not in the top 7 associated TFs with oxaliplatin.  (Tang et al. 2007). ZEB1 has also been shown to increase tumorigenicity by repressing stemness-inhibiting miRNAs like miR-203 (Wellner et al. 2009), which has been shown to increase oxaliplatin resistance in colorectal cancer cells (Zhou et al. 2014).

Association: ATF1 with Oxaliplatin
Validation Status: Not Validated Observation: Chronic myelogenous leukemia (CML) resulting from a t(9,22) translocation is resistant to paclitaxel treatment. Researchers found that this produced an oncoprotein that activated a GATA1 response element in the promoter of a heat shock protein, HSP70. The siRNA knockdown of HSP7 0sensitized the cell to paclitaxel (Ray et al. 2004). Additionally, a murine study of the timeline of paclitaxel-induced cellular changes exhibited upregulation of GATA1 expression (Aguirre et al. 2010).  (Russo et al. 2006).

High Scoring Analyses: eQTL+M, eQTL
Observation: In a study examining the molecular mechanisms underlying the transformation of immortalized cells into tumorigenic cells, researchers found that ESR1, while low expressed, was differentially expressed between immortalized mammary epithelial cells and those induced into tumorigenesis via heavy-ion radiation (Ma et al. 2012). A study examining the bystander effect, where cells respond to their neighbors, of irradiation in variable estrogen receptor (ER) environments provides clearer proof of this association. The researchers irradiated MDA-MB-231 cells, which are ER negative, and MCF-7 cells, which are ER positive; additionally, they treated both cells with 17 -estradiol (E2), which activates ESR1, and tamoxifen, which is an E2 antagonist. MCF-7 cells, which have ESR1, exhibited increased radiosensitivity and bystander response when treated with E2; the effect was diminished by tamoxifen. E2 also increased MCF-7 reactive oxygen species (ROS), absent radiation; however, in MDA-MB-231, neither the bystander response nor ROS increase was observed (Shao et al. 2008). Given that ESR1 activation sensitized MCF-7 cells to radiation, we consider this direct validation.

High Scoring Analyses: eQTL+M
Observation: Mouse embryo fibroblasts (MEFs) proficient and deficient in PML were irradiated to determine the effect of PML in radiation-induced apoptosis. PML deficient cells were much more resistant to radiation, indicating that PML mediates the apoptosis of radiation therapy.

Overexpression of PML upon cellular irradiation potentiated c-Jun transcriptional activation
and co-activation of c-Jun by PML was observed exclusively in irradiated cells. ChIP experiments of irradiated and unaffected cells showed that binding of c-Jun to its promoter was observed in irradiated, but not unaffected cells. Super shift analysis also showed that the DNA binding ability of c-Jun/ATF-2 was comprised in PML deficient irradiated cells (Salomoni et al. 2005). A separate examination of the interaction between PML and TOPBP1 demonstrated that both co-localize in the nucleus of the cervical cell lines, SiHa, to repair DNA damage after irradiation. Additionally, siRNA PML knockouts exhibited a decrease in radiation-induced TOPBP1 expression, suggesting PML is a regulator of TOPBP1.
However, overexpression of PML did not increase mRNA levels of TOPBP1, but did increase TOPBP1 protein expression. Furthermore, pulse-chase labeling experiments indicated that PML increases the half-life of TOPBP1 protein, indicating that this regulation occurs at the post-transcriptional level (Xu et al. 2003). A separate analysis in HeLa cells treated by radiation and cisplatin showed upregulation of PML protein in response to treatment, although northern blotting did not indicate a gross increase in mRNA levels. Transfection of p53 into HeLa upregulation of PML with respect to control, indicating that PML is regulated by the p53 pathway (Chan et al. 1997).

Validation Status: Direct
Evidence: Differential Concentration of TF and TF siRNA LLR: 3.02

High Scoring Analyses: eQTL+M, eQTM
Observation: In non-small cell lung cancer (nsCL) BE1 cells, HDAC1 and HDAC2 expression were highly correlated and significantly higher than normal tissues. Prognosis of patients with low expression of these HDACs was noticeably higher compared to high expression patients.

Irradiation of BE1 cells downregulated HDAC1 and HDAC2 expression and upregulated AXIN
expression. Knockdowns of HDAC1 and HDAC2 via siRNA upregulated AXIN expression as well. Radiation treatment combined with HDAC knockdown in BE1 cells expressing AXIN increased apoptosis of cells compared to radiation treatment alone (Han et al. 2012). In fact, there is a growing body of literature that suggest combination therapies of HDAC inhibitors with radiation in chemotherapy treatment (New et al. 2012).

Validation Status: Direct
Evidence: TF ChIP and TF siRNA LLR: 5.78

High Scoring Analyses: eQTL+M, eQTM
Observation: There is a close relationship between the mTOR pathway, which rapamycin directly targets, and the PI3K share a lot of crosstalk, leading to the development of dual inhibitors (Zaytseva et al. 2012). In paper examining PI3K inhibition in regards to differentiation in erythroleukaemia cells, the researchers noticed that treatment with the mTOR inhibitor rapamycin dramatically reduced GATA1 binding to DNA, which is essential in erythroid differentiation (Bavelloni et al. 2000). Another investigation into the role of epidermal growth factor (EGF) on upregulation of the excision-repair cross-complementary 1 (ERCC1) gene in human hepatocarcinoma cells (HuH7) revealed that in EGF-induced HuH7 cells, PI3K inhibition combined with silencing of the PI3K pathway kinase FKBP12-rapamycinassociated protein or mammalian target of rapamycin (FRAP/mTOR) upregulated ERCC1.
Additionally, motif search identified a binding site for GATA1 in the promoter of ERCC1 with ChIP confirming the binding of GATA1 to the promoter in EGF-induced cells (Andrieux et al. 2007). This indicates that GATA1 plays a role in mTOR/PI3K signaling, as it is a possible regulator of a target of such a pathway. This association was also corroborated by our own siRNA knockdown of GATA1 in MDA-MB-231 cells treated with rapamycin; cells were more resistant to rapamycin-induced apoptosis when GATA1 was knocked down.

High Scoring Analyses: eQTL+M
Observation: An experiment treating lung adenocarcinoma cells (A549) with rapamycin identified STAT1 interacting with mTOR and increased STAT1 nucelar concentration after rapamycin treatment (Fielhaber et al. 2009). While STAT1 is not STAT2, the two form heterodimers and are members of the same protein family. Additionally, the pGENMI STAT1 association with Rapamycin score was 2.49, which though below threshold, is very similar to the STAT2 score. We believe that STAT2 is interpreting the same signal in this context and thus the association is valid. This association was also validated in our own experimental validations of the effect of STAT2 knockdown on rapamycin treated MDA-MB-231 cells.

Validation Status: Direct
Evidence: TF Over Expression and TF siRNA LLR: 3.08

High Scoring Analyses: eQTL+M
Observation: A study showed that PHF8 mRNA and protein levels were downregulated in failing human and mice hearts undergoing hypertrophy. Overexpression of PHF8 identified the aKT/mTOR pathway as a target of PHF8. When this pathway was inhibited by treatment of rapamycin, the phenotype lost by PHF8 deficiency was rescued (Liu et al. 2015). As in the previous two associations with rapamycin, we corroborated this association in the laboratory.  (Kreisler et al. 2010). Though we lack direct evidence, given REST's regulation of AKT2 and triciribine's role as an AKTinhibitor, we are confident in confirming this association at least partially.  (Kim et al. 2010). Another study elucidated that AKT-mediated phosphorylation is crucial for repression of NANOG in differentiating murine embryonal carcinoma cells (Chen et al. 2013b). Taken together, we believe there exists a plausible connection between NANOG and the AKT inhibitor triciribine.   (Weisenberger 2014). IDH1 mutations have been shown to predict longer survival times via treatment of temozolomide (Houillier et al. 2010). It has also been shown that EBF1 can bind to both DNA and TET2 and function as a demethylation agent in IDH1 mutants (Guilhamon et al. 2013). Thus, EBF1 may have a role to play in differential methylation of the MGMT promoter in IDH1 mutants, and thus have a role to play in cellular response to temozolomide.

pGENMi produces distinct associations than simple baseline methods
The following results extend the section of the same name in the main text. We investigated the extent to which sub-significant results in the baseline were enriched with pGENMi. Relaxing the 0.10 FDR threshold to 0.25 produced 253 associations rather than 121, though the overlap with the 90 pGENMi associations only increased to 6 (Hypergeometric p-value = 0.63). We further diluted the statistical reliability of the TWAS associations by reporting associations based on uncorrected p-values <= 0.05. Of the 384 associations matching this criteria (among the 3552 test), the overlap with pGENMi yielded only 10 associations (Hypergeometric p-value = 0.51).

Cell line specific ChIP data and Composite ChIP data alter pGENMi associations
As the genotype, methylation, gene expression, and cytotoxicity data all were derived from the same panel of cell lines, we inquired whether restricting ChIP peaks to those from lymphoblastoid cell lines produced different results than when using a union of clustered peaks across cell lines ('composite ChIP' data), which was the strategy used for the analyses above. To test this, we ran pGENMi using eQTL data co-incident with GM12878 ChIP peaks from ENCODE and compared the results to the eQTL-only analysis using composite ChIP peaks. Among the 37 GM12878 specific TFs and 24 drugs tested, we tested 888 (TF, Drug) associations, of which 90 pairs exhibited an LLR >= 1.74. At the same threshold, and when testing the same candidate pairs, the composite ChIP analysis produced 29 associations, with only three associations -(ELF1, 6-MP), (MAX, 6-MP), and (NFIC, 6-TG) -being common with the analysis based on GM12878 peaks.
We noted that pairs with LLR >= 1 in composite ChIP analysis had a greater tendency (compared with those with LLR < 1) to be in the significant list of the GM12878 ChIP analysis (t-test p-value 3.05E-04, see Supplemental Fig. S14). Overall, our conclusion from this comparison is that the top associations reported by pGENMi depend significantly on the source of the ChIP data use to infer regulatory evidence, which is expected. Our use of composite (multiple cell line) ChIP data in the results reported in this work was motivated by our search for associations that generalize beyond lymphoblastoid cell lines, to other cancer cell lines where many of the associations validated in the literature as well in this study have been demonstrated.
It is notable that the GM12878 specific analyses produced a substantially greater number of associations above the reporting threshold. One reason for this may lie in the difference between the GM12878 and Composite ChIP analyses in terms of the number of target genes (those with regulatory evidence) for each TF. For instance, the median number of target genes with cis-eQTL evidence for a TF's influence is 58 per TF and 151 per TF for Composite and GM12878 ChIP data respectively; the mean number of target genes per TF based on either cis-eQTL or cis-eQTM (eQTL+M) evidence improves to 102 for the composite ChIP data and 155 for the GM12878 data.
The detection discrepancy between the two analyses that leads to fewer targets in the Composite ChIP analysis may be due, in part, to the clustering across cell lines which dilutes LCL specific ChIP signals and high occupancy target (HOT) region removal. It does not seem exceptional that LCL derived eQTLs would be more enriched in LCL specific ChIP peaks. Another reason for the GM12878 specific analyses exhibiting may be the asymmetric distribution of TFs associated with drugs in the analysis using GM12878 ChIP data. As shown in Supplemental Table S4, the 4 drugs 6-MP, Doxorubicin, Epirubicin, 6-TG account for 67 of the 90 (TF, Drug) associations observed at a LLR >= 1.74. Two of these drugs -6-MP and 6-TG -also had among the largest number of TWAS genes, which together with the larger numbers of target genes designated with GM12878 ChIP data may have resulted in the inflated numbers of TF associations. The enrichment of associations for these drugs may also indicate a greater sensitivity to them in this cell line.
We maintain that the Composite ChIP analysis offers a more balanced and generalizable view of the regulatory landscape for a TF and thus is more robust to validation in other cell lines and tissues. We leave more in depth analysis of the utility of cell line specific ChIP data for future research.

Potential factors leading to false positives in pGENMi predictions
Validation rates of pGENMi predictions, based on literature search or our own experiments, were around 40% overall, which is far from perfect, and highly variable across drugs. One of the potential reasons for false positives is the great variation among drugs in terms of the number of significant TWAS genes, i.e., genes with significant correlation between their expression and cytotoxicity. We tabulated the number of significant TWAS genes alongside the number of pGENMi eQTL+M associations (LLR >= 3) for each drug in Supplemental Table S6. The five drugs with most reported associations (6-MP, NAPQI, Temozolomide, Cisplatin, Ara-C) -about 10 TFs on average -show far more TWAS genes (median of 227), compared to the remaining 19 drugs, which typically yield about 1 TF association (median) and have only 27 TWAS genes (median); see Supplemental Table S7. This raises the possibility that pGENMi makes false positive predictions in cases when there are significantly more phenotype-associated genes.
Pursuing this further, we noted that for the top five drugs by number of associations, our literaturebased validation ( Table 2) found direct evidence for only 10 of 35 (29%) predicted TF associations, while for the remaining 19 drugs we were able to find similar direct evidence for 20 of 38 (53%) TF associations predicted by pGENMi, a statistically significant difference (Hypergeometric test p-value 0.032). Similar observations were made on our own experimental validations (Table 4A), though small sample sizes prevent statistical claims. For instance, two of the above-mentioned 'top 5' drugs had 3 or more TFs tested by us (6-MP and Temozolomide), and both yielded low validation rates (0/3 and 1/4 respectively), while the drug Rapamycin, for which only 3 predictions were made and all three were tested by us, yielded a success rate of 3/3. In light of the above observations, we believe that the accuracy of pGENMi predictions may suffer when a phenotype variation is correlated with a large number of genes' expression levels, and future work should attempt to rectify this issue if possible.
Another possible factor leading to false positives is that extensive co-binding of a pair of TFs leads to the method mistaking the co-bound TF for the true regulator. We have tried to address this potential source of false positives by removing HOT regions where most co-binding is observed from our analysis (see Supplemental Note S1). Many TF pairs are known to exhibit significant co-binding even after HOT region removal, as demonstrated by the ENCODE project, and thus it is possible that some of the false positives arise from co-binding. In such cases, if the true TF is among those tested by us (i.e., has ENCODE ChIP data), then we would expect that it should also score highly in the pGENMi analysis. For example, in our experimental validation, we predicted that ELF1 and ZNF263 were both associated with temozolomide, but only succeeded in validating ELF1 through cytotoxicity assays, while ZNF263 knockdown did not show significant change in response. ENCODE reports that these two TFs co-bind across the whole genome in K562 cells (The Encode Project Consortium 2012), so it is possible that the predicted ZNF263 association was a false positive because of the true ELF1 association and the ELF1-ZNF263 cobinding.

pGENMi comparison to GPA
The Genomic Pleiotropy with Annotation (GPA) (Chung et al. 2014) algorithm is similar in spirit to pGENMi. GPA uses latent variables for SNPs and integrates the GWAS p-value of a SNP with annotations of that SNP; pGENMi in contrast uses latent variables for genes and integrates the TWAS p-value a gene with annotations of that gene based on eQTL, eQTM and TF ChIP peak evidence. The commonality between the two models, is thus at a higher conceptual level rather than at a practical level that might warrant empirical comparisons. Even at the technical level there is a key difference in that GPA models the joint likelihood of the annotations and GWAS pvalue of SNPs, whereas we only use regulatory evidence to maximize the likelihood of TWAS data.

Removing HOT Regions from ENCODE TF ChIP data
We used TF ChIP peaks as a way to focus on SNPs whose association with expression might be mediated by the regulatory action of that TF. However, it is well known that different TFs tend to co-localize at the same genomic loci, a phenomenon that is especially pronounced at "HOT" (high occupancy target) regions, and a TF's binding at these HOT regions is not necessarily indicative of regulatory function (mod Encode Consortium et al. 2010). To enrich the ChIP-based collection of binding sites for functional TF-DNA interactions, we removed segments of 50bp where six or more TFs bind, resulting in a ~25% reduction in the total number of ChIP TF peaks across all ENCODE cell lines.

SNP Imputation
Imputation was run separately for each race and chromosome separately. Chromosomes were divided into 40MB regions. BEAGLE v3.3.1 (Browning and Yu 2009) was run on these 40MB regions of the genome, plus a 1MB buffer region to the right and left of the main region, as the ends are generally imputed poorly. The reference and observed genotype data were input as phased and unphased, respectively. The lowmem option was enforced to reduce the overall amount of memory required to run BEAGLE; additionally, the exclude markers option was used to remove the rare SNPs from the reference mentioned above. For edification, BEAGLE imputed untyped markers, not missing genotyped markers.

Summarizing probe expression by gene symbol
Rather than modeling raw probe data, we opted to summarize probe level expression at the Ensembl gene symbol level, since representing gene expression using multiple correlated probes breaks the independence assumptions of our probabilistic model by giving greater weight to genes with higher probe coverage. Of the 54,613 probes, those with low variance c. Return z-score normalization of the projected PC1 expression.
Applying this procedure resulted in a matrix of Ensembl gene expression with dimensionality 16,183 (number of genes) x 284 (number of LCLs), and Z-score normalized expression across individuals. Henceforth in this publication, we refer to this matrix and its values as the gene expression matrix and gene expression, respectively.

Controlling for confounding variables
We included the following potentially confounding variables as covariates in all regression analyses for this study: batch, age, gender, and sub-population labels, derived from EIGENSTRAT (see below). We decided to omit explicit labels of ethnicity, as the sub-population labels should capture that information implicitly. To derive p-values for the covariate of interest in the multiple regression (for instance, a SNP in an eQTL regression) we computed the p-value of the log likelihood ratio between models with and without the covariate of interest, using a 2 distribution.

Population stratification
Despite the information provided by ethnic labels, it is well known that sub-population structures within these ethnic groups contain important information that can radically confound the results of association analyses when ignored or improperly controlled (Price et al. 2006). To address this issue, we utilized the EIGENSTRAT program (version 6.0.1) developed by the Price lab (https://www.hsph.harvard.edu/alkes-price/software), to identify sub-population labels from genotype data and remove their potential confounding effects in regression analyses.
Instrumental to EIGENSTRAT's ability to recover true population labels is the quality and format in which the data are provided. It may conflate large genomic regions in linkage disequilibrium (LD) with population structure, hence such regions have to be pruned. Additionally, while EIGENSTRAT could identify latent population labels corresponding to the cohort's ethnicities from the entire cohort's genotype matrix, it is more useful run EIGENSTRAT on each known ethnic group's genotype matrix: doing so will direct EIGENSTRAT to look for sub-populations only within each ethnicity, and not across.
To perform this analysis, we first computed a set of independent SNPs using the PLINK program (1.90 beta) (Chang et al. 2015). For each ethnicity (HCA, CA, and AA), we used a variant pruning algorithm to remove redundant SNPs in LD and reported only independent SNPs on somatic chromosomes. To generate these results, we ran PLINK with the '-indep' flag and the following parameters: Using these parameters, PLINK slides a window across the genome and only retains those SNPs in the window that cannot be adequately predicted from a linear combination of the remaining SNPs. For each SNP in a window, PLINK reports a goodness of fit, R 2 s, of the multivariate regression to predict from all other SNPs. PLINK retains only those SNPs with R 2 >= 1 -1/VIF. Setting VIF to 1 ensures that only statistically independent SNPs in each window are selected; although strict, this removes all potential for LD confoundment, at the potential cost of underestimating the number of sub-populations.
We estimated these sub-population labels as continuous axes of variation by using EIGENSTRAT to perform PCA on each ethnic group's independent SNP genotype matrix. Each significant principle component (PC) derived from the genotype matrix of a given ethnicity can be interpreted as a sub-population of that ethnicity. For each ethnicity, we retrieved all PCs with Tracy-Widom p-values ≤ 0.05, resulting in a total of seven axes, shown in Supplemental Table S1. We projected the entire genotype matrix onto each of the seven axes (setting to 0 SNPs not contributing to the axis) to derive numeric sub-population representations for the cohort. We included these sub-population labels as covariates in our regression models to control for population stratification.
Supplemental Note S5: Experimental validation design, data, methodology, and statistical analysis

Removing Common Transcription Factors for Experimental Validation
In experimental validation, we avoided TFs like BRF2 and GTF2F1 that were associated with 10 or more drugs. These are general transcription factors whose association is likely due to having many more ChIP peaks than other TFs, a point we have discussed in previous work (Hanson et al. 2015). We also excluded NAPQI, a toxic byproduct produced during the xenobiotic metabolism of the analgesic paracetamol, due to its lack of clinical application. All 13 drugs were dissolved in dimethyl sulfoxide (DMSO) and aliquots of stock solutions were frozen at -80°C.

RNA interference and Real-time quantitative reverse transcription-PCR (qRT-PCR)
siRNAs for candidate TFs and negative control siRNA were purchased from Dharmacon.
Reverse transfection was performed for MDA-MB231 and U251 cells in 96-well plates.
Specifically, 3000-4000 cells were mixed with 0.1 mL of lipofectamine RNAi-MAX reagent (Invitrogen) and 10 nM siRNA for each experiment. Electroporation was performed for Jurkat cells using Nucleofector® Kit V from Lonza (Cologne, Germany).
Prior to electroporation, cells were washed with phosphate buffer saline (PBS) and counted.
One million Jurkat cells were re-suspended in 100 µL of the Nucleofector® Solution buffer and mixed with 100 nM of specific siRNA. The re-suspended cells were transferred to cuvettes and immediately electroporated using the program X-005. After electroporation, cells were incubated in a cuvette at room temperature for 10 minutes and then 500 µL of pre-warmed culture medium were added to the cuvette. Cells were then transferred to a 12-well plate and incubated at 37°C/5% CO 2 overnight.
Total RNA was isolated from cultured cells transfected with control or TF-specific siRNAs with the Qiagen RNeasy kit (QIAGEN, Inc.), followed by qRT-PCR performed with the one-step Brilliant SYBR Green qRT-PCR master mix kit (Stratagene). Specifically, primers purchased from QIAGEN were used to perform qRT-PCR using the Stratagene Mx3005P Real-Time PCR detection system (Stratagene). All experiments were performed in triplicate with beta-actin as an internal control. Reverse transcribed Universal Human reference RNA (Stratagene) was used to generate a standard curve. Control reactions lacked RNA template.

MTS cytotoxicity assay
Cell proliferation assays were performed in triplicates at each drug concentration. Cytotoxicity assays with the lymphoblastoid and tumor cell lines were performed in triplicates at each dose.
Specifically, 90 μL of cells (5 × 10 4 cells) were plated into 96-well plates (Corning, NY) and were treated with increasing does of a specific drug or radiation. After incubation for 72 hours, 20 μL of CellTiter 96® AQueous Non-Radioactive Cell Proliferation Assay solution (Promega Corporation, Madison, WI) was added to each well. Plates were read in a Safire2 plate reader (Tecan AG, Switzerland).
Cytotoxicity assays with the tumor cell lines were performed using the CellTiter 96® AQueous Non-Radioactive Cell Proliferation Assay (Promega Corporation, Madison, WI). Specifically, 90 μL of cells (5 × 10 3 cells) were plated into 96-well plates and were treated with increasing does of a specific drug. The escalation of concentrations for each drug is listed in Supplemental Plates were read in a Safire2 plate reader (Tecan AG, Switzerland). Cytotoxicity was assessed by plotting cell survival versus drug concentration, on a log scale.
Radiation cytotoxicity was performed in triplicates at each radiation dose as described above.

Statistical analysis
Significance of the IC50 values between negative control siRNA and TF-specific siRNA was determined using a two-tailed unpaired t-test.

Likelihood
The probability that = 1 is a logistic function of and : The probability that = 0 is 1 -probability = 1: If you interpret 1 as exp(−0 ⋅ ), then this is simply the Boltzman distribution with the 2 nd state being = .
The probability of depends on if is 1 or 0. With respect to the former, it is distributed per a beta distribution (with = 1) and uniform distribution for the latter.

Update :
The parameter has a closed form solution. Given Optimizing for gives: Setting to 0 and solving for gives the following:

Log Likelihood Ratio:
We compare the fit model, = 1, to a model where = . We indicate this with, = 0, indicating the null hypothesis. We assess the significance of the alternative model, = 1, by computing the log likelihood ratio between a fit model and the null model. = log( ( | , , , = 1)) − log( ( | , , , = 0)) Implementation: In our implementation of EM, we choose = 0.0001 and = 0.0001 with a minimum and maximum of 25 and 300 steps in each EM calculation and 100 random restarts for each model.
For each random restart, we estimated an alternative and null model. We report the LLR corresponding to the model with the best log-likelihood for the alternative model.

TensorFlow Implementation:
We also have a TensorFlow implementation available for download, although this implementation was not used to generate the results of this paper.

Code Location:
Both the C++ code used to generate the results of this paper and the TensorFlow code are available in a GitHub repository located at https://github.com/knoweng/pgenmi and with a link to the GitHub repository at veda.cs.uiuc.edu/pgenmi . The GitHub repo is also available in Supplemental Code S1. Supplemental Fig. S1: Cytotoxicity experiments performed in Jurkat cell lines treated with 6-MP.
Of the 5 TFs tested, one was validated. In this case, the validated TF, EZH2 (outlined in red), and FOXP2 were eQTL+M negative controls, while the other 3 were eQTL+M predictions. . Of the 2 TFs tested, CEBPD, which was an eQTL+M prediction (as indicated by the red outline), was validated, while EZH2, which was a negative control, failed to be validated. Fig. S11: Cytotoxicity experiments performed in MDA-MB-231 cell lines treated with rapamycin. All 3 TFs were validated, as indicated by the red outline, and all were eQTL+M predictions.

Supplemental
Supplemental Fig. S13: Cytotoxicity experiments performed in U251 cell lines treated with temozolomide (TMZ). Of the 4 TFs tested, only 1 was validated, as indicated by the red outline, and all were eQTL+M predictions, except ZNF263 which was only supported by eQTM analysis.