Supplemental_Code_2 – R code used to calculate enrichment scores (ES)

#phyper(q, m, n, k) returns the probability of drawing q or fewer white balls.
#To get the p-value use
# 1-phyper(white_drawn-1, white, black, n_pick)
#The -1 in the first argument makes phyper give the likelihood of the number of white balls
#being less than white_drawn; with the 1- it gives the probability of it being white_drawn or higher

#Function to perform the phyper test
#dataset and enriched_in should be in the *same order* and *restricted to SNPs that are shared between both*
hyperGeo_RASQUAL<-function(dataset, enriched_in){
  #Flag the significant SNPs in each table
  sig_dataset<-dataset$Q.value<0.05
  sig_starr<-enriched_in$adj.P.Val<0.05
  #sum(..., na.rm=TRUE) is used for the counts instead of table(...)['TRUE'],
  #which returns NA rather than 0 when no SNP passes the threshold

  #Count the number of white balls drawn out of the urn.
  #These are the successes that are significant in the dataset AND in STARR-seq
  white_drawn<-sum(sig_dataset & sig_starr, na.rm=TRUE)

  #Count the number of white balls in the urn.
  #This is the number of successes in dataset that were in the STARR-seq library
  white<-sum(sig_dataset, na.rm=TRUE)

  #Count the number of black balls.
  #These are the SNPs in the library that are not successes in the dataset
  black<-sum(!sig_dataset, na.rm=TRUE)

  #How many samples are being taken, i.e. the number of significant STARR-seq SNPs
  #(since this is what we want to see enrichment in)
  n_pick<-sum(sig_starr, na.rm=TRUE)

  #Calculate the probability of picking white_drawn or more white balls
  return(c((1-phyper(white_drawn-1, white, black, n_pick)), #Gives the hypergeometric p-value
           (white_drawn/((white/(white+black))*n_pick)),    #Gives the enrichment score (ES)
           white))                                          #Gives the n of the dataset
}
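
The calculation can be checked by hand on a small example. The sketch below is illustrative only: the two data frames are hypothetical toy data (10 shared SNPs, not values from the study), and the variable names sig_dataset, sig_starr, p_value, expected and es are introduced here for the example. It recomputes the same urn counts, p-value and ES step by step, and uses phyper(..., lower.tail = FALSE), which gives the same upper-tail probability as 1 - phyper(...).

```r
# Toy data (hypothetical): 10 SNPs shared by both tables, in the same row order.
dataset     <- data.frame(Q.value   = c(0.01, 0.01, 0.01, 0.01, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5))
enriched_in <- data.frame(adj.P.Val = c(0.01, 0.01, 0.01, 0.5, 0.01, 0.01, 0.5, 0.5, 0.5, 0.5))

# Significance flags at the 0.05 threshold used in the function above.
sig_dataset <- dataset$Q.value < 0.05
sig_starr   <- enriched_in$adj.P.Val < 0.05

white_drawn <- sum(sig_dataset & sig_starr)  # significant in both tables: 3
white       <- sum(sig_dataset)              # significant in dataset: 4
black       <- sum(!sig_dataset)             # not significant in dataset: 6
n_pick      <- sum(sig_starr)                # significant in STARR-seq: 5

# P(X >= white_drawn): upper tail of the hypergeometric distribution.
p_value <- phyper(white_drawn - 1, white, black, n_pick, lower.tail = FALSE)

# Expected overlap if significance in the two tables were independent: (4/10) * 5 = 2.
expected <- (white / (white + black)) * n_pick

# Enrichment score: observed overlap divided by expected overlap, 3 / 2 = 1.5.
es <- white_drawn / expected

print(c(p_value = p_value, ES = es, n = white))  # p_value is 66/252, about 0.262
```

With these counts the p-value is P(X = 3) + P(X = 4) = (60 + 6)/252, so an ES of 1.5 on only 10 SNPs is not significant at the 0.05 level.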