This folder collects all the code used to produce the analysis and results contained in the manuscript “Integrative analysis of RNA Polymerase II and transcriptional dynamics upon Myc activation” by de Pretis et al.

########## STRUCTURE OF THE FOLDER

- the main folder contains the files “make_Fig*.R” and “make_xls_reports.R”, which generate the figures and the supplementary tables
- the “primary_data_analysis” folder contains the code that runs the computationally intensive analyses, such as read mapping on exonic and intronic features, the modelling of the kinetic rates of transcription and the modelling of polymerase progression from the raw data
- the “data” folder contains all the R data structures created by the source code described above
- the “tables” folder contains the supplementary tables and the external expression datasets used for the manuscript
- the “figures” folder contains all the raw figures (both main and supplementary)

########## EXECUTION OF THE SOURCE CODE

The main folder contains the R files used to generate the main and the supplementary figures of the manuscript, as well as supplementary tables, from the RData and rds files contained in the “data” folder. The execution of these files depends on the availability of the following R/Bioconductor libraries:

compEpiTools, INSPEcT, org.Mm.eg.db, pheatmap, pROC, TxDb.Mmusculus.UCSC.mm9.knownGene
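
Before running the scripts, it may help to check which of the packages listed above are missing from the local R installation. The snippet below is a minimal sketch, not part of the original code; the helper name missing_packages is introduced here for illustration only.

```r
# Packages listed above for the figure and report scripts.
required_pkgs <- c("compEpiTools", "INSPEcT", "org.Mm.eg.db",
                   "pheatmap", "pROC", "TxDb.Mmusculus.UCSC.mm9.knownGene")

# Illustrative helper (not part of the original code): returns the subset
# of `pkgs` whose namespace cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

not_installed <- missing_packages(required_pkgs)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```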

Once all the dependencies are installed, make_Fig*.R and make_xls_reports.R can be executed by sourcing them from the folder where they are located, provided that the current structure of the folders is maintained.
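
One possible way to source all the figure and report scripts from the main folder is sketched below. The function name run_main_scripts and its argument are assumptions made for this example; the scripts can equally be sourced one by one.

```r
# Illustrative helper (not part of the original code): temporarily moves to
# the main folder, so that relative paths such as "data/" resolve correctly,
# then sources make_Fig*.R and make_xls_reports.R if they are present.
run_main_scripts <- function(main_dir = ".") {
  old_wd <- setwd(main_dir)
  on.exit(setwd(old_wd), add = TRUE)
  scripts <- c(list.files(pattern = "^make_Fig.*\\.R$"), "make_xls_reports.R")
  scripts <- scripts[file.exists(scripts)]
  for (s in scripts) {
    message("Sourcing ", s)
    source(s)
  }
  invisible(scripts)
}

# Example call, assuming R was started in the main folder:
# run_main_scripts(".")
```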
 
The folder "primary_data_analysis" contains the code that generates the RData and rds files contained in the “data” folder from the raw files. By raw files we mean the bam files of the total-RNA-seq, labeled-RNA-seq and ChIP-seq experiments (available from the GEO repository linked to the manuscript), as well as the ChIP-seq peaks called using the MACS software (refer to the manuscript methods). Once the raw files are available on the local machine, their location must be specified in the file "primary_data_analysis/file_paths.R".
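
After editing file_paths.R, a quick check that the declared locations exist can prevent failures late in a long run. The sketch below is purely illustrative: the vector raw_files, its element names and the paths are placeholders, not the variables actually defined in file_paths.R.

```r
# Hypothetical example: names and paths are placeholders and do NOT
# correspond to the variables defined in file_paths.R.
raw_files <- c(
  total_rna_bam = "/path/to/total_RNAseq.bam",
  chip_peaks    = "/path/to/ChIPseq_peaks.bed"
)

# Illustrative helper: returns the entries of `paths` that do not exist on disk.
missing_files <- function(paths) paths[!file.exists(paths)]

not_found <- missing_files(raw_files)
if (length(not_found) > 0) {
  message("Not found: ", paste(names(not_found), not_found, sep = " = ", collapse = "; "))
}
```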

The execution of the files within this folder depends on the availability of the following R/Bioconductor libraries:

compEpiTools, DESeq2, devtools, DEXSeq, GenomicAlignments, GenomicRanges, INSPEcT, org.Mm.eg.db, parallel, pheatmap, pROC, RUVSeq, TxDb.Hsapiens.UCSC.hg19.knownGene, TxDb.Mmusculus.UCSC.mm9.knownGene

Additionally, the code within the R files of the folder "primary_data_analysis" often depends on the previous execution of other files in the same folder. In this case, the list of files to be executed beforehand is specified at the beginning of the code. For example, the source code “make_RNAseq.R” depends on the previous execution of the code in “file_paths.R”.
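
The dependency between files can be respected by sourcing them in an explicit order. The sketch below assumes a hypothetical helper, source_in_order, which stops before running anything if one of the files is missing; only the two file names in the commented example come from this document.

```r
# Illustrative helper (not part of the original code): sources `files`
# from `dir` in the given order, after checking that all of them exist.
source_in_order <- function(files, dir = ".") {
  paths <- file.path(dir, files)
  absent <- files[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Cannot find: ", paste(absent, collapse = ", "))
  }
  for (p in paths) {
    message("Sourcing ", p)
    # chdir = TRUE runs each file from its own folder.
    source(p, chdir = TRUE)
  }
  invisible(paths)
}

# Example call, assuming R was started in the main folder:
# source_in_order(c("file_paths.R", "make_RNAseq.R"), dir = "primary_data_analysis")
```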

 
The versions of R and of the R packages used to run this code are reported below. The INSPEcT version used to model the kinetic rates of transcription is included in this code because it was not yet available in Bioconductor at the time of the analysis.

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 7 (wheezy)
 
locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
 
attached base packages:
 [1] compiler  stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     
 
other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [2] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
 [3] GenomicFeatures_1.24.5                 
 [4] RUVSeq_1.6.2                           
 [5] edgeR_3.14.0                           
 [6] limma_3.28.20                          
 [7] EDASeq_2.6.2                           
 [8] ShortRead_1.30.0                       
 [9] pROC_1.9.1                             
[10] org.Hs.eg.db_3.3.0                     
[11] org.Mm.eg.db_3.3.0                     
[12] pheatmap_1.0.8                         
[13] INSPEcT_1.2.2                          
[14] GenomicAlignments_1.8.4                
[15] Rsamtools_1.24.0                       
[16] Biostrings_2.40.2                      
[17] XVector_0.12.1                         
[18] DEXSeq_1.18.4                          
[19] RColorBrewer_1.1-2                     
[20] BiocParallel_1.6.6                     
[21] DESeq2_1.12.4                          
[22] SummarizedExperiment_1.2.3             
[23] devtools_1.12.0                        
[24] deSolve_1.14                           
[25] compEpiTools_1.6.4                     
[26] GenomicRanges_1.24.3                   
[27] GenomeInfoDb_1.8.7                     
[28] topGO_2.24.0                           
[29] SparseM_1.74                           
[30] GO.db_3.3.0                            
[31] AnnotationDbi_1.34.4                   
[32] IRanges_2.6.1                          
[33] S4Vectors_0.10.3                       
[34] Biobase_2.32.0                         
[35] graph_1.50.0                           
[36] BiocGenerics_0.18.0                    
 
loaded via a namespace (and not attached):
 [1] colorspace_1.3-2              hwriter_1.3.2                
 [3] biovizBase_1.20.0             htmlTable_1.9                
 [5] base64enc_0.1-3               dichromat_2.0-0              
 [7] interactiveDisplayBase_1.10.3 splines_3.3.0                
 [9] R.methodsS3_1.7.1             rootSolve_1.7                
[11] DESeq_1.24.0                  geneplotter_1.50.0           
[13] knitr_1.15.1                  Formula_1.2-1                
[15] annotate_1.50.1               cluster_2.0.5                
[17] R.oo_1.21.0                   shiny_1.0.0                  
[19] httr_1.2.1                    backports_1.0.5              
[21] assertthat_0.1                Matrix_1.2-7.1               
[23] lazyeval_0.2.0                acepack_1.4.1                
[25] htmltools_0.3.5               tools_3.3.0                  
[27] gtable_0.2.0                  Rcpp_0.12.9                  
[29] gdata_2.17.0                  preprocessCore_1.34.0        
[31] rtracklayer_1.32.2            stringr_1.2.0                
[33] mime_0.5                      ensembldb_1.4.7              
[35] gtools_3.5.0                  statmod_1.4.29               
[37] XML_3.98-1.5                  AnnotationHub_2.4.2          
[39] MASS_7.3-45                   zlibbioc_1.18.0              
[41] scales_0.4.1                  aroma.light_3.2.0            
[43] BSgenome_1.40.1               VariantAnnotation_1.18.7     
[45] BiocInstaller_1.22.3          memoise_1.0.0                
[47] gridExtra_2.2.1               ggplot2_2.2.1                
[49] biomaRt_2.28.0                rpart_4.1-10                 
[51] latticeExtra_0.6-28           stringi_1.1.2                
[53] RSQLite_1.1-2                 genefilter_1.54.2            
[55] checkmate_1.8.2               caTools_1.17.1               
[57] matrixStats_0.51.0            bitops_1.0-6                 
[59] lattice_0.20-34               htmlwidgets_0.8              
[61] plyr_1.8.4                    magrittr_1.5                 
[63] R6_2.2.0                      gplots_3.0.1                 
[65] Hmisc_4.0-2                   DBI_0.5-1                    
[67] foreign_0.8-67                withr_1.0.2                  
[69] survival_2.40-1               RCurl_1.95-4.8               
[71] nnet_7.3-12                   tibble_1.2                   
[73] KernSmooth_2.23-15            locfit_1.5-9.1               
[75] grid_3.3.0                    data.table_1.10.4            
[77] marray_1.50.0                 digest_0.6.12                
[79] xtable_1.8-2                  httpuv_1.3.3                 
[81] R.utils_2.5.0                 munsell_0.4.3                
[83] methylPipe_1.6.2              Gviz_1.16.5                  
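
The versions reported above can be compared with those installed locally. The snippet below is a minimal sketch that covers only a subset of the packages; the helper name compare_versions is introduced here for illustration, and a version mismatch does not necessarily mean that the code will fail.

```r
# Versions copied from the session information above (a subset, for brevity).
reported <- c(INSPEcT = "1.2.2", DESeq2 = "1.12.4", compEpiTools = "1.6.4",
              pROC = "1.9.1", pheatmap = "1.0.8")

# Illustrative helper (not part of the original code): builds a table of
# reported versus locally installed versions; NA marks packages not installed.
compare_versions <- function(reported) {
  installed <- vapply(names(reported), function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(package = names(reported),
             reported = unname(reported),
             installed = unname(installed),
             same = unname(installed == reported),
             stringsAsFactors = FALSE)
}

print(compare_versions(reported))
```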
