NMRDMR Snakemake Workflow: Cross-Species Regulatory Elements from Histone Mark Peaks
The pipeline starts from H3K4me3, H3K4me1, and H3K27ac peaks in different species and outputs promoter, enhancer, and primed enhancer sets for each species, all mapped to a common coordinate system using a reference species (mouse). H3K27ac read density for the sets of orthologous promoters and enhancers is then extracted from .bam files and normalized across species and replicates.
This workflow was developed to define orthologous regulatory elements in the following study:
Reproducing the analysis
Installation (requires conda)
To install snakemake in a conda environment (for example, an env named snake), run the following commands:
- conda install -c conda-forge mamba
- mamba create -c conda-forge -c bioconda -n snake snakemake==6.9
Running
- Activate the environment:
  conda activate snake
- Run the pipeline to reproduce the results presented in the manuscript. Launch the command from within the nmrdmr_pipeline/ folder; note that this will automatically download the associated .bam files:
  snakemake --configfile config_nmrdmr_final.yaml --cores=10 --use-conda
- Generate a report with key plots and statistics:
  snakemake --configfile config_nmrdmr_final.yaml --report report_nmrdmr.html
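If you would rather drive these steps from a script, the commands above can be assembled in Python. This is only a minimal sketch: the helper names (snakemake_cmd, pipeline_steps, run_pipeline) are hypothetical and not part of the workflow, and the initial dry run (snakemake's -n flag) is an extra sanity check that the instructions above do not include. It assumes the snake environment is active and that the script is launched from within the nmrdmr_pipeline/ folder.

```python
import shlex
import subprocess

CONFIG = "config_nmrdmr_final.yaml"  # config file named in this README


def snakemake_cmd(*extra, configfile=CONFIG):
    """Build a snakemake argument list; `extra` holds additional CLI flags."""
    return ["snakemake", "--configfile", configfile, *extra]


def pipeline_steps(cores=10, report="report_nmrdmr.html"):
    """Return the commands in order: dry run, full run, report."""
    run_flags = (f"--cores={cores}", "--use-conda")
    return [
        # Assumption: a dry run (-n) is added here as a check before the real run.
        snakemake_cmd(*run_flags, "-n"),
        snakemake_cmd(*run_flags),
        snakemake_cmd("--report", report),
    ]


def run_pipeline(**kwargs):
    """Run each step in turn, stopping at the first failure (hypothetical helper)."""
    for cmd in pipeline_steps(**kwargs):
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Print the commands without executing anything.
for cmd in pipeline_steps():
    print(shlex.join(cmd))
```

Calling run_pipeline() executes the three steps; as written, the block only prints the commands so they can be inspected first.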
Output files
The pipeline generates a number of intermediate result files. The most important outputs are:
- .bed files of orthologous regulatory elements in each species' coordinates (out_nmrdmr/mappable_regulatory_elements/${regulatory_element}/${tissue}/${species}.strict.ok.bed; for instance, for mouse heart promoters: out_nmrdmr/mappable_regulatory_elements/Promoters/Heart/Mus_musculus.strict.ok.bed)
- .csv files with normalized H3K27ac read density, which can be used for phylogenetic modeling (out_nmrdmr/coverage/tables_for_eve/qnorm_for_eve/)
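To locate and load these outputs from a downstream script, something like the following could be used. It is a sketch under stated assumptions: the function names are hypothetical, the .bed files are assumed to be tab-separated with at least chrom, start, and end columns (the standard BED layout), and the .csv tables are searched for anywhere under the qnorm_for_eve/ folder because their exact file names are not listed here.

```python
from pathlib import Path

OUT_DIR = Path("out_nmrdmr")  # output root used in this README


def bed_path(regulatory_element, tissue, species, out_dir=OUT_DIR):
    """Path of the orthologous-element .bed file for one element/tissue/species."""
    return (out_dir / "mappable_regulatory_elements" / regulatory_element
            / tissue / f"{species}.strict.ok.bed")


def read_bed(path):
    """Read a BED file into (chrom, start, end, other_fields) tuples.

    Assumes tab-separated columns; blank lines and '#' comment lines are skipped.
    """
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    return records


def list_eve_tables(out_dir=OUT_DIR):
    """List normalized H3K27ac .csv tables (empty if the pipeline has not run)."""
    folder = out_dir / "coverage" / "tables_for_eve" / "qnorm_for_eve"
    return sorted(folder.rglob("*.csv"))


# Mouse heart promoters, the example given above.
print(bed_path("Promoters", "Heart", "Mus_musculus"))
```

Once the pipeline has finished, read_bed(bed_path("Promoters", "Heart", "Mus_musculus")) returns the mouse heart promoter intervals as a list of tuples.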
Going further
More details are provided in the documentation/ folder, including a description of the pipeline steps and how to run it on different datasets (not extensively tested).