# NMRDMR Snakemake Workflow: Cross-Species Regulatory Elements from Histone Mark Peaks

[![Snakemake](https://img.shields.io/badge/snakemake-≥5.13-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Snakemake-Report](https://img.shields.io/badge/snakemake-report-green.svg)](report.html)

The pipeline starts from **H3K4me3, H3K4me1 and H3K27ac peaks** called in different species and outputs sets of **promoters, enhancers and primed enhancers** for each species, all mapped to a **common coordinate system** defined by a reference species (mouse). **H3K27ac read density** over the sets of **orthologous promoters and enhancers** is then extracted from .bam files and **normalized** across species and replicates.

This workflow was developed to define orthologous regulatory elements in the following study:

[Parey et al., 2023, Phylogenetic modeling of enhancer shifts in African mole-rats reveals regulatory changes associated with tissue-specific traits](https://doi.org/10.1101/2023.01.10.523217).

## Reproducing the analysis

### Installation (requires conda)

To install snakemake in a conda environment (for example, an environment named `snake`), run the following commands:

- `conda install -c conda-forge mamba`

- `mamba create -c conda-forge -c bioconda -n snake snakemake==6.9`

### Running

- Activate the environment
    
    `conda activate snake`

- Run the pipeline to reproduce the results presented in the manuscript. Launch the command from within the `nmrdmr_pipeline/` folder; note that this will automatically download the associated .bam files:

    `snakemake --configfile config_nmrdmr_final.yaml --cores=10 --use-conda`

- Generate a report with key plots and statistics

    `snakemake --configfile config_nmrdmr_final.yaml --report report_nmrdmr.html`

### Output files

The pipeline generates a number of intermediate result files. The most important outputs are:

- .bed files of orthologous regulatory elements in each species' own coordinates (`out_nmrdmr/mappable_regulatory_elements/${regulatory_element}/${tissue}/${species}.strict.ok.bed`; for instance, mouse heart promoters are in `out_nmrdmr/mappable_regulatory_elements/Promoters/Heart/Mus_musculus.strict.ok.bed`)

- .csv files with normalized H3K27ac read density, which can be used for phylogenetic modeling (`out_nmrdmr/coverage/tables_for_eve/qnorm_for_eve/`)
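
As a small illustration of the .bed path template above, the following shell sketch expands the placeholders for the mouse heart promoter example and, if the file exists, counts its lines. The variable names simply mirror the placeholders in the template, and only the values `Promoters`, `Heart` and `Mus_musculus` are taken from this README; other element, tissue or species names would have to be checked against your own output folder.

```shell
# Values taken from the example above; other names are not listed in this
# README and should be checked against the contents of out_nmrdmr/.
regulatory_element="Promoters"
tissue="Heart"
species="Mus_musculus"

# Expand the path template given in the list of outputs.
bed="out_nmrdmr/mappable_regulatory_elements/${regulatory_element}/${tissue}/${species}.strict.ok.bed"
echo "${bed}"

# Only inspect the file if the pipeline has already produced it.
# The line count equals the number of elements only under the assumption
# that the file has one element per line and no header or track lines.
if [ -f "${bed}" ]; then
    wc -l < "${bed}"
fi
```

Run from within the `nmrdmr_pipeline/` folder after the pipeline has finished, this prints the path and then the line count; before the pipeline has run, only the path is printed.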

## Going further

More details are provided in the `documentation/` folder, including a description of the pipeline steps and instructions for running it on other datasets (this use case has not been extensively tested).

