NMRDMR Snakemake Workflow : Cross-Species Regulatory Elements from Histone Mark Peaks


The pipeline starts from H3K4me3, H3K4me1 and H3K27ac peaks in different species and outputs sets of promoters, enhancers and primed enhancers for each species, all mapped to a common coordinate system defined by a reference species (mouse). H3K27ac read densities for the sets of orthologous promoters and enhancers are then extracted from .bam files and normalized across species and replicates.


Description

The workflow image is provided in nmrdmr_pipeline_dag.pdf. The pipeline takes as input:

- (i) a samplesheet describing the samples (species, mark, tissue, peak file name...). The format is rigid; see data/NMRDMR_DatasetSummary_Villar_210816.txt for an example.
- (ii) the corresponding peak files. These must have the .narrowPeak extension to be recognized by the pipeline.
- (iii) the bam files, or a file with urls from which the bams are downloaded automatically (see data/samples.tsv).
- (iv) one TSS .bed file per species, used for plots, named following the pattern 'TSS.biomart.{species_name}.bed'.
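
The naming rules above can be checked before launching the workflow. The following is a minimal Python sketch and is not part of the pipeline: the helper names `is_peak_file` and `tss_bed_name`, as well as the example file and species names, are hypothetical. Only the .narrowPeak extension and the 'TSS.biomart.{species_name}.bed' pattern come from the description.

```python
from pathlib import Path

# Pattern stated in the description; {species_name} is filled in per species.
TSS_PATTERN = "TSS.biomart.{species_name}.bed"


def is_peak_file(path):
    """Return True if the file name ends with the .narrowPeak extension."""
    return Path(path).suffix == ".narrowPeak"


def tss_bed_name(species_name):
    """Build the expected TSS .bed file name for one species."""
    return TSS_PATTERN.format(species_name=species_name)


if __name__ == "__main__":
    # Hypothetical file and species names, for illustration only.
    for peak in ["liver_H3K27ac_rep1.narrowPeak", "liver_H3K27ac_rep1.bed"]:
        print(peak, is_peak_file(peak))
    print(tss_bed_name("mouse"))
```

A check of this kind, run on the file names listed in the samplesheet, can catch peak files that the pipeline would otherwise not recognize.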

Briefly, the code consists of 6 modules:

The last module module_quality_control_plots.smk (purple bubbles in the workflow image) produces plots at key steps of the pipeline.

Installation

Dependencies

- conda
- snakemake=6.9

Install conda

The Miniconda3 package management system manages all of the pipeline's dependencies, including python packages and other software (bedtools, liftover...).

To install Miniconda3:

Install snakemake

To install snakemake in a conda environment (for example, an environment named snake), run the following commands:

After this, installation is complete. Before each run of the pipeline, activate the environment with the command conda activate snake.

Usage

Configuration

To run the pipeline, first move to its root folder with cd nmrdmr_pipeline. Second, define the paths to the input files and all parameters in the configuration file, using config_nmrdmr_final.yaml as a template.

Using custom chain files for liftover

Custom chain files can be provided instead of being downloaded directly from Ensembl. In this case, indicate their path in the configuration file as follows:

    chain_files: "lastZ/mmus_grcm38.v.{sps}_lastz_net.all.chain"
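
The {sps} placeholder in this path presumably stands for a species name that is filled in for each non-reference species. A minimal Python sketch of that expansion follows, assuming plain string substitution. How the pipeline itself resolves the placeholder is not shown here, and the function name `chain_paths` and the species labels are hypothetical.

```python
# Pattern copied from the configuration example above.
CHAIN_PATTERN = "lastZ/mmus_grcm38.v.{sps}_lastz_net.all.chain"


def chain_paths(pattern, species):
    """Map each species label to the chain file path obtained by filling {sps}."""
    return {sps: pattern.format(sps=sps) for sps in species}


if __name__ == "__main__":
    # Hypothetical species labels; use the ones that match your chain file names.
    for sps, path in chain_paths(CHAIN_PATTERN, ["speciesA", "speciesB"]).items():
        print(sps, path)
```

Listing the expanded paths this way makes it easy to confirm that every expected chain file exists on disk before starting a run.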

Running

Versions

Authors