Phylogenetic modeling of regulatory activity using the EVE model

Setting up the environment

The dependencies are listed in env.yaml and can be installed with conda (or mamba):

mamba env create -f env.yaml

To install the evemodel R package from GitHub in the created environment:

Running the EVE model on the normalized read density data at orthologous regulatory elements

The eve_shifts.R script takes as input a normalized read density table from dataset_s2 and the species tree, then tests for regulatory shifts on the specified branch using the 'evemodel' R package. Empirical p-values are computed from null simulations.

Rscript eve_shifts.R --infile ../../dataset_s2/reads_density/Promoters_Heart_fpkm_normalized.csv --species_tree data/nmrdmr_sptree.nwk --branch anc_mr

Here, 'anc_mr' stands for the ancestral mole-rat branch; use --hgla for naked mole-rat or --fdam for Damaraland mole-rat. EVE shifts and p-values are written to eve_twoTheta_test/anc_mr_Promoters_Heart_fpkm_normalized_eve_results.csv, and additional plots are saved in the same folder (see the script for details).
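
As an illustration of how an empirical p-value can be derived from null simulations, here is a minimal Python sketch. It assumes a test statistic where larger values are more extreme (for example a likelihood ratio) and uses the common "+1" correction so that the p-value is never exactly zero. The function name and the toy values are hypothetical, and eve_shifts.R may compute its p-values differently; see the script for the actual procedure.

```python
def empirical_p_value(observed, null_stats):
    """Share of null statistics at least as large as the observed one.

    Assumption: larger statistic = stronger evidence for a shift.
    The +1 in numerator and denominator counts the observed value itself.
    """
    n_extreme = sum(1 for s in null_stats if s >= observed)
    return (n_extreme + 1) / (len(null_stats) + 1)


# Toy null distribution (made-up values, for illustration only)
null_lrt = [0.1, 0.5, 1.2, 2.0, 3.5, 0.8, 0.3, 1.9, 4.1]
p = empirical_p_value(3.0, null_lrt)
print(p)
```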

Estimating false positive and false discovery rates using simulations with different proportions of nulls and shifts

Rscript eve_simulations_to_estimate_fp_rate.R --infile ../../dataset_s2/reads_density/Promoters_Heart_fpkm_normalized.csv --species_tree data/nmrdmr_sptree.nwk --branch anc_mr

Note that the eve_shifts.R script should be run first as this script reuses some of its outputs.

For each simulation with x% positives, the script writes the estimated parameters and p-values for the 'null' and 'shifted' simulated elements to eve_twoTheta_test/Promoters_Heart_fpkm_normalized_shift_beta_xshifted_eve_Sim_all_params_anc_mr.csv. These tables can then be used to compute false and true positive rates.
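
The rates can be computed from such a table along the following lines. This is a sketch under stated assumptions: each simulated element is reduced to a pair (is_shifted, p_value), an element is called shifted when its p-value is below alpha, and the function name and toy values are hypothetical. The actual column names in the output file are not documented here and would need to be checked before adapting this.

```python
def error_rates(rows, alpha):
    """Compute FPR, TPR and FDR from (is_shifted, p_value) pairs.

    Assumption: an element is called 'shifted' when p_value < alpha.
    """
    tp = fp = tn = fn = 0
    for is_shifted, p_value in rows:
        called = p_value < alpha
        if called and is_shifted:
            tp += 1
        elif called and not is_shifted:
            fp += 1
        elif not called and is_shifted:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "fpr": fp / (fp + tn) if (fp + tn) else nan,  # false positives among true nulls
        "tpr": tp / (tp + fn) if (tp + fn) else nan,  # detected among true shifts
        "fdr": fp / (fp + tp) if (fp + tp) else nan,  # false positives among all calls
    }


# Toy simulation results (made-up values, for illustration only)
sim = [
    (True, 0.01), (True, 0.15), (True, 0.40),
    (False, 0.05), (False, 0.50), (False, 0.70), (False, 0.90),
]
rates = error_rates(sim, alpha=0.2)
print(rates)
```

Repeating this over a grid of alpha values, and over the simulations with different proportions of shifted elements, gives the curves needed to choose a threshold.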

From the results of these simulations, we selected thresholds of alpha = 0.2 and abs(shift) > 1.5 to filter shifted elements from the EVE results file (eve_twoTheta_test/nmrdmr_Promoters_Heart_fpkm_normalized_eve_results.csv).
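
A minimal sketch of this filtering step in Python is given below. The column names ("element", "shift", "pval") are hypothetical placeholders, as is the choice of a strict p-value < alpha comparison; check the header of the results file and adapt the names before use. The example reads from an in-memory CSV so that it runs without the real data.

```python
import csv
import io


def filter_shifted(rows, alpha=0.2, min_abs_shift=1.5):
    """Keep rows with p-value below alpha and |shift| above min_abs_shift.

    Assumption: rows are dicts with 'pval' and 'shift' keys (hypothetical names).
    """
    return [
        r for r in rows
        if float(r["pval"]) < alpha and abs(float(r["shift"])) > min_abs_shift
    ]


# Toy results table (made-up values, for illustration only)
example_csv = """element,shift,pval
e1,2.1,0.05
e2,-1.8,0.10
e3,0.4,0.01
e4,3.0,0.60
"""
rows = list(csv.DictReader(io.StringIO(example_csv)))
kept = [r["element"] for r in filter_shifted(rows)]
print(kept)
```

To apply it to the real file, replace io.StringIO(example_csv) with an open file handle on the results CSV.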