# Convergent transcriptomic and genomic adaptation in arid rodents

This repository was created to ensure the reproducibility of the results and figures published in the article entitled
"Degrees of convergent evolution in rodent adaptations to arid environments".


## Dataset preparation

Data preparation covers the retrieval of environmental values and the preprocessing of:
1) the RNA-seq files for transcriptomic analyses and 2) the sequences used to detect amino acid changes.

### Acquisition of environmental data and rodent phylogeography data

**Pluvio_phylogeography**

  - FabreRodentPhylogeography.Rmd: retrieves species observations from the GBIF database (https://www.gbif.org/fr/) and environmental data from the WorldClim database (http://www.worldclim.org/)
  - Co-Phylo_and_AncestralStates.R: reconstructs ancestral environmental states on the tree from Fabre et al. and plots the tree with the species used in the paper (Fig. S1 and S2, Supplemental Data S3 and S4)
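
For readers unfamiliar with GBIF, the sketch below shows what a species occurrence query can look like. It is not taken from `FabreRodentPhylogeography.Rmd`, which is an R Markdown document and may retrieve the data differently. It only builds a request URL for the public GBIF occurrence search endpoint. The function name, the default page size and the example species are illustrative assumptions.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (REST API, version 1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_occurrence_url(scientific_name, limit=300, offset=0):
    """Build a GBIF occurrence search URL for one species.

    Hypothetical helper for illustration only; it does not send the request.
    """
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",  # keep only georeferenced records
        "limit": limit,           # page size
        "offset": offset,         # index of the first record of the page
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


# Example species name chosen for illustration; not necessarily in the study.
url = gbif_occurrence_url("Acomys cahirinus")
print(url)
```

Coordinates returned by such a query could then be matched against climate layers to extract environmental values per species.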


### RNA and cDNA preprocessing

An automated Nextflow pipeline called "Ropipe" was used to perform:
 
  - relevant public data identification
  - *de novo* assemblies (from public and new data)
  - gene annotation
  - gene quantification
  - gene family building and alignment
  - species tree

Output data are used for: 1) expression, co-expression and deconvolution analyses and 2) detection of changes in amino acid sequences.

## Expression analyses

The expression analyses, all implemented as R scripts, include the following steps:

  - Preparation and annotation of the count tables with all individuals and all species
  - PCA (control) and PCA with batch correction
  - DE analyses with DESeq2 (all individuals) and independent small-pair differential expression control/analyses
  - DE analyses using EVEmodel
  - Test of the significance of the number of differentially expressed genes
  - Co-expression analyses (WGCNA), trait correlations and heatmap visualization
  - Enrichment analyses (GO) and visualization (network with igraph and ggraph)
  - Deconvolution analyses using MuSiC and a rat kidney scRNA-seq reference
  - Species pairwise analyses
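
One common way to test whether a number of differentially expressed genes is larger than expected by chance is a label permutation test. The sketch below illustrates that general idea and is not the procedure implemented in the repository's R scripts, which may differ. The expression values, the `arid`/`mesic` labels, the mean-difference criterion and the threshold are all assumptions made for the example.

```python
import random
from statistics import mean


def count_de_genes(expr, labels, threshold):
    """Count genes whose absolute difference in group means exceeds threshold."""
    n_de = 0
    for values in expr.values():
        arid = [v for v, lab in zip(values, labels) if lab == "arid"]
        mesic = [v for v, lab in zip(values, labels) if lab == "mesic"]
        if abs(mean(arid) - mean(mesic)) > threshold:
            n_de += 1
    return n_de


def permutation_p_value(expr, labels, threshold, n_perm=1000, seed=0):
    """Empirical p-value for the observed number of DE genes.

    Labels are shuffled n_perm times; the p-value is the fraction of
    shuffles with at least as many DE genes as observed (with +1 correction).
    """
    rng = random.Random(seed)
    observed = count_de_genes(expr, labels, threshold)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_de_genes(expr, shuffled, threshold) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy log-expression values (made up): three mesic samples, then three arid.
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 4.8, 5.3],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [0.5, 0.4, 0.6, 3.0, 3.2, 2.9],
}
labels = ["mesic"] * 3 + ["arid"] * 3

observed, p_value = permutation_p_value(expr, labels, threshold=1.0)
print(observed, p_value)
```

In a real analysis the per-gene criterion would come from the DE method itself (for example adjusted p-values) rather than from a raw difference in means.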


## Coding Sequence analyses

**Sequences_Pelican** 

Coding sequence analyses search for changes in sequences.
They use the output of ropipe pipeline 3 (amino acid and nucleotide alignments, and trees).

This section includes the Pelican runs used to search for changes and the analysis of the resulting data.






    



