FishComparativeAtlas snakemake workflow archive: dataset and pipeline freeze at the time of publication
This archive can be used to reproduce the generation of the fish comparative atlas, to inspect the resulting atlas directly, or both.
Pipeline
FishComparativeAtlas is a snakemake pipeline to trace the evolution of sister duplicated chromosomes derived from whole genome duplication in teleost genomes. The snakemake workflow is defined in the file Snakefile and calls Python scripts stored in src/. This archive contains version v1.0.0 of the FishComparativeAtlas code, as used to generate the comparative atlas of 74 teleost genomes.
The conda environment to run the FishComparativeAtlas pipeline is provided in envs/fish_atlas.yaml.
Input data
The paths to all input data are stored in the snakemake configuration file config_atlas74_fish.yaml.
Main inputs
The main inputs to the FishComparativeAtlas pipeline are:
- ancestral chromosomes (pre-TGD) mapped on 4 teleost genomes (taken from Nakatani and McLysaght 2017), stored in data/MacrosyntenyTGD/,
- the SCORPiOs-corrected gene trees with genes of the 74 teleosts and 33 outgroups, in data/atlas_74fish/SCORPiOs_corrected_forest_5_complete_tags.nhx,
- the species tree used to build and reconcile the gene trees, in data/atlas_74fish/species_tree.nwk,
- the gene coordinate files for all 74 teleosts, in data/atlas_74fish/genes/.
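Before launching the workflow, it can be useful to confirm that the main inputs listed above are present in the unpacked archive. The following is a minimal sketch, not part of the pipeline: the helper name missing_inputs is hypothetical, and it assumes the script is run from the archive root so that the relative paths above resolve.

```python
from pathlib import Path

# Main input paths as listed above, relative to the archive root.
MAIN_INPUTS = [
    "data/MacrosyntenyTGD",
    "data/atlas_74fish/SCORPiOs_corrected_forest_5_complete_tags.nhx",
    "data/atlas_74fish/species_tree.nwk",
    "data/atlas_74fish/genes",
]


def missing_inputs(archive_root, relative_paths=MAIN_INPUTS):
    """Return the listed paths that do not exist under archive_root.

    Hypothetical helper for illustration; it only tests for existence
    and does not validate file contents.
    """
    root = Path(archive_root)
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All main inputs found.")
```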
Additional inputs
Additional inputs in the data/atlas_74fish/ folder include:
- data/atlas_74fish/rename_chr.txt, a conversion table to rename pre-TGD ancestral chromosomes in data/MacrosyntenyTGD/, to make them consistent with the ancestral chromosome names published in Nakatani and McLysaght (2017),
- data/atlas_74fish/assembly_conversion/, files to convert genes and coordinates from Ensembl 75 (data in Nakatani and McLysaght 2017) to Ensembl 95 (data in our gene trees),
- data/atlas_74fish/CompAtlas_stats_trees_noSCORPiOs.txt, annotation statistics from a previous run of the FishComparativeAtlas pipeline on phylogenetic gene trees built with TreeBeST but without SCORPiOs correction of WGD duplication nodes.
Output
The generated comparative atlas is stored in output/comparative_atlas.tsv. It is a tab-delimited file with 3 columns: the unique identifier of the post-duplication gene family, all teleost genes in the family and the predicted post-duplication ancestral chromosome (1a, 1b, 2a...).
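The three-column layout described above can be read with the Python standard library. The sketch below is illustrative only: the function names are hypothetical, and it assumes that gene names in the second column are separated by single spaces and that the file has no header line, neither of which is stated in this README. Adjust gene_sep if the actual separator differs.

```python
import csv
from collections import Counter


def read_atlas(path, gene_sep=" "):
    """Yield (family_id, genes, chromosome) for each row of the atlas.

    gene_sep is an assumption: the separator between gene names inside
    the second column is not specified in this README.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 3:
                continue  # skip blank or malformed lines
            family_id, genes, chromosome = row
            yield family_id, genes.split(gene_sep), chromosome


def families_per_chromosome(path, gene_sep=" "):
    """Count gene families assigned to each post-duplication chromosome."""
    return Counter(chrom for _, _, chrom in read_atlas(path, gene_sep))
```

For example, families_per_chromosome("output/comparative_atlas.tsv") would return a Counter keyed by ancestral chromosome name (1a, 1b, 2a...).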
Gene names can be cross-referenced with the gene coordinate files (data/atlas_74fish/genes/) to obtain the gene-to-species correspondence.
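One way to build this correspondence is sketched below. It rests on several assumptions that this README does not confirm: that data/atlas_74fish/genes/ holds one uncompressed, tab-delimited file per species, that the gene name sits in the fifth column (name_column=4, zero-based), and that the file name is an acceptable species label. The function names are hypothetical; check the actual files and adapt name_column or the file opening logic as needed.

```python
from pathlib import Path


def gene_to_species(genes_dir, name_column=4):
    """Map each gene name to a species label derived from the file name.

    Assumes one plain-text, tab-delimited file per species, with the
    gene name in column `name_column` (zero-based). Both are assumptions.
    """
    mapping = {}
    for path in sorted(Path(genes_dir).iterdir()):
        if not path.is_file():
            continue
        with open(path) as handle:
            for line in handle:
                if line.startswith("#") or not line.strip():
                    continue  # skip comments and blank lines
                fields = line.rstrip("\n").split("\t")
                if len(fields) > name_column:
                    mapping[fields[name_column]] = path.name
    return mapping


def species_in_family(genes, mapping):
    """Return the sorted species labels found for a list of gene names."""
    return sorted({mapping[g] for g in genes if g in mapping})
```

Genes absent from the mapping are silently ignored by species_in_family, so a shorter result than expected may indicate that the assumed column or file layout is wrong.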
Reproducing the output
- Create and activate the conda environment (alternatively you can manually install the dependencies listed in config_atlas74_fish.yaml):
    conda install mamba
    mamba env create -f envs/fish_atlas.yaml
    conda activate fish_atlas
- Run FishComparativeAtlas (~5 minutes):
    snakemake --configfile config_atlas74_fish.yaml --cores 4
The output file out_atlas74_fish/comparative_atlas.tsv will be generated, along with figures with genomic annotations and statistics in out_atlas74_fish/figures.
References
FishComparativeAtlas takes as input the pre-TGD ancestral chromosome predictions from:
- Nakatani and McLysaght 2017: Nakatani Y, McLysaght A. 2017. Genomes as documents of evolutionary history: a probabilistic macrosynteny model for the reconstruction of ancestral genomes. Bioinformatics 33:i369–i378.