FishComparativeAtlas snakemake workflow archive: dataset and pipeline freeze at the time of publication

This archive can be used to reproduce the generation of the fish comparative atlas, to inspect the results directly, or both.

Pipeline

FishComparativeAtlas is a snakemake pipeline to trace the evolution of sister duplicated chromosomes derived from whole genome duplication in teleost genomes. The snakemake workflow is defined in the file Snakefile and calls Python scripts stored in src/. This archive contains version v1.0.0 of the FishComparativeAtlas code, as used to generate the comparative atlas of 74 teleost genomes.

The conda environment to run the FishComparativeAtlas pipeline is provided in envs/fish_atlas.yaml.

Input data

The paths to all input data are stored in the snakemake configuration file config_altas74_fish.yaml.

Main inputs

The main inputs to the FishComparativeAtlas pipeline are:

Additional inputs

Additional inputs in the data/atlas_74fish/ folder include:

Output

The generated comparative atlas is stored in output/comparative_atlas.tsv. It is a tab-delimited file with 3 columns: the unique identifier of the post-duplication gene family, all teleost genes in the family and the predicted post-duplication ancestral chromosome (1a, 1b, 2a...).
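As a quick way to inspect a file with this three-column layout, the following Python sketch counts gene families per predicted ancestral chromosome. It rests on assumptions not stated in this README: that the file has no header line, and that the genes in the second column are separated by a single space (the hypothetical `gene_sep` parameter can be changed if the real separator differs). The rows below are made up for illustration and are not taken from the atlas.

```python
import csv
import io
from collections import Counter


def read_atlas(handle, gene_sep=" "):
    """Yield (family_id, genes, ancestral_chrom) from a 3-column tab-delimited file.

    gene_sep is an assumption about how genes are separated in column 2.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        family_id, genes, chrom = row
        yield family_id, genes.split(gene_sep), chrom


# Made-up example rows standing in for the real file content.
example = io.StringIO(
    "fam_1\tgeneA geneB\t1a\n"
    "fam_2\tgeneC\t1b\n"
    "fam_3\tgeneD geneE geneF\t1a\n"
)

records = list(read_atlas(example))

# Number of gene families assigned to each ancestral chromosome.
families_per_chrom = Counter(chrom for _, _, chrom in records)
print(families_per_chrom)
```

To run this on the archive itself, the `io.StringIO` object could be replaced by `open("output/comparative_atlas.tsv", newline="")`, after checking the header and separator assumptions against the actual file.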

Gene names can be cross-referenced with the gene coordinate files (data/atlas_74fish/genes/) to obtain the gene-to-species correspondence.
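The lookup itself can be sketched in Python as below. The format of the coordinate files is not described here, so this example deliberately does not parse them: it assumes the gene names of each species have already been loaded into a dictionary. The function names (`genes_to_species`, `annotate_family`), species labels and gene names are all hypothetical.

```python
def genes_to_species(species_genes):
    """Invert {species: iterable of gene names} into {gene name: species}."""
    lookup = {}
    for species, genes in species_genes.items():
        for gene in genes:
            lookup[gene] = species
    return lookup


def annotate_family(genes, lookup):
    """Pair each gene of a family with its species (None if the gene is unknown)."""
    return [(gene, lookup.get(gene)) for gene in genes]


# Hypothetical content; in practice this would be filled from the
# per-species files in data/atlas_74fish/genes/.
species_genes = {
    "species_X": ["geneA", "geneC"],
    "species_Y": ["geneB"],
}

lookup = genes_to_species(species_genes)
annotated = annotate_family(["geneA", "geneB", "geneZ"], lookup)
print(annotated)
```

Building the inverted dictionary once keeps each lookup constant-time, which matters when annotating every family across 74 genomes. This sketch assumes gene names are unique across species; if they are not, a later species silently overwrites an earlier one.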

Reproducing the output

The output file out_atlas74_fish/comparative_atlas.tsv will be generated, along with figures with genomic annotations and statistics in out_atlas74_fish/figures.

References

FishComparativeAtlas takes as input the pre-TGD ancestral chromosomes predictions from: