# DECoNT Result Reproduction for all results presented in the paper

------------------------------------------------------------------------

Please make sure that your system has the Anaconda package manager installed for easy package handling. For instructions on installing Anaconda, see: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation. If you do not want to use the Anaconda package manager, please refer to https://github.com/ciceklab/DECoNT#installation and install the exact versions listed under the requirements sub-header.

------------------------------------------------------------------------

## Configuring the DECoNT environment:

(i) Configuring the DECoNT environment on a Mac:

From your terminal, run $ conda env create -f DECoNT_mac.yml and then $ conda activate DECoNT_mac

(ii) Configuring the DECoNT environment on a Linux-based system:

From your terminal, run $ conda env create -f DECoNT_linux.yml and then $ conda activate DECoNT_linux
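For convenience, the two commands above can be parameterized by platform (a sketch; the .yml filenames follow the pattern given above):

```shell
# Pick the platform suffix for your OS: "linux" or "mac"
PLATFORM=linux
# Create the conda environment from the matching .yml file, then activate it
conda env create -f "DECoNT_${PLATFORM}.yml"
conda activate "DECoNT_${PLATFORM}"
```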

------------------------------------------------------------------------

## Before getting started:

All of the scripts provided with this file have the same goal: they load the model weights stored in the main directory, named DECoNT\ Result\ Reproduction. Notice that for each model, only one weight file with the ".h5" extension is present. The scripts are organized into separate directories simply to make the reproduction process easy and convenient. Each script loads the respective data and the model weights for the corresponding result presented in the paper and prints the results. All of the results are generated with the pre-trained model weights within the scripts.

Note that the data files used by the testing scripts are preprocessed. For the unprocessed raw data files (e.g., WES-based CNV calls, read depth files generated with sambamba, WGS CNV calls), please refer to the raw_data directory. Since the raw read depth files are over 100 GB, we provide a link to a shared Google Drive folder instead; the folder also includes the raw WES CNV calls from the mentioned callers. For .bam files, please refer to the publicly available links given in the manuscript.


------------------------------------------------------------------------

## Fig2a
As shown in the manuscript, Fig2a has bar plots for 3 different metrics: (i) DEL Precision; (ii) DUP Precision; and (iii) Overall Precision. To reproduce all of the values presented in the bar plots, first go into the Fig2a directory, then into the directory of the WES-based CNV caller of interest (e.g., ./Fig2a/XHMM/). Run the DECoNT_test_XXXX.py file inside the directory (where XXXX is the WES-based CNV caller name: XHMM, CODEX2, or CONIFER), and the results will be reproduced and printed to the console. The script loads the model weights from the main DECoNT Result Reproduction directory and produces the results.
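As a concrete example, the steps above can be run as the following shell session (the script-name pattern follows the description above; adjust if your checkout differs):

```shell
# Choose the WES-based CNV caller: XHMM, CODEX2, or CONIFER
CALLER=XHMM
cd "Fig2a/${CALLER}"
# Run the matching test script; results are printed to the console
python "DECoNT_test_${CALLER}.py"
```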


## Fig2b
A similar procedure applies to Fig2b. Just cd into the Fig2b directory and run the test_freec.py script. The script loads the model weights from the main DECoNT Result Reproduction directory, produces the plot, and shows it on the screen. If you are using a Linux-based system without a GUI, the resulting plot is also saved in the same directory under the name 'result.png'.
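On a headless Linux machine, you can force the non-interactive path explicitly. Note the MPLBACKEND=Agg setting is an assumption that the script plots with matplotlib; if so, it suppresses the on-screen window and the plot is only written to result.png:

```shell
cd Fig2b
# Assumption: test_freec.py plots with matplotlib; Agg disables the GUI window
MPLBACKEND=Agg python test_freec.py
# The plot should now be in the current directory
ls result.png
```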


## Fig2c
The exact same method applies as for the other figures. cd into the Fig2c directory and run the test_freec.py script. Again, the script plots the Fig2c boxplot and displays it on the screen. If you are using a Linux-based system without a GUI, the resulting plot is also saved in the same directory under the name 'result.png'.

## Fig3
To reproduce Fig3, one must conduct 18 different experiments; however, this is made easy by the helper scripts inside the Fig3 directory. The directory contains test folds for each WES-based CNV caller (e.g., ./xhmm_test_fold/). In addition, 3 scripts are provided, one per WES-based CNV caller (i.e., XHMM, CODEX2, CoNIFER), named DECoNT_test_polish_XXX_calls.py. For example, to polish calls made by XHMM, run the DECoNT_test_polish_XHMM_calls.py script with the polishing_model variable set to one of 'xhmm', 'conifer', or 'codex2'. The polishing_model selection specifies which DECoNT weights to use when polishing the XHMM calls. In general, to polish calls made by caller XXX with the DECoNT weights of model YYY, run the script DECoNT_test_polish_XXX_calls.py with polishing_model = 'YYY'.
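The caller/weight pairings can be enumerated as below. This is only a sketch: how polishing_model is set (editing the script versus passing an argument) depends on the script itself, so the echo line merely shows the intended pairing rather than running anything:

```shell
# Enumerate the caller / polishing-weight pairings used in Fig3.
# NOTE: how polishing_model is actually set depends on the script;
# the echo below only shows which script goes with which weights.
for CALLER in XHMM CODEX2 CoNIFER; do
  for MODEL in xhmm codex2 conifer; do
    echo "DECoNT_test_polish_${CALLER}_calls.py with polishing_model = '${MODEL}'"
  done
done
```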


## Table1
To reproduce the results presented in Table 1, cd into the Table1 directory and run the DECoNT_test_chaisson_XXX.py scripts given there. These 3 scripts directly produce the results given in Table 1.
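Assuming the three scripts follow a lowercase-caller naming pattern (an assumption; check the actual filenames in the Table1 directory), they can be run in one loop:

```shell
cd Table1
# Assumed names: DECoNT_test_chaisson_xhmm.py, ..._codex2.py, ..._conifer.py
for CALLER in xhmm codex2 conifer; do
  python "DECoNT_test_chaisson_${CALLER}.py"
done
```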

## Table2
The results presented in Table 2 belong to different sequencing technologies. Thus, the Table2 directory has 4 subdirectories (e.g., NovaSeq 6000, HiSeq 4000, etc.). Each of these subdirectories contains 3 directories named XHMM, CoNIFER, and CODEX2. For example, to reproduce the results for the NovaSeq 6000 sequencing technology with XHMM as the WES-based CNV caller, just cd into ./Table2/NovaSeq\ 6000/XHMM and run the script DECoNT_test_novaseq_na12878_xhmm.py. Every subdirectory inside the Table2 directory has the same structure for reproducing the results.
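The NovaSeq 6000 / XHMM example above, as a shell session (the sequencing-technology directory names contain spaces, so the path must be quoted or escaped):

```shell
TECH="NovaSeq 6000"   # or "HiSeq 4000", etc.
CALLER=XHMM           # or CoNIFER, CODEX2
# Quote the path: the sequencing-technology directory names contain a space
cd "Table2/${TECH}/${CALLER}"
python DECoNT_test_novaseq_na12878_xhmm.py
```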

## Table3
The Table3 directory has exactly the same structure as the Table2 directory; however, since this is the exact copy number prediction case, as presented in the paper, the WES-based CNV caller here is Control-FREEC. To reproduce the results given in Table 3, just cd into one of the sequencing technology subdirectories (e.g., cd into MGISEQ2000) and run the script inside, test_freec_mgiseq_na12878.py. The script will reproduce the results.


------------------------------------------------------------------------

## Training the model from scratch:

DECoNT comes with 4 different model weight files, one for each WES-based CNV caller (i.e., XHMM, CoNIFER, CODEX2, Control-FREEC). All of the tests above use the respective model weights. If you want to reproduce the model weights themselves, please cd into the Training directory and follow the steps given in the README file provided there.


## Running times of the test scripts for each result, on a typical computer:

We measured the running times of the scripts on a 2015 MacBook Pro with 8 GB 1867 MHz DDR3 RAM and a 2.7 GHz dual-core Intel Core i5 processor. However, since computer specifications vary, each script also displays an estimated running time on the console (computed by taking the machine's specifications into account).

The measured times for each result are given below:


1) Fig2a, ~3 hours 20 mins
2) Fig2b, ~1 hour 35 mins
3) Fig2c, ~1 hour 10 mins
4) Fig3, ~15 hours
5) Table1, ~2 hours 23 mins
6) Table2, ~10 hours 34 mins
7) Table3, ~15 mins



