## Cliffy Experiments

This repository contains the code for reproducing the experiments in the Cliffy paper (**Robust 16S rRNA classification based on a compressed LCA index**). The code uses `Snakemake`
to run each experiment. Below is a brief description of each experiment and how to run it.

---

### Experiment 1: Building Cliffy indexes for 16S rRNA classification

This experiment builds several Cliffy indexes that differ in two ways: how the input FASTA is digested (for example, with traditional minimizers or with a promoted minimizer alphabet) and how the document array profiles are compressed (for example, with cliff-compression).

```sh
# Use this command to run experiment 1 (replace -c1 with -n to do a dry run first)
snakemake -c1 run_exp1
```

### Experiment 2: Build realistic, simulated 16S rRNA readsets

This experiment generates simulated readsets with ground-truth labels, which are later classified with Cliffy and other approaches. The code uses a tool we developed called [MicrobeMixer](https://github.com/oma219/MicrobeMixer), which queries the EBI API to simulate readsets that mimic real public data.

```sh
# Use this command to run experiment 2 (replace -c1 with -n to do a dry run first)
snakemake -c1 run_exp2
```

### Experiment 3: Run classifications of 16S rRNA readsets using Cliffy and Kraken2

This experiment benchmarks Cliffy against Kraken2 on the readsets generated in Experiment 2.

```sh
# Use this command to run experiment 3 (replace -c1 with -n to do a dry run first)
snakemake -c1 run_exp3
```
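
Because Experiment 3 consumes the readsets from Experiment 2, it can be convenient to run the targets in order, doing a dry run of each one first. The following is only a sketch of one way to do that: `run_experiments`, `SNAKEMAKE`, and `CORES` are illustrative names that are not part of this repository, and the sketch assumes the `run_exp1`, `run_exp2`, and `run_exp3` targets are meant to be run in that order.

```sh
# Hypothetical helper (not part of this repository): for each target,
# do a dry run first and only then execute it.
# SNAKEMAKE and CORES are illustrative override variables.
run_experiments() {
    snakemake_bin="${SNAKEMAKE:-snakemake}"
    cores="${CORES:-1}"
    for target in "$@"; do
        # -n (--dry-run) lists the planned jobs without executing them
        "$snakemake_bin" -n "$target" || return 1
        # -c (--cores) sets how many cores Snakemake may use
        "$snakemake_bin" "-c$cores" "$target" || return 1
    done
}
```

For example, `run_experiments run_exp1 run_exp2 run_exp3` would step through all three experiments with one core and stop at the first target whose dry run or real run fails; `CORES=4 run_experiments run_exp3` would rerun only the benchmark with four cores.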
