# The Great Cattle Antibody Repertoire (Supplemental Code)
Longitudinal study of the vaccination of 204 calves against bovine respiratory disease.

## Scripts
- **igqtl.py**: takes the **vgene_genotyping** directory (see Supplemental Materials) as input, computes germline and somatic variations (GSVs), and reports their R-ratios for all subjects:
```
python igqtl.py vgene_genotyping_dir output_dir
```

The output directory contains the R-ratios of GSVs for each V gene and for all GSVs combined. Each identified GSV is written as a column whose name has the following format:
```
VGENE:POSITION_N1N2 
```
where POSITION is 0-based, and N1 and N2 are the most abundant and second most abundant nucleotides at this position, respectively.
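The column-name convention above can be illustrated with a short Python sketch. It is not taken from igqtl.py: `parse_gsv_column` and `gsv_column_name` are hypothetical helpers, the gene name `IGHV1-7` is made up, and the `bases` argument (a string of nucleotides observed at one position) is an assumed input used only to show how N1 and N2 are ranked.

```python
from collections import Counter
from typing import NamedTuple


class GSV(NamedTuple):
    """One GSV column name split into its documented parts (hypothetical helper type)."""
    vgene: str
    position: int  # 0-based position, as stated in the README
    major: str     # N1: most abundant nucleotide
    minor: str     # N2: second most abundant nucleotide


def parse_gsv_column(name: str) -> GSV:
    """Split a column name of the form VGENE:POSITION_N1N2."""
    # Split on the last ':' so that a colon inside a gene name would not break parsing.
    vgene, rest = name.strip().rsplit(":", 1)
    position, nucleotides = rest.split("_", 1)
    if len(nucleotides) != 2:
        raise ValueError(f"expected two nucleotides after '_', got {nucleotides!r}")
    return GSV(vgene, int(position), nucleotides[0], nucleotides[1])


def gsv_column_name(vgene: str, position: int, bases: str) -> str:
    """Build a column name from the nucleotides observed at one position.

    `bases` is an assumed input: the nucleotides seen at `position`, one character each.
    """
    ranked = Counter(bases).most_common(2)  # [(N1, count), (N2, count)]
    if len(ranked) < 2:
        raise ValueError("position is not variable: fewer than two distinct nucleotides")
    (n1, _), (n2, _) = ranked
    return f"{vgene}:{position}_{n1}{n2}"


if __name__ == "__main__":
    name = gsv_column_name("IGHV1-7", 152, "AAAAGGC")
    print(name, parse_gsv_column(name))
```

Here `Counter.most_common` breaks ties between equally abundant nucleotides by order of first occurrence; how igqtl.py handles ties is not specified.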

- **compute_pca_clusters.py**: takes the file with R-ratios of GSVs and computes clusters of subjects using PCA:
```
python compute_pca_clusters.py gsv_r_ratios.txt output_dir
```
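A minimal sketch of PCA-based clustering of subjects, assuming the R-ratio file is tab-separated with a header row of GSV names and one row per subject whose first field is the subject ID (the README does not specify this layout). PCA is computed with NumPy's SVD, and the clustering step is plain k-means on the top components. The function names, the toy values, and the choice of k-means are assumptions for illustration, not necessarily what compute_pca_clusters.py does; the script itself may rely on scikit-learn and kneed instead.

```python
import csv
import io

import numpy as np


def load_r_ratios(handle):
    """Read an assumed TSV layout: header of GSV names, one row per subject (ID first)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    subjects, rows = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        subjects.append(row[0])
        rows.append([float(x) for x in row[1:]])
    return subjects, header[1:], np.array(rows)


def pca(matrix, n_components=2):
    """Project subjects onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)  # PCA needs mean-centred columns
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # subject coordinates on the PCs
    explained = s**2 / np.sum(s**2)  # fraction of variance captured by each PC
    return scores, explained[:n_components]


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialisation (swapped-in choice)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centres chosen so far.
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each subject to its nearest centre
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    # Toy input in the assumed layout; values are made up.
    example = (
        "subject\tV1:10_AG\tV1:25_CT\tV2:3_GA\n"
        "s1\t0.90\t0.10\t0.80\n"
        "s2\t0.85\t0.15\t0.90\n"
        "s3\t0.10\t0.90\t0.20\n"
        "s4\t0.05\t0.95\t0.10\n"
    )
    subjects, gsvs, matrix = load_r_ratios(io.StringIO(example))
    scores, explained = pca(matrix)
    labels = kmeans(scores, k=2)
    print(dict(zip(subjects, labels.tolist())), explained)
```

Empty fields are not handled: missing R-ratios would have to be filtered or imputed before the SVD.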

### Python dependencies:
- scipy, matplotlib, numpy (usually included in a standard Anaconda installation)
- kneed (conda install -c conda-forge kneed)
- pandas (conda install -c anaconda pandas)
- seaborn (conda install -c anaconda seaborn)
- scikit-learn, imported as sklearn (conda install -c anaconda scikit-learn)
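A small check, using only the standard library, that the packages above can be found by the active Python interpreter. The list holds import names, which differ from the conda package name for scikit-learn (`sklearn`); `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Import names of the dependencies listed above (scikit-learn is imported as sklearn).
REQUIRED = ["scipy", "matplotlib", "numpy", "kneed", "pandas", "seaborn", "sklearn"]


def missing_modules(names=REQUIRED):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```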
