unCTC uses pathway-based unsupervised clustering of single-cell RNA-Seq data to stratify circulating tumour cells (CTCs) and white blood cells (WBCs). It takes a list of single-cell RNA-Seq expression matrices (raw counts or TPM) as input; in each matrix, genes must be in rows and cells in columns. unCTC combines the matrices on their shared genes, removes low-expression genes and cells, eliminates batch effects, and normalises the integrated matrix. The normalised matrix is then transformed into pathway space, where deep dictionary learning with k-means clustering is used for unsupervised clustering of CTCs and WBCs. unCTC also computes copy number variations (CNVs), indicating the frequency of CNVs as well as the chromosome arm (p or q) on which each variation occurs. Using Stouffer's Z-score (Stouffer et al., 1949), unCTC detects multiple canonical markers suggesting malignant, epithelial, or immune origins. The expression of other canonical markers validates the lineage of the CTCs.
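
As a rough illustration of two of the steps described above (combining matrices on their shared genes, and scoring a marker set with Stouffer's Z-score), here is a minimal Python sketch. It is not unCTC's own code: the function names combine_on_shared_genes and stouffer_scores, the dict-based matrix layout, and the toy expression values are all assumptions made for this example, and unCTC may standardise or weight markers differently. Stouffer's method itself is standard: k z-scores are combined as Z = (z_1 + ... + z_k) / sqrt(k).

```python
import math
from statistics import mean, pstdev


def combine_on_shared_genes(matrices):
    """Merge expression matrices, keeping only genes present in every matrix.

    Each matrix is a dict mapping gene name -> list of per-cell values
    (genes as rows, cells as columns). Cells are concatenated in input order.
    """
    shared = set(matrices[0])
    for matrix in matrices[1:]:
        shared &= set(matrix)
    return {
        gene: [value for matrix in matrices for value in matrix[gene]]
        for gene in sorted(shared)
    }


def stouffer_scores(expr, markers):
    """Per-cell Stouffer Z-score over a marker gene set.

    Each marker is z-scored across cells, then the z-scores of the k markers
    found in the matrix are summed per cell and divided by sqrt(k).
    """
    used = [gene for gene in markers if gene in expr]
    if not used:
        raise ValueError("none of the markers are present in the matrix")
    n_cells = len(expr[used[0]])
    totals = [0.0] * n_cells
    for gene in used:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        for i, value in enumerate(values):
            # A gene with zero variance carries no signal; treat its z as 0.
            totals[i] += (value - mu) / sd if sd > 0 else 0.0
    return [total / math.sqrt(len(used)) for total in totals]


# Toy data (hypothetical values): two studies with two cells each.
study_a = {"EPCAM": [5, 6], "PTPRC": [0, 1], "GENE_X": [1, 1]}
study_b = {"EPCAM": [0, 1], "PTPRC": [7, 8], "KRT8": [2, 2]}

combined = combine_on_shared_genes([study_a, study_b])
epithelial_score = stouffer_scores(combined, ["EPCAM"])
```

In this toy example only EPCAM and PTPRC survive the merge, and the first two cells receive positive epithelial scores while the last two receive negative ones.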

unCTC_Study1_Count: This script analyses the Study 1 count data. It sources all the analysis methods from the Supplemental_Code/unCTC folder.
unCTC_Study2_TPM: This script analyses the Study 2 TPM data. It sources all the analysis methods from the Supplemental_Code/unCTC folder.
unCTC_Study2_Count: This script analyses the Study 2 count data. It sources all the analysis methods from the Supplemental_Code/unCTC folder.