Overview of the batch correction methods
| | BBKNN | ComBat | ComBat-seq | Harmony | LIGER | MNN | SCVI | Seurat |
|---|---|---|---|---|---|---|---|---|
| Input | k-NN graph | Normalized count matrix | Raw count matrix | Normalized count matrix | Normalized count matrix | Normalized count matrix | Raw count matrix | Normalized count matrix |
| Custom embedding | None | None | None | Corrected embedding | Metagene/factor loadings | None | Learned lower dimensional latent space | CCA |
| Correction object | k-NN graph | Count matrix | Count matrix | Embedding | Embedding | Count matrix | Embedding | Embedding |
| Correction method | UMAP on merged neighborhood graph | Empirical Bayes—linear correction method on the count values | Negative binomial regression model on each gene | Soft k-means—linear batch correction within small clusters in the embedded space | Quantile alignment of factor loadings | Mutual nearest neighbors—linear correction | Variational autoencoder—models the batch effect in a low dimensional space using a deep learning model; a new count matrix is imputed from the model | Aligning canonical basis vectors to correct the embedding |
| Returns | Corrected k-NN graph | Corrected count matrix | Corrected count matrix | Corrected embedding | Corrected embedding | Corrected count matrix | Corrected count matrix and corrected embedding | Corrected count matrix |
| Changes count matrix | No | Yes | Yes | No | No | Yes | Yes/Imputes new values | Yes |
[i] Row definitions. Input: the type of data the method takes as input (the method may perform additional preprocessing on this object before any calculations are performed). Custom embedding: the lower-dimensional embedding, if any, onto which the data are projected. Correction object: the data object on which the method actually performs the correction. Correction method: an informal description of the approach used for batch correction. Returns: the type of object the method returns. Changes count matrix: whether the method edits the count matrix or returns a new one to be used in place of the uncorrected count matrix in subsequent steps of the workflow.
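
When choosing a method for a given workflow, it can help to treat the table as a lookup. The following is a minimal sketch that transcribes three of the rows above (Input, Returns, Changes count matrix) into a Python dictionary and filters on them. The field names and the `methods_where` helper are illustrative assumptions, not part of any of the listed packages, and SCVI's "Yes/Imputes new values" entry is simplified to `True`.

```python
# Rows of the overview table transcribed as a lookup.
# Field names are hypothetical; values are copied from the table.
METHODS = {
    "BBKNN": {"input": "k-NN graph",
              "returns": "Corrected k-NN graph",
              "changes_counts": False},
    "ComBat": {"input": "Normalized count matrix",
               "returns": "Corrected count matrix",
               "changes_counts": True},
    "ComBat-seq": {"input": "Raw count matrix",
                   "returns": "Corrected count matrix",
                   "changes_counts": True},
    "Harmony": {"input": "Normalized count matrix",
                "returns": "Corrected embedding",
                "changes_counts": False},
    "LIGER": {"input": "Normalized count matrix",
              "returns": "Corrected embedding",
              "changes_counts": False},
    "MNN": {"input": "Normalized count matrix",
            "returns": "Corrected count matrix",
            "changes_counts": True},
    # The table lists "Yes/Imputes new values"; simplified to True here.
    "SCVI": {"input": "Raw count matrix",
             "returns": "Corrected count matrix and corrected embedding",
             "changes_counts": True},
    "Seurat": {"input": "Normalized count matrix",
               "returns": "Corrected count matrix",
               "changes_counts": True},
}


def methods_where(**criteria):
    """Return method names whose table entries match every given criterion."""
    return [
        name
        for name, props in METHODS.items()
        if all(props[key] == value for key, value in criteria.items())
    ]


# Methods that leave the count matrix untouched (graph or embedding only).
leaves_counts_untouched = methods_where(changes_counts=False)

# Methods that expect raw rather than normalized counts as input.
needs_raw_counts = methods_where(input="Raw count matrix")
```

A lookup like this makes one practical consequence of the table explicit: methods that do not change the count matrix can only inform downstream steps that operate on the graph or embedding, such as clustering or visualization.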