NeighbourNet regression setting and computational cost
| Data | Cells (N) | Responses (P) | Predictors (Q) | Computational time (s) | Memory usage (GB) |
|---|---|---|---|---|---|
| PBMC3k data; Case study 1 | N = 2638 | P = 288; CollecTRI TFs | Q = 4116; CollecTRI targets | LVC: 34.56; Regression: 784.52 | 25.60 |
| Perturb-seq data 1; Case study 1 (Papalexi et al. 2021) | N = 7411 | P = 5; CollecTRI TFs | Q = 5151; CollecTRI targets | LVC: 155.39; Regression: 18.6 | 9.35 |
| Perturb-seq data 2 (d7); Supplemental Results (Dixit et al. 2016) | N = 16,506 | P = 10; perturbed TFs | Q = 3227; CollecTRI targets | LVC: 872.87; Regression: 120.48 | 5.62 |
| Perturb-seq data 2 (d13); Supplemental Results (Dixit et al. 2016) | N = 9633 | P = 10; perturbed TFs | Q = 3227; CollecTRI targets | LVC: 305.20; Regression: 63.18 | 3.00 |
| Lin− early hematopoietic cell atlas; Case study 2 (Pellin et al. 2019) | N = 1078 (Subsampled) | P = 805; PKN TFs | Q = 4600; PKN targets | LVC: 33.68; Regression: 1045.09; Meta-network: 482.11 | 34.50 |
| Small cell lung cancer atlas; Case study 3 (Chan et al. 2021) | N = 2909 (Subsampled) | P = 28; DEGs | Q = 900; PKN TFs | LVC: 87.03; Regression: 66.61; Meta-network: 39.11 | 1.23 |
We summarize the number of cells (N), response genes (P), and predictor genes (Q) involved in the NNet analysis for each case study. Each NNet analysis thus generates an ensemble of N coexpression networks, each of dimension P × Q. In the early hematopoiesis (Case study 2) and lung cancer (Case study 3) analyses, networks were built on a subset of cells (indicated by "Subsampled") chosen to represent the full data set, which further reduces the computational burden. The Computational time column records the runtime of NNet in seconds, broken down into local gene variance calculation (LVC), regression, and meta-network construction. LVC is an initial part of the coexpression measurement (Methods) and needs to be performed only once before regression; changing the response genes for subsequent regression runs does not require recalculating local variance. The Memory usage column shows the change in memory usage (in gigabytes) from before to after the LVC and regression steps. All analyses were run on an RStudio server allocated four cores of an Intel Xeon Gold 6254 CPU @ 3.10 GHz. We did not use parallel computing. Other abbreviations: DEGs, differentially expressed genes; PKN, prior knowledge network.
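
To give a rough sense of the scale implied by N, P, and Q, the sketch below tallies the number of network entries and the total runtime for each data set, using the values from the table. It is only an illustration, not part of NNet: the `Run` class, its field names, and the helper functions are hypothetical, and the entry count assumes one value per (cell, response, predictor) triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One row of the table (hypothetical container, not an NNet object)."""
    name: str
    n_cells: int        # N
    n_responses: int    # P
    n_predictors: int   # Q
    stage_seconds: dict  # runtime per stage, in seconds


# Values copied from the table above.
RUNS = [
    Run("PBMC3k", 2638, 288, 4116,
        {"LVC": 34.56, "Regression": 784.52}),
    Run("Perturb-seq data 1", 7411, 5, 5151,
        {"LVC": 155.39, "Regression": 18.6}),
    Run("Perturb-seq data 2 (d7)", 16506, 10, 3227,
        {"LVC": 872.87, "Regression": 120.48}),
    Run("Perturb-seq data 2 (d13)", 9633, 10, 3227,
        {"LVC": 305.20, "Regression": 63.18}),
    Run("Early hematopoiesis (subsampled)", 1078, 805, 4600,
        {"LVC": 33.68, "Regression": 1045.09, "Meta-network": 482.11}),
    Run("Small cell lung cancer (subsampled)", 2909, 28, 900,
        {"LVC": 87.03, "Regression": 66.61, "Meta-network": 39.11}),
]


def n_entries(run: Run) -> int:
    """Entries across the ensemble, assuming one value per
    (cell, response, predictor) triple: N * P * Q."""
    return run.n_cells * run.n_responses * run.n_predictors


def total_seconds(run: Run) -> float:
    """Sum of the per-stage runtimes reported for this data set."""
    return sum(run.stage_seconds.values())


for run in RUNS:
    print(f"{run.name}: {n_entries(run):,} entries, "
          f"{total_seconds(run):.2f} s total")
```

Because LVC is computed once and reused, only the regression term would need to be counted again when the response genes change.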











