Scalable cell-specific coexpression networks for granular regulatory pattern discovery with NeighbourNet

Table 1.

NeighbourNet regression setting and computational cost

| Data | Cells (N) | Responses (P) | Predictors (Q) | Computational time (s) | Memory usage (GB) |
|---|---|---|---|---|---|
| PBMC3k data; Case study 1 | N = 2638 | P = 288 CollecTRI TFs | Q = 4116 CollecTRI targets | LVC: 34.56; Regression: 784.52 | 25.60 |
| Perturb-seq data 1; Case study 1 (Papalexi et al. 2021) | N = 7411 | P = 5 CollecTRI TFs | Q = 5151 CollecTRI targets | LVC: 155.39; Regression: 18.6 | 9.35 |
| Perturb-seq data 2 (d7); Supplemental Results (Dixit et al. 2016) | N = 16,506 | P = 10 perturbed TFs | Q = 3227 CollecTRI targets | LVC: 872.87; Regression: 120.48 | 5.62 |
| Perturb-seq data 2 (d13); Supplemental Results (Dixit et al. 2016) | N = 9633 | P = 10 perturbed TFs | Q = 3227 CollecTRI targets | LVC: 305.20; Regression: 63.18 | 3.00 |
| Lin early hematopoietic cell atlas; Case study 2 (Pellin et al. 2019) | N = 1078 (Subsampled) | P = 805 PKN TFs | Q = 4600 PKN targets | LVC: 33.68; Regression: 1045.09; Meta-network: 482.11 | 34.50 |
| Small cell lung cancer atlas; Case study 3 (Chan et al. 2021) | N = 2909 (Subsampled) | P = 28 DEGs | Q = 900 PKN TFs | LVC: 87.03; Regression: 66.61; Meta-network: 39.11 | 1.23 |
  • We summarize the number of cells (N), the number of response genes (P), and the number of predictor genes (Q) involved in the NNet analysis for each case study. Each NNet analysis thus generates an ensemble of N × P × Q coexpression networks. For the early hematopoiesis (Case study 2) and lung cancer (Case study 3) analyses, networks were built on a subset of cells (indicated by “Subsampled”) chosen to represent the full data set, further reducing the computational burden. The Computational time column records the runtime of NNet in seconds, broken down into the stages of local gene variance calculation (LVC), regression, and meta-network construction. LVC is an initial part of the coexpression measurement (Methods) and needs to be performed only once before regression; changing the response genes for subsequent regression steps does not require recalculating the local variance. The Memory usage column shows the change in memory (in gigabytes) from before to after the LVC and regression steps. All analyses were run on an RStudio server allocated four cores of an Intel Xeon Gold 6254 CPU @ 3.10 GHz. We did not use parallel computing. Other abbreviations: DEGs, differentially expressed genes; PKN, prior knowledge network.
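To make the scale in Table 1 concrete, the following minimal Python sketch multiplies out N × P × Q and sums the reported stage runtimes for four of the table's rows. The dictionary layout and the helper names `ensemble_entries` and `total_runtime_s` are illustrative assumptions and are not part of NNet. The sketch also assumes that N × P × Q counts one coexpression value per cell, response gene, and predictor gene.

```python
# Values copied from Table 1. The data layout and function names are
# hypothetical helpers for illustration only; they are not NNet's API.
TABLE_1 = {
    "PBMC3k": {
        "N": 2638, "P": 288, "Q": 4116,
        "time_s": {"LVC": 34.56, "Regression": 784.52},
    },
    "Perturb-seq data 1": {
        "N": 7411, "P": 5, "Q": 5151,
        "time_s": {"LVC": 155.39, "Regression": 18.6},
    },
    "Early hematopoiesis (subsampled)": {
        "N": 1078, "P": 805, "Q": 4600,
        "time_s": {"LVC": 33.68, "Regression": 1045.09, "Meta-network": 482.11},
    },
    "Small cell lung cancer (subsampled)": {
        "N": 2909, "P": 28, "Q": 900,
        "time_s": {"LVC": 87.03, "Regression": 66.61, "Meta-network": 39.11},
    },
}


def ensemble_entries(row):
    """Return N * P * Q, assumed here to be one value per (cell, response, predictor)."""
    return row["N"] * row["P"] * row["Q"]


def total_runtime_s(row):
    """Sum the per-stage runtimes (seconds) reported for one data set."""
    return sum(row["time_s"].values())


if __name__ == "__main__":
    for name, row in TABLE_1.items():
        entries = ensemble_entries(row)
        runtime = total_runtime_s(row)
        print(f"{name}: {entries:,} entries, {runtime:.2f} s total")
```

Under these assumptions, the subsampled hematopoiesis analysis involves roughly four billion values even though it uses the fewest cells, which is consistent with its regression stage taking the longest in the table.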

This Article

  1. Genome Res. 36: 785-801
