Shuai Guo; Xiaoqian Liu; Xuesen Cheng; Yujie Jiang; Shuangxi Ji; Qingnan Liang; Andrew Koval; Yumei Li; Leah A. Owen; Ivana K. Kim; Ana Aparicio; Sanghoon Lee; Anil K. Sood; Scott Kopetz; John Paul Shen; John N. Weinstein; Margaret M. DeAngelis; Rui Chen; Wenyi Wang

Figure 2.

Overview of DeMixSC, a framework for deconvolving bulk RNA-seq data using sc/snRNA-seq data as a reference. (A) The framework starts with a benchmark data set of matched bulk and sc/snRNA-seq data with the same cell type proportions. Pseudobulk mixtures are generated from the sc/sn data. DeMixSC then compares the matched real-bulk and pseudobulk data to assign genes to G₁ or G₂. Genes that are not differentially expressed (non-DE) between the two are considered stably captured by both sequencing platforms (blue), whereas differentially expressed (DE) genes are more affected by the technological discrepancy (orange). (B) DeMixSC then applies a normalization procedure (e.g., ComBat) to align the two bulk RNA-seq data sets. (C) DeMixSC estimates cell type proportions under a weighted nonnegative least squares (wNNLS) framework with two improvements: (1) partitioning and adjusting genes with high technological discrepancy and (2) a new weight function. The final estimates are obtained when the algorithm either converges or reaches the prespecified maximum number of iterations. Here, G₁ denotes the genes with low technological discrepancy, G₂ denotes the genes with high technological discrepancy, a is a user-defined positive constant that serves as an adjustment factor, $\text{[math]}$ is the reference matrix derived from the sc/snRNA-seq data, y is the observed expression in the bulk RNA-seq data, $\text{[math]}$ is the vector of estimated cell type proportions, and $\text{[math]}$ is the vector of estimated gene weights.
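The iterative estimation in panel (C) can be illustrated with a minimal Python sketch. It is not the DeMixSC implementation: the caption does not specify the weight function or how the adjustment factor a enters, so the sketch assumes an inverse squared fitted-value weight and assumes that weights of G₂ genes are divided by a. The function name `weighted_nnls_deconvolve` and the parameters `high_discrepancy`, `eps`, `max_iter`, and `tol` are hypothetical; only `scipy.optimize.nnls` is a real library call.

```python
import numpy as np
from scipy.optimize import nnls


def weighted_nnls_deconvolve(R, y, high_discrepancy, a=10.0,
                             eps=1e-6, max_iter=50, tol=1e-8):
    """Iteratively reweighted NNLS sketch (assumed scheme, not DeMixSC itself).

    R: (genes x cell types) reference matrix from sc/snRNA-seq data.
    y: (genes,) observed bulk expression.
    high_discrepancy: boolean mask marking G2 genes.
    a: positive adjustment factor used to down-weight G2 genes (assumption).
    """
    R = np.asarray(R, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.asarray(high_discrepancy, dtype=bool)
    n_types = R.shape[1]

    # Start from uniform proportions.
    p = np.full(n_types, 1.0 / n_types)

    for _ in range(max_iter):
        fitted = R @ p
        # Assumed weight function: inverse squared fitted expression.
        w = 1.0 / (fitted + eps) ** 2
        # Assumed adjustment: shrink the weights of G2 genes by the factor a.
        w[mask] /= a
        # Weighted least squares equals ordinary NNLS on sqrt-weight-scaled rows.
        sw = np.sqrt(w / w.max())
        coef, _ = nnls(R * sw[:, None], y * sw)

        total = coef.sum()
        if total <= 0:
            break  # degenerate fit; keep the previous estimate
        p_new = coef / total  # rescale so the proportions sum to one

        converged = np.abs(p_new - p).sum() < tol
        p = p_new
        if converged:
            break

    return p
```

Calling `weighted_nnls_deconvolve(R, y, mask)` with a genes-by-cell-types reference `R`, a bulk expression vector `y`, and a boolean G₂ mask returns proportions that sum to one. The loop stops either when successive estimates differ by less than `tol` or after `max_iter` iterations, mirroring the stopping rule described in the caption.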

A deconvolution framework that uses single-cell sequencing plus a small benchmark data set for accurate analysis of cell type ratios in complex tissue samples
