A deconvolution framework that uses single-cell sequencing plus a small benchmark data set for accurate analysis of cell type ratios in complex tissue samples
- Shuai Guo1,13,
- Xiaoqian Liu1,13,15,
- Xuesen Cheng2,13,
- Yujie Jiang1,3,
- Shuangxi Ji1,
- Qingnan Liang2,
- Andrew Koval1,3,
- Yumei Li2,
- Leah A. Owen4,5,6,16,
- Ivana K. Kim7,
- Ana Aparicio8,
- Sanghoon Lee9,
- Anil K. Sood9,
- Scott Kopetz10,
- John Paul Shen10,
- John N. Weinstein1,11,
- Margaret M. DeAngelis4,5,6,12,
- Rui Chen2,14 and
- Wenyi Wang1,14
- 1Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA;
- 2Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
- 3Department of Statistics, Rice University, Houston, Texas 77005, USA;
- 4Department of Ophthalmology, Jacobs School of Medicine and Biomedical Engineering, SUNY University at Buffalo, Buffalo, New York 14209, USA;
- 5Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, Utah 84108, USA;
- 6Department of Ophthalmology and Visual Sciences, University of Utah School of Medicine, Salt Lake City, Utah 84132, USA;
- 7USA Retina Service, Harvard Medical School, Massachusetts Eye and Ear, Boston, Massachusetts 02114, USA;
- 8Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77230, USA;
- 9Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas 77230, USA;
- 10Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA;
- 11Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA;
- 12VA Western New York Healthcare System, Buffalo, New York 14215, USA
Abstract
Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we use an experimental design that matches biological signals across platforms, thereby revealing the technological discrepancy, and then develop a deconvolution framework, DeMixSC, that uses these well-matched (benchmark) data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of the matched tissue type for large-scale deconvolution. Our results using two benchmark data sets, one of healthy retinas and one of ovarian cancer tissues, suggest substantially improved deconvolution accuracy. Leveraging tissue-specific benchmark data sets, we applied DeMixSC to a large cohort of 453 age-related macular degeneration patients and to a cohort of 30 ovarian cancer patients with various responses to neoadjuvant chemotherapy. Only DeMixSC successfully unveiled biologically meaningful differences across patient groups, demonstrating its broad applicability in diverse real-world clinical scenarios. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched data set to resolve this challenge. The DeMixSC framework is generally applicable for accurately deconvolving large cohorts of disease tissues, including cancers, when a well-matched benchmark data set is available.
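To make the idea of weighted nonnegative least squares concrete, the following is a minimal sketch on simulated data, not the DeMixSC implementation. The function name `weighted_nnls_deconvolve`, the toy reference matrix, the choice of which genes are "discrepant", and the rule of setting their weights to zero are all assumptions made for illustration; the abstract does not specify how DeMixSC derives or applies its weights.

```python
import numpy as np
from scipy.optimize import nnls


def weighted_nnls_deconvolve(bulk, reference, weights):
    """Estimate cell-type proportions for one bulk sample (illustrative only).

    bulk:      (n_genes,) bulk expression vector
    reference: (n_genes, n_cell_types) cell-type reference profiles
    weights:   (n_genes,) nonnegative per-gene weights (hypothetical scheme)
    """
    # Minimizing sum_g w_g * (bulk_g - reference_g . p)^2 subject to p >= 0
    # is ordinary NNLS after scaling each gene's row by sqrt(w_g).
    sqrt_w = np.sqrt(weights)
    coef, _ = nnls(reference * sqrt_w[:, None], bulk * sqrt_w)
    total = coef.sum()
    # Normalize the nonnegative coefficients so they sum to one.
    return coef / total if total > 0 else coef


# Simulated data: 200 genes, 3 cell types (all values are made up).
rng = np.random.default_rng(0)
reference = rng.gamma(2.0, 1.0, size=(200, 3))
true_props = np.array([0.6, 0.3, 0.1])
bulk = reference @ true_props

# Assumption: the first 20 genes carry a platform-specific distortion.
bulk_shifted = bulk.copy()
bulk_shifted[:20] *= 5.0

# Hypothetical weighting: discrepant genes get weight 0, all others weight 1.
weights = np.ones(200)
weights[:20] = 0.0

est_weighted = weighted_nnls_deconvolve(bulk_shifted, reference, weights)
est_unweighted = weighted_nnls_deconvolve(bulk_shifted, reference, np.ones(200))

err_weighted = np.abs(est_weighted - true_props).max()
err_unweighted = np.abs(est_unweighted - true_props).max()
print("weighted estimate:  ", np.round(est_weighted, 3))
print("unweighted estimate:", np.round(est_unweighted, 3))
```

In this toy setting the down-weighted fit ignores the distorted genes, so its error is smaller than that of the unweighted fit. Real data would require estimating which genes are discrepant, which is the role the abstract assigns to the benchmark data.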
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278822.123.
- Freely available online through the Genome Research Open Access option.
- Received December 5, 2023.
- Accepted November 19, 2024.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.