Alignment of single-cell RNA-seq samples without overcorrection using kernel density matching

  1. Yang I. Li1,2
  1. Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA;
  2. Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA;
  3. Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, Illinois 60637, USA;
  4. Department of Clinical Immunology, Xijing Hospital, Xi'an 710032, China;
  5. National Translational Science Center for Molecular Medicine, Xi'an 710032, China
  • Corresponding authors: mengjiechen@uchicago.edu, zhuping@fmmu.edu.cn, yangili1@uchicago.edu
  • Abstract

    Single-cell RNA sequencing (scRNA-seq) technology is poised to replace bulk RNA sequencing for many biological and medical applications because it allows users to measure gene expression levels in a cell type–specific manner. However, data produced by scRNA-seq often exhibit batch effects that can be specific to a cell type, to a sample, or to an experiment, and these effects prevent integration or comparison across multiple experiments. Here, we present Dmatch, a method that leverages an external expression atlas of human primary cells and kernel density matching to align multiple scRNA-seq experiments for downstream biological analysis. Dmatch facilitates alignment of scRNA-seq data sets whose cell types may overlap only partially and thus allows integration of multiple distinct scRNA-seq experiments to extract biological insights. In simulation, Dmatch compares favorably to other alignment methods, both in reducing sample-specific clustering and in avoiding overcorrection. When applied to scRNA-seq data collected from clinical samples from a healthy individual and five autoimmune disease patients, Dmatch enabled cell type–specific differential gene expression comparisons across biopsy sites and disease conditions and uncovered a population of pro-inflammatory monocytes shared across biopsy sites in rheumatoid arthritis (RA) patients. We further show that Dmatch increases the number of expression quantitative trait loci (eQTLs) mapped from population scRNA-seq data. Dmatch is fast, scalable, and improves the utility of scRNA-seq for several important applications. Dmatch is freely available online.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.261115.120.

    • Freely available online through the Genome Research Open Access option.

    • Received January 13, 2020.
    • Accepted January 19, 2021.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
