John T. Chamberlin; Younghee Lee; Gabor T. Marth; Aaron R. Quinlan

Figure 2.

RNA sampling properties explain the effect of bioinformatic processing on assay similarity. (A) Gene length bias is evident in averaged gene abundances in both L5 IT cells and nuclei under the Intron&Exon quantification, but it is stronger in nuclei. Pearson's correlations between log₁₀(mean CPM) and gene length are R = 0.51 in nuclei and R = 0.29 in cells. (B) The length distribution of genes that are significantly more abundant (fold change > 1.5, adjusted P-value < 0.05) either in L5 IT cells (gray) or in nuclei (orange). Mean log₁₀ gene lengths are 5.0 versus 4.2, P < 2.2 × 10⁻¹⁶ (t-test on log₁₀ lengths). (C) Hexbin plot showing the correlation of Exon abundances between L5 IT cells and nuclei. Pearson's correlations are computed on log₁₀(mean CPM across all cells or nuclei) for genes above 1 or 10 mean CPM in both assays. (D) Intron&Exon abundances are more strongly correlated and show fewer total differences. (E) The correlation of Intron abundances is very high, consistent with pre-mRNA being localized to the nucleus, which is itself contained in the whole cell. (F) Length-corrected abundances are no better correlated than the baseline result. Total differences increase, consistent with the worsened correlation among more highly expressed genes. The length-correction method depresses Intron counts, which indirectly amplifies the prominence of Exon counts.
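The between-assay comparison described for panels C to E can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy count matrices, and the choice to compute CPM per cell before averaging are all hypothetical, and the threshold is applied as a strict "greater than" on mean CPM in both assays.

```python
import math

def cpm(counts):
    """Scale one cell's (or nucleus's) raw counts to counts per million."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def mean_cpm(count_matrix):
    """Mean CPM per gene; rows are cells or nuclei, columns are genes.

    Assumption: CPM is computed per cell and then averaged.
    """
    per_cell = [cpm(row) for row in count_matrix]
    n = len(per_cell)
    return [sum(col) / n for col in zip(*per_cell)]

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log_abundance_correlation(cells, nuclei, min_cpm=1.0):
    """Pearson's R on log10(mean CPM) for genes above min_cpm in both assays.

    Returns (R, number of genes retained).
    """
    cell_means = mean_cpm(cells)
    nuc_means = mean_cpm(nuclei)
    kept = [
        (math.log10(c), math.log10(n))
        for c, n in zip(cell_means, nuc_means)
        if c > min_cpm and n > min_cpm
    ]
    xs, ys = zip(*kept)
    return pearson(xs, ys), len(kept)

# Hypothetical toy data: 2 cells and 2 nuclei, 4 genes each.
toy_cells = [[10, 100, 1000, 0], [20, 80, 900, 0]]
toy_nuclei = [[30, 90, 800, 5], [40, 110, 850, 0]]

r, n_genes = log_abundance_correlation(toy_cells, toy_nuclei, min_cpm=1.0)
print(f"R = {r:.3f} over {n_genes} genes")
```

Raising `min_cpm` to 10 gives the stricter of the two thresholds mentioned in the legend. The same `pearson` helper could be applied to log₁₀(mean CPM) against gene length for the panel A comparison, given a vector of gene lengths.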

Differences in molecular sampling and data processing explain variation among single-cell and single-nucleus RNA-seq experiments
