Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments

  1. Mihaela Pertea (1,2,3)
  1. Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21211, USA;
  2. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  3. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  4. Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205, USA
  • Corresponding authors: ales.varabyou{at}jhu.edu, mpertea{at}jhu.edu
  • Abstract

    RNA sequencing is widely used to measure gene expression across a vast range of animal and plant tissues and conditions. Most studies of computational methods for gene expression analysis use simulated data to evaluate the accuracy of these methods. These simulations typically include reads generated from known genes at varying levels of expression. Until now, simulations did not include reads from noisy transcripts, which might include erroneous transcription, erroneous splicing, and other processes that affect transcription in living cells. Here we examine the effects of realistic amounts of transcriptional noise on the ability of leading computational methods to assemble and quantify the genes and transcripts in an RNA sequencing experiment. We show that the inclusion of noise leads to systematic errors in the ability of these programs to measure expression, including systematic underestimates of transcript abundance levels and large increases in the number of false-positive genes and transcripts. Our results also suggest that alignment-free computational methods sometimes fail to detect transcripts expressed at relatively low levels.

    Footnotes

    • Received May 21, 2020.
    • Accepted December 18, 2020.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
