Analysis of coding gene expression from small RNA sequencing

  Antonio Marco¹
  ¹ School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
  ² These authors contributed equally to this work.

  • Corresponding authors: amarco{at}essex.ac.uk, gbrooke{at}essex.ac.uk
  • Abstract

    The popularity of microRNA expression analyses is reflected by the existence of thousands of sRNA-seq studies in which matched total RNA-seq data are often unavailable. The lack of paired sequencing experiments limits the analysis of microRNA–gene regulatory networks. Here, we explore whether protein-coding gene expression can be quantified directly from transcript fragments present in sRNA-seq experiments. We analyze studies containing matched total RNA and small RNA from four human tissues and recover transcript fragments from the sRNA-seq data sets. We find that the expression levels of protein-coding gene transcripts derived from sRNA-seq data sets are comparable to those from total RNA-seq experiments (R² ranging from 0.33 to 0.76). Analyses across multiple tissues and species show similar correlations, indicating that the approach is applicable across organisms. We confirm that transcript half-life and the expression of housekeeping or highly abundant genes do not bias the results. Analysis of both microRNAs and coding genes from the same sRNA-seq experiments demonstrates that, as expected, the expression profiles of microRNAs and their known targets are inversely correlated. For a dual mRNA/miRNA profile, we recommend sequencing the ≥25 nucleotide fraction at 5 million or more reads. To confirm the utility of this approach, we apply our method to breast cancer sRNA-seq data sets lacking total RNA-seq data and achieve 75% recall and 64% accuracy when comparing inferred coding gene expression with qPCR-validated targets. Our findings demonstrate that quantifying mRNA fragments from sRNA-seq experiments provides a reliable approach to investigate microRNA–mRNA interactions when total RNA-seq is unavailable.
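
    The two kinds of figures quoted in the abstract (an R² between sRNA-seq-derived and total RNA-seq expression, and recall/accuracy against validated targets) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the log2 transform with a pseudocount of 1, and the boolean encoding of predicted and validated calls are all assumptions made here for illustration, and the abstract does not state how the reported values were computed.

```python
import math


def log_expr(values, pseudocount=1.0):
    """Log2-transform expression values (pseudocount is an assumed choice)."""
    return [math.log2(v + pseudocount) for v in values]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def recall_and_accuracy(predicted, validated):
    """Recall and accuracy for per-gene boolean calls.

    predicted[i] is the (hypothetical) call inferred from sRNA-seq,
    validated[i] is the corresponding experimentally validated call.
    """
    if len(predicted) != len(validated) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    tp = sum(1 for p, v in zip(predicted, validated) if p and v)
    tn = sum(1 for p, v in zip(predicted, validated) if not p and not v)
    fn = sum(1 for p, v in zip(predicted, validated) if not p and v)
    if tp + fn == 0:
        raise ValueError("recall is undefined without validated positives")
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(predicted)
    return recall, accuracy
```

    In use, one would pass per-gene expression estimates for the same genes, in the same order, from the two library types through `log_expr` and then `r_squared`; any normalization applied beforehand (for example to TPM) is likewise an assumption outside this sketch.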

    Footnotes

    • Received September 4, 2025.
    • Accepted December 17, 2025.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    Preprint Server