A spatiotemporally resolved atlas of mRNA decay in the C. elegans embryo reveals differential regulation of mRNA stability across stages and cell types
Abstract
During embryonic development, cells undergo dynamic changes in gene expression that are required for appropriate cell fate specification. Although both transcription and mRNA degradation contribute to gene expression dynamics, patterns of mRNA decay are less well understood. Here, we directly measure spatiotemporally resolved mRNA decay rates transcriptome-wide throughout C. elegans embryogenesis by transcription inhibition followed by bulk and single-cell RNA sequencing. This allows us to calculate mRNA half-lives within specific cell types and developmental stages, and identify differentially regulated mRNA decay throughout embryonic development. We identify transcript features that are correlated with mRNA stability and find that mRNA decay rates are associated with distinct peaks in gene expression over time. Moreover, we provide evidence that, on average, mRNA is more stable in the germline than in the soma and in later embryonic stages than in earlier stages. This work suggests that differential mRNA decay across cell states and time helps to shape developmental gene expression, and it provides a valuable resource for studies of mRNA turnover regulatory mechanisms.
Development is a highly regulated process that relies on the precise spatial and temporal control of gene expression patterns. Although transcriptional regulation is heavily studied, the regulation of mRNA decay can also be complex and necessarily influences mRNA abundance (Alonso 2012). Transcripts are often specified for decay through the binding of RNA and protein factors to cis-regulatory elements within the 3′ untranslated region (UTR) (Ray et al. 2013; Vejnar et al. 2019). Cross talk between factors can lead to differential mRNA decay, such as RNA-binding proteins stabilizing and protecting transcripts from miRNA-mediated repression (Bhattacharyya et al. 2006; Young et al. 2012). Furthermore, computational analyses predict widespread antagonistic or synergistic actions between RNA-binding proteins and miRNAs in regulating gene expression (Jiang et al. 2013). Translational efficiency and features that influence it, such as codon optimality, are also key determinants of mRNA stability. From yeast to humans, greater translational efficiency is associated with greater transcript stability (Presnyak et al. 2015; Wu et al. 2019). Once transcript turnover is specified, transcripts are ultimately targeted to one of many distinct decay pathways, such as those mediated by the major exoribonucleases XRN1 (5′-3′ decay) and the RNA exosome (3′-5′ decay) (Łabno et al. 2016). Given the potential complexity in the regulation of mRNA degradation, differential mRNA decay in different lineages, cell types, or developmental stages could contribute to distinct gene expression patterns. The degree to which such regulation occurs throughout development is unclear. Measuring how mRNA degradation varies across different cell types and developmental stages would determine the extent of differential mRNA decay during development and provide the basis for identifying mechanisms for this regulation.
The developmental importance of mRNA degradation is underscored by the maternal-to-zygotic transition, in which maternal gene products in the early embryo must be degraded for the control of development to switch to zygotically encoded products (Vastenhouw et al. 2019). Global studies in both vertebrate and invertebrate model organisms have measured maternal mRNA decay and identified key RNA-binding proteins and small RNAs involved in the clearance of maternal products (Giraldez et al. 2006; Tadros et al. 2007). In addition, studies of zygotic mRNA degradation have uncovered cell differentiation events that are influenced by mRNA decay. For example, timely decay of glial cells missing transcripts in the fly embryo is important for appropriate nervous system differentiation (Soustelle et al. 2008), and stabilization of muscle-specific mRNAs by the RNA-binding protein Human antigen R promotes myogenesis (van der Giessen and Gallouzi 2007). The regulation of mRNA degradation for developmental regulators may therefore be important in patterning fates. As studies of zygotic mRNA decay tend to be on a gene-by-gene basis or in the context of cell culture systems, global studies in complex organisms will be necessary to establish the extent of developmentally regulated zygotic decay.
The nematode Caenorhabditis elegans provides an ideal system for the global study of zygotic mRNA decay owing to its invariant cell lineage, experimental tractability, and conservation of RNA decay machinery with humans. In this study, we undertook the first global measurement of mRNA decay rates throughout C. elegans embryogenesis at high resolution by pairing a transcription inhibition approach with both bulk and single-cell RNA sequencing. Using this atlas of decay rates, we investigated the contribution of mRNA decay toward specific gene expression patterns across developmental stages and cell types, as well as how differential mRNA degradation may be regulated.
Results
Transcription inhibition by actinomycin D allows mRNA decay measurements in C. elegans embryonic cells
To measure zygotic mRNA decay throughout C. elegans embryogenesis, we measured changes in gene expression after transcription inhibition (Fig. 1A). Primary cell cultures of C. elegans embryos divide and differentiate in ways that mirror development of intact embryos, with a similar range of cell types differentiating at roughly the same time they would in intact embryos (Edgar 1995; Christensen et al. 2002). We took advantage of this by dissociating mixed-stage embryos into single cells and treating the cells with the transcription inhibitor actinomycin D (actD) in a time course experiment. As accurately staging or synchronizing large numbers of C. elegans embryos is nontrivial, using populations of mixed-staged embryos allowed us to measure mean mRNA decay rates in a population of cells spanning a broad range of embryonic stages, possibly at the cost of reduced detection of very stage-specific genes. Transcription inhibition was efficient, as measured by the rapid decrease in levels of unspliced transcripts for several housekeeping genes within minutes of actD treatment (Supplemental Fig. S1A). On the other hand, levels of unspliced transcripts were maintained in cultured cells not treated with actD (Supplemental Fig. S1B). After an hour of actD treatment, unspliced transcripts remained low and cell viability remained high (>90%).
Transcription inhibition by actinomycin D (actD) allows mRNA decay measurements in C. elegans embryonic cells. (A) A schematic representation of the approach to measure mRNA half-lives in the C. elegans embryo using transcription inhibition and bulk RNA sequencing. (B) Distribution of mRNA half-lives across three biological replicates that met a moderate filtering strategy. The median half-life was 54 min (black line). Seventy-two genes with half-lives >200 min are not shown. (C) Select Gene Ontology (GO) categories enriched among the top 15% stable and unstable transcripts. Background set of genes used was all genes that met our moderate mRNA half-life filtering metric. (D) A schematic representation of a gene with highly persistent expression and a gene with highly transient expression and the predicted overall stability of their transcripts. (E, left) Expression over time of highly persistent (rpl-35, pab-1) and highly transient (zip-7, hlh-14) genes from a whole-embryo RNA sequencing time series (Hashimshony et al. 2015). (Right) The measured mRNA decay of the corresponding genes from our bulk RNA sequencing data, with each point representing normalized transcript abundance from one of three biological replicates. (F) Box plots showing the mRNA half-life distributions of genes characterized as highly transient, transient, persistent, or highly persistent transcriptome-wide. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median half-lives were calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, one, five, 50, and 57 genes within each group had mRNA half-lives >150 min.
To determine half-lives transcriptome-wide, we measured mRNA levels in cells after 0, 10, 20, 40, and 60 min of actD treatment. After each time point, samples were spiked with External RNA Controls Consortium (ERCC) control transcripts (Baker et al. 2005) before RNA was isolated and analyzed using total RNA sequencing after ribosomal RNA depletion (Supplemental Table S1). An mRNA half-life was calculated for each gene by fitting ERCC-normalized gene expression over time to an exponential decay equation across all replicates. The measured half-lives had a median of 54 min and varied by more than an order of magnitude from gene to gene, with most ranging from ∼20 to 200 min (Fig. 1B; Supplemental Table S2).
Several pieces of evidence indicate that these half-lives are reproducible and biologically meaningful. First, calculating half-lives separately for each of three biological replicates yielded reproducible half-life estimates (Spearman's ρ > 0.7 for all pairwise comparisons) (Supplemental Fig. S1C–E). We focused our downstream analyses on high-confidence half-life measurements by implementing a filtering strategy based on an initial count of 30 mapped reads for each gene and coefficient of variation and 95% confidence interval thresholds for half-lives across biological replicates (see Methods) (Supplemental Fig. S1F).
Second, genes with the slowest or fastest mRNA decay rates were enriched for different functional categories. Gene Ontology analysis found an enrichment among the most stable transcripts for housekeeping functions, such as ribosomal and cytoskeletal proteins (Fig. 1C; Supplemental Fig. S2A). In contrast, among the most unstable mRNAs, there was an enrichment for dynamic developmental processes, such as transcription factor activity and the cell cycle (Fig. 1C; Supplemental Fig. S2B). These functional associations with RNA stability generally agree with those found in previous studies of decay in other systems (Yang et al. 2003; Narsai et al. 2007; Thomsen et al. 2010; Burow et al. 2015).
Third, we found that RNA stability correlates with temporal dynamics in untreated embryonic cells. High mRNA stability should result in persistent expression over time, whereas faster mRNA degradation could allow genes to have transient expression (Fig. 1D). We categorized genes as having either transient or persistent expression over time in a previously published C. elegans embryo single-cell atlas (Packer et al. 2019) based on the magnitude and frequency of expression decrease from mother-to-daughter cell states (see Methods). For example, the transcription factor genes zip-7 and hlh-14 have highly transient expression, with rapid loss of expression from parent-to-daughter cells in multiple lineages (Supplemental Fig. S2C,D). On the other hand, the genes rpl-35 and pab-1 have highly persistent expression maintained from parent-to-daughter cells across multiple lineages (Supplemental Fig. S2E,F). These genes show similar temporal dynamics in a separate C. elegans whole-embryo time series data set (Hashimshony et al. 2015); zip-7 and hlh-14 increase to peaks in expression ∼200 min past the four-cell stage before dropping off rapidly, whereas rpl-35 and pab-1 have fairly constant expression over time (Fig. 1E). In agreement with our expectations, we found that zip-7 and hlh-14 transcripts have much shorter half-lives than those of rpl-35 and pab-1 (Fig. 1E). Across the transcriptome, highly transient genes generally had the shortest mRNA half-lives, whereas highly persistent genes generally had the longest mRNA half-lives (Fig. 1F). These findings are consistent with gene-specific mRNA decay rates having an important contribution to gene expression dynamics.
Transcript stability correlates with specific sequence features
To identify features that differ between stable and unstable transcripts, we tested for correlations between our mRNA stability measurements and various transcript attributes. This identified numerous features associated with transcript stability.
Codon optimality concerns the nonuniform decoding rate of codons by the ribosome, with a codon being considered optimal when its cognate tRNA is prevalent and allows for rapid decoding and ribosome movement (Bae and Coller 2022). In examining codon optimality scores between the most stable and unstable 15% of transcripts, we found that the most stable transcripts had significantly greater codon optimality than the least stable transcripts (Fig. 2A).
Transcript stability correlates with specific sequence features. The relationship between transcript stability and different sequence features was examined using the top 15% stable and unstable transcripts identified in our bulk RNA sequencing data. The box plots compare different features between all, stable, and unstable transcripts. Numbers to the left of the box plots are median values within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median values were calculated using the Wilcoxon rank-sum test. Outliers are not shown. (A) Distribution of codon optimality scores, as measured using tRNA adaptation index (tAI) values for each gene. Higher tAI values correspond to greater codon optimality. (B) Distribution of the number of introns averaged across all splice isoforms of a gene. (C) Distribution of 3′ UTR length averaged across all 3′ UTR isoforms of a gene. (D) Distribution of the percentage of GC content in the coding sequence of genes. For genes with multiple coding sequence isoforms, the longest isoform was used. (E) Distribution of the percentage of GC content in the 3′ UTR of genes. For genes with multiple 3′ UTR isoforms, the longest isoform was used. (F) Distribution of coding sequence length averaged across all splice isoforms of a gene. (G) Linear regression was used to identify the percentage of variation in mRNA half-lives explained by individual sequence features. (H) Multiple linear regression was used to identify the percentage of variation in mRNA half-lives explained by combinations of different sequence features. (I) MEME (Bailey et al. 2015) identified three motifs enriched in the 3′ UTRs of the top 15% stable transcripts compared with the 3′ UTRs of the top 15% unstable transcripts. For genes with multiple 3′ UTR isoforms, the longest isoform was used. The best match of each de novo motif to known motifs in mammals (Ray et al. 2013) is noted.
Because the presence of introns has been associated with higher translation, we asked whether intron number is correlated with mRNA stability (Shaul 2017). We found that unstable mRNAs had significantly fewer introns compared with stable mRNAs, with a median of 4.5 and 5.7 mean number of introns per gene, respectively (Fig. 2B). This was associated with differences in overall gene length; stable transcripts had longer mean coding sequence lengths compared with unstable transcripts (Fig. 2F).
Because 3′ UTRs can influence mRNA stability (Mayr 2019), we tested for 3′ UTR characteristics correlated with half-lives. Unstable transcripts had significantly shorter 3′ UTR lengths compared with stable transcripts, with a median difference of 98 bp (Fig. 2C). The nucleotide composition of both the 3′ UTR and coding sequence also varied with stability, with higher GC content for stable transcripts (Fig. 2D,E). In line with the findings for 3′ UTR and coding sequence length, the overall transcript length was significantly longer in stable transcripts compared with unstable transcripts (Supplemental Fig. S3C). We also found a relationship between mRNA half-lives and the number of 3′ UTR isoforms, with half-lives being significantly longer for genes with a higher number of 3′ UTR isoforms (Supplemental Fig. S3D,E).
We tested whether these different transcript features provide independent information about mRNA half-lives by using linear regression modeling. Codon optimality explained the largest fraction of variation in mRNA stability (∼21%) (Fig. 2G), whereas other features such as number of introns, 3′ UTR length, and GC content in the coding sequence provided independent information about mRNA stability (Fig. 2H). Most of the variation in mRNA stability, however, remains unexplained by the examined sequence features.
To identify sequence motifs associated with differences in transcript stability, we used the de novo motif-finding program MEME (Bailey et al. 2015) to identify motifs differentially enriched in the 3′ UTRs of the most stable and unstable transcripts. No statistically significant motifs were enriched for unstable transcripts, but several motifs were differentially enriched for stable transcripts (Fig. 2I; Supplemental Fig. S3A). To identify potential regulators of RNA stability, we compared these motifs against a database of RNA-binding protein motifs, which were identified by incubating RNA-binding proteins from many eukaryotic species within a complex pool of RNAs (Ray et al. 2013). The motifs resembled poly(C)-binding protein (PCBP), poly(A)-binding protein (PABP), and heterogeneous nuclear ribonucleoprotein (HNRNP) family binding sites (Fig. 2I; Supplemental Fig. S3B). RNA-binding proteins from these families have previously been shown to have transcript-stabilizing effects in other systems (Makeyev and Liebhaber 2002; Geuens et al. 2016; Passmore and Coller 2022). Future experiments will be needed to determine whether they are responsible for the increased stability of transcripts carrying these motifs in C. elegans embryos.
These findings highlight that stable and unstable transcripts are enriched for distinct sequence features, consistent with multiple mechanisms regulating mRNA stability. Codon optimality appears to be the single strongest predictor of mRNA stability in the C. elegans embryo. Additional complexity to the regulation of mRNA decay may result from other binding sites for regulatory RNAs and proteins that act in specific contexts.
High transcript accumulation is associated with increased mRNA stability
Throughout C. elegans embryogenesis, many genes accumulate to high transcript levels. Although this requires high transcription rates, we asked whether rapid transcript accumulation is also associated with increased mRNA half-life, as such accumulation would be difficult to achieve in the presence of rapid RNA turnover. Genes with lower accumulation rates, on the other hand, could be more tolerant of RNA turnover rates (Fig. 3A). For example, a moderate transcription rate with a moderate mRNA half-life or a high transcription rate with a high RNA turnover could yield similar slopes of RNA versus time.
High transcript accumulation is associated with increased mRNA stability. (A) A schematic representation of genes that accumulate high, medium, or low transcript levels and the expected range of transcription and mRNA decay rates that could contribute to such accumulation. (B) Examples of genes that peak in expression ∼200 min after the four-cell stage with high, medium, and low transcript accumulation, from left to right. Expression data taken from a whole-embryo RNA sequencing time series (Hashimshony et al. 2015). (C) Examples of genes that peak in expression ∼350 min after the four-cell stage with high, medium, and low transcript accumulation, from left to right. Expression data taken from a whole-embryo RNA sequencing time series (Hashimshony et al. 2015). (D) Box plots showing the mRNA half-life distributions of genes that peak in expression ∼200 min after the four-cell stage to low, medium, and high transcript levels. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median half-lives were calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, one, zero, and seven genes within each group had mRNA half-lives >125 min. (E) Box plots showing the mRNA half-life distributions of genes that peak in expression ∼350 min after the four-cell stage with low, medium, and high transcript levels. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median half-lives were calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, zero, zero, and two genes within each group had mRNA half-lives >175 min. (F) MEME (Bailey et al. 2015) identified three motifs enriched in the 3′ UTRs of genes with high transcript accumulation compared with the 3′ UTRs of genes with low transcript accumulation ∼200 min after the four-cell stage. For genes with multiple 3′ UTR isoforms, the longest isoform was used. The best match of each de novo motif to known motifs in mammals (Ray et al. 2013) is noted. (G) MEME (Bailey et al. 2015) identified three motifs enriched in the 3′ UTRs of genes with high transcript accumulation compared with the 3′ UTRs of genes with low transcript accumulation ∼350 min after the four-cell stage. For genes with multiple 3′ UTR isoforms, the longest isoform was used. The best match of each de novo motif to known motifs in mammals (Ray et al. 2013) is noted. (H) The gene expression patterns for three putative RNA-binding protein genes in C. elegans from a whole-embryo RNA sequencing time series (Hashimshony et al. 2015). The genes are homologs of mammalian RNA-binding protein genes whose corresponding proteins are known to bind motifs similar to those discovered in F and G.
Using a whole-embryo RNA sequencing time series (Hashimshony et al. 2015), we identified 985 dynamic genes whose expression peaks at ∼200 min past the four-cell stage. We categorized these genes as having high (top 20%), medium (middle 20%), or low (bottom 20%) transcript accumulation rates (Fig. 3B). For example, the high-accumulation gene cht-1 peaks at an expression level of around 6300 transcripts per million (TPM), which is in the top 1% of maximum expression across all genes and accounts for ∼0.6% of transcripts in the embryo. On the other hand, the medium-accumulation gene ced-11 peaks at around 150 TPM, and the low-accumulation gene sec-10 peaks at around 60 TPM.
We found that genes with high accumulation had a median mRNA half-life ∼27% longer than that of genes with low accumulation (median t1/2 of 52 vs. 41 min) (Fig. 3D). Similar results were observed for genes whose expression peaks later, ∼350 min past the four-cell stage (Fig. 3C). At this stage, high-accumulation genes had a median mRNA half-life ∼45% longer than that of low-accumulation genes (median t1/2 of 81 vs. 56 min) (Fig. 3E).
We identified motifs differentially enriched in the 3′ UTRs of high-accumulation genes compared with the 3′ UTRs of low-accumulation genes using MEME (Fig. 3F,G; Supplemental Fig. S4A,C; Bailey et al. 2015). High-accumulation genes peaking at ∼200 min were enriched for potential ELAV like RNA binding protein (ELAVL) and RNA binding motif (RBM) family binding sites; those peaking at ∼350 min were enriched for potential HNRNP and TIA1 cytotoxic granule associated RNA binding protein (TIA1) family binding sites; and high-accumulation genes at both stages were enriched for potential PCBP binding sites (Supplemental Fig. S4B,D). Several of the corresponding RNA-binding protein genes have C. elegans homologs that have distinct peaks in mRNA expression throughout embryogenesis (Fig. 3H).
Overall, dynamically expressed genes with high transcript accumulation rates tend to have longer mRNA half-lives. This suggests that regulation of both transcription and mRNA decay help these genes accumulate to high transcript levels.
Single-cell RNA sequencing allows measurement of mRNA half-lives at high resolution throughout C. elegans embryogenesis
Although our bulk RNA sequencing data support the idea that many zygotic genes vary in their decay rates, they do not allow us to identify changes in transcript stability for the same gene in different cell types or developmental stages. To globally identify developmentally regulated mRNA decay in the C. elegans embryo, we combined actD transcription inhibition with single-cell RNA sequencing (Fig. 4A).
Single-cell RNA sequencing allows measurement of mRNA half-lives at high resolution throughout C. elegans embryogenesis. (A) A schematic representation of the approach to measure mRNA half-lives in the C. elegans embryo using transcription inhibition and single-cell RNA sequencing. (B) UMAP projection of the integrated data set of three biological replicates. Cells are colored by the age of the embryo from which a cell was produced, estimated from correlations to a whole-embryo RNA sequencing time series (Hashimshony et al. 2015). Trajectories corresponding to major cell types are labeled. (C) Scatter plot comparing the mRNA half-lives calculated in pseudobulk from the single-cell data and the half-lives calculated from the bulk-cell data on a log-log scale. Spearman's correlation coefficient = 0.76. Dashed line is the x = y line. (D) Box plots showing the stage-specific mRNA half-life distributions of genes within early-, middle-, and late-stage cells (50–200, 200–350, and 350+ min after the four-cell embryo stage, respectively). (E) Box plots showing the cell type–specific mRNA half-life distributions of genes within muscle, germline, epidermis, neuron, and pharynx. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median half-lives were calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right for D, 203, 345, and 323 genes within each group had mRNA half-lives >100 min; from left to right for E, 141, 36, 150, 70, and 125 genes within each group had mRNA half-lives >150 min.
We dissociated mixed-stage populations of C. elegans embryos into single cells and cultured them for a time course in the presence or absence of actD as before. mRNA levels were then measured in single cells using the 10x Genomics single-cell RNA sequencing platform. Cells from 0 min untreated, 20 min actD-treated, and 40 min actD-treated cells were used to calculate mRNA half-lives. An additional 40 min untreated time point served as a control for the impact of cell culturing. We used this slightly shorter time course to allow for more accurate annotation of actD-treated cells, as modeling from the bulk data indicated that these reduced time points were sufficient to accurately estimate mRNA half-lives for most genes (Supplemental Fig. S5D). We performed the entire experiment across three biological replicates and, in total, captured 183,777 cells (Supplemental Table S1).
We integrated the three biological replicates using Seurat (Stuart et al. 2019), and both manual and automated annotation approaches were used to identify each cell's terminal cell type or lineage identity and developmental stage (in minutes from the four-cell stage; see Methods) (Supplemental Fig. S5J). Cell type annotations were automatically transferred to this data set from an existing C. elegans embryo single-cell atlas (Packer et al. 2019), using a principal component projection approach implemented in Seurat (Stuart et al. 2019). Manual annotation to refine the epidermal cell type annotations (see Methods) was based on cell type–specific marker genes reported in WormBase (Lee et al. 2018). For each cell, the stage of the embryo from which it came from was estimated by correlating its transcriptome with a whole-embryo bulk RNA seq time series (Hashimshony et al. 2015; Packer et al. 2019). The cells here cover a similar range of stages as those in the existing C. elegans embryo single-cell atlas (from gastrulation, around the 50-cell stage, through the early stages of terminal differentiation) (Packer et al. 2019).
We projected the integrated data set to two dimensions using the uniform manifold approximation and projection (UMAP) algorithm (Becht et al. 2019; McInnes et al. 2020), which organized the cells by type and stage (Fig. 4B). From a central group of progenitor cells, trajectories formed that generally reflected progression through embryogenesis and that branched out into major cell types. Projecting data from individual biological replicates using the UMAP algorithm revealed similar cell clustering (Supplemental Fig. S5A–C). These UMAPs resembled the global UMAP generated from our previous C. elegans single-cell RNA sequencing atlas, in which cells were immediately captured for sequencing rather than subjected to short-term culturing (Supplemental Fig. S6B; Packer et al. 2019). Cells from our mRNA decay atlas that were subjected to short-term culturing also coembedded well with cells from this previous single-cell RNA sequencing atlas (Supplemental Figs. S6C–F, S7A–C). Importantly, transcription inhibition or short-term cell culture did not appear to significantly affect the fraction of cells annotated as each major cell type across biological replicates (Supplemental Fig. S5K), consistent with past studies showing that cultured embryonic cells develop similarly to their in vivo fates (Edgar 1995; Christensen et al. 2002).
Half-lives were calculated in a manner similar to those calculated for our bulk data. However, we were not able to use ERCC control transcripts with the single-cell RNA sequencing approach, so expression of each gene was normalized to the expression of ribosomal protein gene transcripts and further adjusted to account for decay of those ribosomal transcripts (see Methods) (Supplemental Fig. S5E,F; Supplemental Table S3). Final pseudobulk half-lives for well-measured transcripts from each of three single-cell biological replicates were well correlated with one another (Spearman's ρ > 0.7 for all pairwise comparisons) (Supplemental Fig. S5G–I) and with half-lives calculated from the bulk data (Spearman's ρ = 0.76) (Fig. 4C). This indicates that the single-cell RNA sequencing approach captures comparable half-lives to those determined from the bulk approach.
To examine how mRNA half-lives change across development, we focused first on broad developmental stages and cell types. Half-lives for these cell subsets were calculated in pseudobulk as before. To examine mRNA stability across developmental stages, we divided embryogenesis into three major phases using estimated embryo time: early (50–200 min), middle (200–350 min), and late (350+ min). The early and middle stages correspond to embryonic cleavage and mostly contain progenitor cell states. The early stage largely corresponds to gastrulation; the middle stage corresponds to later rounds of cell division and lineage specification; and late stage consists mostly of terminally differentiating cells. Overall, we observed increased average mRNA stability over time; the median half-life of transcripts in late cells was ∼38% higher than in early cells, with middle cells having an intermediate range of half-lives (Fig. 4D; Supplemental Tables S4–S6).
Similarly, we compared global half-lives in broad cell classes (muscle, germline, epidermis, neuron, and pharynx) (Supplemental Tables S7–S12). The most notable difference observed was the overall longer half-lives in the germline compared with in somatic cell types (Fig. 4E; Supplemental Fig. S6A). Half-life distributions were significantly different between somatic cell types as well, but the magnitude of these differences was generally smaller (Fig. 4E; Supplemental Fig. S6A). Together, these observations suggest that mRNA turnover rates are globally regulated across both developmental stages and cell types.
Small RNA pathways such as the microRNA pathway are thought to target specific subsets of genes for destabilization and translational inhibition (Chekulaeva and Filipowicz 2009). Our cell type–specific mRNA half-life measurements also allowed us to examine the decay of known miRNA targets. C. elegans genes encoding subunits of the vacuolar adenosine triphosphatase (V-ATPase) complex, which controls intracellular and extracellular pH (Pamarthy et al. 2018), are ubiquitously expressed (Gutiérrez-Pérez et al. 2021). However, miR-1, a deeply conserved muscle-specific miRNA whose loss results in muscle cell defects, was found to repress multiple subunits of the V-ATPase complex in muscle cells (Gutiérrez-Pérez et al. 2021). We found that the half-lives of transcripts encoding such subunits (vha-8, vha-12, vha-14) are shorter in muscle cells compared with all other cell types, consistent with the model that miR-1 targets these transcripts for more rapid decay to adjust their levels specifically in muscle cells (Supplemental Fig. S7G; Gutiérrez-Pérez et al. 2021). Similarly, we examined the miR-35 and miR-51 families, which are the two miRNA families necessary for C. elegans embryonic development (Dexheimer et al. 2020). In each of the early, middle, and late stages, the predicted targets of these miRNA families had significantly shorter mRNA half-lives overall compared with all other transcripts (Supplemental Fig. S7D–F). Our data are thus consistent with the idea that the miR-35 and miR-51 families target transcripts for more rapid degradation.
Differential mRNA decay occurs throughout different developmental stages of C. elegans embryogenesis
Although transcripts on average increased in stability over time (Fig. 4D), we identified a subset of transcripts with faster decay at later stages of embryogenesis (Fig. 5A). These genes were enriched for terms such as structural constituent of chromatin, transporter activity, and cilium organization (Fig. 5B; Supplemental Fig. S8A–C). MEME identified several motifs differentially enriched in the 3′ UTR of these genes compared with the 3′ UTR of genes for which mRNA decay slows the most over time (Bailey et al. 2015). From early to middle stages and middle to late stages, motifs were identified that had potential similarities to motifs bound by the human factors HNRNPR, PCBP2, PABPC4, and HNRNPC (Supplemental Fig. S9D,E; Ray et al. 2013).
Differential mRNA decay occurs throughout different developmental stages of C. elegans embryogenesis. (A) Scatter plots comparing mRNA half-lives specific to early-, middle-, and late-stage cells in all pairwise comparisons. Each point represents a gene. Blue points correspond to the top 5% of genes with faster mRNA decay in the later stage compared with in the earlier stage. Pink points correspond to the top 5% of genes with slower mRNA decay in the later stage compared with in the earlier stage. Dashed line is the x = y line. (B) Select GO categories enriched among the top 5% of genes with faster mRNA decay in the later stage of embryogenesis compared with the earlier stage of embryogenesis. The background set of genes used in each comparison was shared genes we were able to calculate stage-specific mRNA half-lives for between the relevant stages. (C–F, left) Median scaled expression of gene subset using data from a whole-embryo RNA sequencing time series (Hashimshony et al. 2015). Blue shading spans the early stage; purple shading spans the middle stage; and pink shading spans the late stage. (Right) Plot displaying the change in mRNA half-lives from earlier to later stage for gene subset.
To illustrate how this resource can be used to characterize differential mRNA decay, we examined the dynamics of 70 previously identified core cilia component genes (Brocal-Ruiz et al. 2023). These genes generally peaked in expression at ∼400 min, around the end of our middle stage, and genes annotated with ciliary functions were enriched among genes with faster mRNA decay over time (Supplemental Fig. S9A). This peak corresponds well to the time when many sensory neurons generate their cilia (Nechipurenko and Sengupta 2017). Although the average transcript becomes more stable in late- versus middle-stage cells, the average cilia transcript becomes less stable or maintains a similar half-life during this period (Supplemental Fig. S9A). This pattern was especially pronounced for transcripts encoding ciliary transition zone (Williams et al. 2011) and intraflagellar transport proteins (Fig. 5C,D; Blacque et al. 2006; Ou et al. 2007). The ciliary transition zone is a domain at the base of cilia that controls the entry and exit of ciliary proteins needed for signal transduction (Li et al. 2016). On the other hand, intraflagellar transport mediates protein trafficking along a microtubular axoneme for the assembly and maintenance of cilia (Wang et al. 2021). Our observations suggest that these cilia components are produced at high levels during ciliogenesis, with transcripts destabilized once they are no longer needed at high levels.
Many putative transcription factor genes (Fuxman Bass et al. 2016) displayed changes in mRNA decay over time. We identified 16 transcription factor transcripts with decreasing stability in middle-stage cells and 14 with decreasing stability in late-stage cells compared with the prior stage. On average, these genes showed decreasing expression during the time when their transcript stability is lower (Fig. 5E,F). Thus, differential mRNA decay may allow transcripts of developmental regulators to degrade in a timely manner after accumulating to a certain threshold.
As the median scaled expression patterns of these ciliary and transcription factor genes have distinct peaks, differential mRNA degradation may contribute to their specific patterns of expression throughout embryogenesis. Indeed we found that, on average, zygotic-only expressed genes with faster mRNA decay over time have peaks in expression corresponding to the time when stability decreases (Supplemental Fig. S9B). Importantly, the expression patterns of genes with increased mRNA turnover over time were distinct from those observed for genes with decreased mRNA turnover over time and did not peak as strongly or in the same developmental stage (Supplemental Fig. S9C).
These results highlight that differential mRNA decay over time is associated with dynamic gene expression patterns throughout embryogenesis. This suggests that the regulation of mRNA degradation makes important contributions to developmental gene expression.
mRNA degradation may contribute to transcription factor dynamics at both the RNA and protein levels
Because transcription factors drive developmental gene expression, their mRNA dynamics are of particular interest in the context of fate specification. Across all developmental stages, transcripts encoding transcription factors (Fuxman Bass et al. 2016) decay more rapidly compared with those encoding other genes (Fig. 6A). This suggests that transcripts encoding transcription factors may need to achieve faster turnover on average throughout embryogenesis.
mRNA degradation may contribute to transcription factor dynamics at both the RNA and protein levels. (A) Box plots showing the mRNA half-life distributions of transcription factor genes and all other genes in early-, middle-, and late-stage cells. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median half-lives were calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, 198, five, 336, nine, 314, and nine genes within each group had mRNA half-lives >100 min. (B) Grid displaying the percentage of transcription factors with transient or persistent mRNA expression and transient or persistent protein expression. The probability that RNA and protein dynamics are independent of one another was calculated using a Fisher's exact test. (C) Box plots showing the single-cell RNA sequencing pseudobulk mRNA half-life distributions of transcription factors with transient mRNA and protein expression, transient mRNA and persistent protein expression, persistent mRNA and transient protein expression, and persistent mRNA and protein expression. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes within each group. P-values comparing median half-lives were calculated using the Wilcoxon rank-sum test. (D,E) Sublineages with coloring representing reporter GFP expression from a single-cell transcription factor protein expression atlas of the C. elegans embryo (Ma et al. 2021). Protein and mRNA dynamics are characterized below, along with pseudobulk measured mRNA half-life.
To test whether mRNA decay dynamics are associated with protein expression over time, we took advantage of an existing single-cell transcription factor protein expression atlas of the C. elegans embryo (Ma et al. 2021). This study used confocal microscopy for the live-imaging of strains expressing almost 300 transcription factor–GFP fusion proteins. Automated cell lineage tracing of these imaging data allowed for the quantification of protein expression at single-cell and ∼1 min temporal resolution. We categorized transcription factor proteins as either transiently or persistently expressed based on the occurrence of loss or maintenance of expression from parent-to-daughter cells. Similarly, we categorized transcription factors as transiently or persistently expressed at the mRNA level using a previous single-cell RNA sequencing data set (Packer et al. 2019). Overall, among the 91 transcription factors well measured in both the protein and mRNA data sets, there was a significant enrichment for concordant protein and RNA dynamics: 36.3% had both transient protein and mRNA expression, and 26.4% had both persistent protein and mRNA expression (P = 0.017) (Fig. 6B). A smaller number had discordant dynamics, including 12.1% with transient protein but persistent mRNA and 25.3% with persistent protein but transient mRNA (Supplemental Table S13). These discordant cases likely reflect distinct types of post-transcriptional regulation. Transient protein expression despite the persistent presence of mRNA would suggest that post-translational events are targeting proteins for rapid degradation or that their translation is limited to specific temporal stages. This could occur, for example, for proteins only needing to act for a short time window or if it were detrimental if they persist for too long. Potentially, the persisting mRNA could allow cells to rapidly create new protein in the future as needed. The reverse pattern, persistent protein despite transient mRNA dynamics, suggests that the encoded proteins are highly stable.
We hypothesized that transcription factors with concordant RNA and protein dynamics might reflect stronger regulation by RNA degradation. Consistent with this, transcription factors with transient protein and mRNA had shorter mRNA half-lives, on average, compared with transcription factors with persistent protein and mRNA (Fig. 6C). This suggests that transcript turnover may contribute to protein dynamics by influencing overall mRNA expression dynamics for these genes. For example, the transcription factors REF-2 and CEH-83 were both transiently expressed, with protein expression being lost across many lineages (Fig. 6D). The corresponding mRNAs were also transiently expressed (Supplemental Fig. S10A,B), with unstable mRNA half-lives of 19.7 and 22.0 min, respectively. The proteins MEP-1 and LSY-2, on the other hand, were both persistently expressed (Fig. 6E). The corresponding mRNAs were also persistently expressed (Supplemental Fig. S10C,D), but lsy-2 mRNA was much more stable, with a half-life of 62 min (vs. 28.6 min for mep-1).
These results suggest that mRNA degradation may contribute to the protein dynamics of developmental regulators. In many cases, the regulation of transcription and mRNA decay appear to act in concert to have a coherent impact on corresponding protein levels. However, it is not uncommon for RNA dynamics to differ from protein dynamics, highlighting the complexity associated with the regulation of protein expression.
mRNA stability is correlated with cell type–specific functions
As described above, different somatic cell types have more similar mRNA half-life distributions to one another than they do to germline cells (Fig. 4E). We asked what categories of genes have faster or slower mRNA decay within each cell class. We identified cell type–specific genes within each class and tested whether their stability differed from broadly expressed genes (see Methods). For muscle cells and neurons, cell type–specific transcripts had significantly longer half-lives compared with broadly expressed transcripts, whereas for the other somatic cell classes, specific and broadly expressed genes had similar stability distributions (Fig. 7A).
mRNA stability is correlated with cell type–specific functions. (A) Box plots showing the mRNA half-life distributions of cell type–specific and broadly expressed genes within muscle, germline, epidermis, neuron, and pharynx cells. Numbers to the left of the box plots are median half-lives within each group. Numbers above box plots are the number of genes within each group. The P-value comparing median half-lives was calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, 16, 125, 8, 28, one, 149, one, 69, five, and 120 genes within each group had mRNA half-lives >150 min. (B,i) Box plots showing the muscle-specific mRNA half-life distributions of mesoderm/muscle fate-specifying transcription factor genes, muscle structure genes, and all other genes. Numbers to the left of the box plots are median half-lives within each group. Numbers above box plots are the number of genes within each group. The P-value comparing median half-lives was calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, zero, four, and 137 genes within each group had mRNA half-lives >150 min. (ii,iii) Scatter plots of the normalized transcript abundance of the mesoderm/muscle fate-specifying transcription factor genes ceh-34, tbx-2, ceh-20, eya-1, and hlh-1 and the muscle structure genes deb-1, unc-78, unc-112, sgn-1, and tmd-2 throughout a 40 min transcription inhibition time course in muscle cells. Each point represents normalized transcript abundance from one of three biological replicates. (C,i) Box plots showing the neuron-specific mRNA half-life distributions of neural fate-specifying transcription factor genes, synapse-/dendrite-/axon-associated genes, and all other genes. Numbers to the left of the box plots are median half-lives within each group. Numbers above box plots are the number of genes within each group. The P-value comparing median half-lives was calculated using the Wilcoxon rank-sum test. Outliers are not shown: From left to right, zero, zero, and 70 genes within each group had mRNA half-lives >150 min. (ii,iii) Scatter plots of the normalized transcript abundance of the neural fate-specifying transcription factor genes ceh-5, egl-46, unc-86, ceh-43, and hlh-2 and the synapse-/dendrite-/axon-associated genes lnp-1, unc-37, mig-2, ced-10, and unc-53 throughout a 40 min transcription inhibition time course in neuronal cells. Each point represents normalized transcript abundance from one of three biological replicates. (D) Cell type–specific expression in the middle and late stages for genes with differential mRNA decay between muscle and neuron (i), epidermis and neuron (ii), and muscle and epidermis (iii). Cell type–specific expression in TPM was obtained from a C. elegans embryo single-cell atlas (Packer et al. 2019).
To determine what may drive these differences in half-life distributions, we examined the mRNA half-lives of genes that fell within specific functional categories in muscle and neuronal cells (Supplemental Fig. S12A,B). Within muscle cells, transcripts encoding transcription factors with roles in mesoderm or muscle fate specification (such as ceh-34, tbx-2, ceh-20, eya-1, and hlh-1) had significantly shorter half-lives overall compared with transcripts encoding muscle structural proteins (such as deb-1, unc-78, unc-112, sgn-1, tmd-2) (Fig. 7B). Similarly, in neurons, transcripts encoding neural fate transcription factors (such as ceh-5, egl-46, unc-86, ceh-43, and hlh-2) had significantly shorter half-lives compared with transcripts associated with synapses, dendrites, or axons (such as lnp-1, unc-37, mig-2, ced-10, and unc-53) (Fig. 7C). Although the overall mRNA half-life distributions for cell type–specific and broadly expressed genes were similar within epidermis and pharynx, in these cell classes, we similarly saw faster decay of cell type–enriched transcription factor transcripts and slower decay of structural transcripts such as those encoding cuticle-associated proteins in the epidermis and peptidase inhibitors in the pharynx (Supplemental Figs. S11A–F, S12C,D).
We used MEME to identify motifs associated with slow or fast mRNA decay within neurons and muscle (Supplemental Fig. S13A–D; Bailey et al. 2015). Similar motifs were identified for the most stable and unstable transcripts in the soma as a whole (Supplemental Fig. S13E,F), suggesting that these motifs may not be unique to muscle and neurons. Although the motifs found in stable transcripts included those that resemble PCBP and HNRNP family binding sites (Supplemental Fig. S13A,C,E), the CSGGU motif associated with unstable transcripts had no significant matches when examined against a database of known RNA-binding protein motifs (Ray et al. 2013).
To begin to examine how extensive differential mRNA decay may be across somatic cell types, we focused on comparing half-lives between muscle, epidermis, and neuronal cells. We focused on these cell types as they were the most abundant in our data set, allowing us to accurately measure half-lives for more genes. We focused on the middle stage to avoid differences in half-life owing to tissue-specific differences in expression timing, with the middle stage having a large number of cells. Differential decay between cell types during that stage could conceivably contribute to both fate specification and terminal differentiation.
Comparing genes with half-lives measured in muscle and epidermis, ∼3% (58/1926) had half-lives at least 1.5-fold longer in the muscle and ∼6% (115/1926) in the epidermis. Between muscle and neurons, ∼18% of genes (442/2455) had mRNA half-lives at least 1.5-fold longer in muscle, and ∼1.2% of genes (30/2455) had mRNA half-lives at least 1.5-fold longer in neurons. Lastly, between epidermis and neurons, ∼21.3% of genes (539/2531) had mRNA half-lives at least 1.5-fold longer in epidermis, and ∼0.9% of genes (23/2531) had mRNA half-lives at least 1.5-fold longer in neurons. Although mRNA half-lives of shared genes were overall shorter in neurons compared with in muscle and epidermis, the impact of differences in half-lives for individual transcripts across the different cell types is unclear. The expression of many of these genes in different cell types, however, may be shaped in part by differences in mRNA decay. For example, zig-8 transcripts were more stable in neurons than in muscle, with half-lives of 37.5 and 24.3 min, respectively; jac-1 transcripts were more stable in neurons than in epidermis, with half-lives of 37.6 and 23.5 min, respectively; and vha-20 transcripts were more stable in epidermis than in muscle, with half-lives of 92.2 and 60.8 min, respectively. The expression of these genes in our existing C. elegans embryo single-cell atlas (Packer et al. 2019) highlights that they are well expressed in both cell types of interest; furthermore, gene expression in the cell type with a shorter mRNA half-life decreases in later embryogenesis whereas gene expression in the cell type with a longer mRNA half-life tends to persist or increase, suggesting a role of regulated mRNA decay in cell type–specific persistence (Fig. 7D).
Differential mRNA decay occurs across germline and somatic cells in the C. elegans embryo
The most striking cell type–specific difference in mRNA degradation was between the germline (median t1/2 = 62 min) and soma (median t1/2 = 36 min) (Figs. 4E, 8A). When comparing mRNA half-lives of shared genes between these cell classes, ∼80% of transcripts were more stable in the germline than they were in the soma (Fig. 8B).
Differential mRNA decay occurs across germline and somatic cells in the C. elegans embryo. (A) Box plots showing the cell type–specific mRNA half-life distributions of genes across the germline and soma. (B) Scatter plot comparing soma- and germline-specific mRNA half-lives to one another. Each point represents a gene. Pink points correspond to the top 10% of genes with longer mRNA decay in the germline compared with in the soma. Blue points correspond to the top 10% of genes with faster mRNA decay in the germline compared with in the soma. Dashed line is the x = y line. (C) A cartoon of box plots representing the expected mRNA half-life distribution of genes with less, constant, or greater expression over time if mRNA decay primarily contributes to gene expression. (D) A cartoon of box plots representing the expected mRNA half-life distribution of genes with less, constant, or greater expression over time if transcription primarily contributes to gene expression. (E) Box plots showing the germline-specific mRNA half-life distributions for genes that decrease or increase in expression over time in the germline from a fold-change of one to two or greater than two. (F) Box plots showing the soma-specific mRNA half-life distributions for genes that decrease or increase in expression over time in the soma from a fold-change of one to two or greater than two. (G) Box plots showing the germline-specific mRNA half-life distributions of mRNA 3′ UTR-binding genes, other mRNA-binding genes, and genes not annotated as RNA binding. (H) Box plots showing the soma-specific mRNA half-life distributions of mRNA 3′ UTR-binding genes, other mRNA-binding genes, and genes not annotated as RNA binding. (I) Plot displaying the soma- and germline-specific mRNA half-lives for mRNA-binding genes whose decay is more rapid in the germline than in the soma. Numbers to the left of the box plots are median half-lives within each group. Numbers above the box plots are the number of genes with half-lives >150 min (A,G,H) or 325 min (E,F) within each group. The P-value comparing median half-lives was calculated using the Wilcoxon rank-sum test.
In the C. elegans embryo, the germline is derived from the P4 blastomere, which is born at the 24-cell stage after a series of asymmetric cell divisions (Wang and Seydoux 2013). At the 88-cell stage, P4 divides into the primordial germ cells Z2 and Z3, which do not divide further throughout embryogenesis. P4, Z2, and Z3 are thought to be largely transcriptionally quiescent (Wang and Seydoux 2013). Despite this, our C. elegans embryo single-cell atlas (Packer et al. 2019) identified substantial quantitative differences in gene expression between early, mid-embryo, and late-germline cells. This raises the question of whether differences in mRNA stability govern the quantitative maturation of the germline transcriptome during embryogenesis.
To test this, we compared mRNA stability between transcripts that increase or decrease in relative abundance across time in the germline. If expression differences result from differential mRNA decay, we would expect genes with decreasing relative abundance to have shorter mRNA half-lives compared with genes with constant expression (Fig. 8C). Similarly, we would expect genes with increasing relative abundance to have longer mRNA half-lives. On the other hand, if changes in relative abundance result from previously unrecognized transcription, we might expect comparable half-life distributions for each group (Fig. 8D).
We found that changes in germline mRNA abundance over time correlated with mRNA stability. Transcripts that decrease in abundance over time had shorter half-lives overall, and transcripts that increase in abundance over time had longer half-lives overall (Fig. 8E). This effect was stronger than the increase in mRNA half-lives we saw for genes with greater soma-specific expression over time (Fig. 8F). These results are consistent with a model in which changes in mRNA levels throughout embryogenesis result from mRNA stability differences in the germline and from a mix of transcriptional and post-transcriptional mechanisms in the soma.
The changes observed in transcript abundance in the germline raise the question of what types of genes increase or decrease over time. Gene Ontology analysis revealed that germline-specific genes were enriched for genes associated with mRNA 3′ UTR-binding (Supplemental Fig. S12E). We found that genes with this annotation had significantly shorter mRNA half-lives in the germline compared with those of mRNA-binding genes not associated with 3′ UTRs or genes not annotated as RNA binding (Fig. 8G). This was not the case in the soma (Fig. 8H), suggesting that germline maturation might be specifically regulated by decay of transcripts encoding RNA-binding proteins. Transcripts encoding mRNA-binding proteins with faster decay in the germline than in the soma included mex-5 and mex-6 transcripts, for which half-lives in the soma were more than twice as long as they were in the germline (Fig. 8I). MEX-5 and MEX-6 are nearly identical proteins that inhibit the translation of some germline proteins in the soma, and they become more highly expressed in somatic cells versus germline cells over time in the early embryo (Schubert et al. 2000). Our results suggest that maturation of the germline in part results from germline-specific turnover of maternally provided somatic regulators like mex-5 and mex-6.
Gene Ontology analysis further found an enrichment among the most stable transcripts in the germline for those encoding proteins with functions related to peptide biosynthetic process (Supplemental Fig. S14A). Although transcripts in the germline encoding proteins involved in translational elongation and the positive regulation of translation were significantly more stable compared with all other transcripts, the same general trend was observed in neuron, muscle, epidermis, and pharynx cells (Supplemental Fig. S14B–F). Thus, it is unclear whether such transcripts play specific roles in promoting the germline fate during postembryonic development or are just generally stable across all cell types. To characterize transcripts preferentially stable in the germline, we identified genes with mRNA half-lives that were more than twofold longer in the germline compared with in the soma. These genes were largely maternally expressed, as 148/176 genes were detected in one-cell embryos in a C. elegans whole-embryo RNA sequencing time series (threshold of TPM > 5) (Hashimshony et al. 2015). The expression over time of genes whose transcripts were preferentially stable in the germline was generally more persistent in the germline compared with in the soma. This is consistent with an important role for the differential stability of such transcripts in determining the late-embryonic germline transcriptome (Supplemental Fig. S14G,H).
Discussion
Using a transcription inhibition approach and RNA sequencing, we measured mRNA half-lives transcriptome-wide throughout C. elegans embryogenesis. We found that global mRNA decay rates differ between cell types and developmental stages, identified gene attributes and sequence motifs associated with different rates of decay, and showed that regulated RNA turnover is associated with dynamic gene expression patterns. Our findings emphasize that developmental gene expression patterns result from regulation of both transcription and RNA decay.
A major advantage to the transcription inhibition approach is that it can be readily applied to any RNA sequencing platform, and the downstream analyses to calculate mRNA half-lives transcriptome-wide are relatively straightforward in a dynamic situation like early embryogenesis. In addition, general concordance in measured mRNA decay rates has been observed between transcription inhibition approaches and approaches that maintain transcription, such as metabolic labeling, within mammalian cells and fly embryos (Burow et al. 2015; Herzog et al. 2017). A limitation to our approach is that transcription inhibition is best applied to primary cell cultures, as the chitin eggshell of the C. elegans embryo is impermeable to drugs like actD. Thus, our sequencing-based mRNA quantification cannot be readily applied to or validated by alternative approaches that would be best applied to whole embryos, such as fluorescence in situ hybridization. However previous work has shown that in C. elegans embryos, transcript abundance measured by single-molecule RNA FISH generally agrees well with that determined by single-cell RNA sequencing (Raj et al. 2010; Nair et al. 2013; Tintori et al. 2016; Sivaramakrishnan et al. 2023).
Our analyses suggest that mRNA half-lives for most genes in the C. elegans embryo range from ∼20 to 200 min. Previous studies in cell culture systems found that the median half-life is roughly proportional to cell cycle length (Vejnar et al. 2019). A study of decay in early Drosophila melanogaster embryos (nc11 through cephalic furrow formation), which have comparable cell cycle lengths (∼10–180 min) to those in the C. elegans embryo (Sulston et al. 1983; Bao et al. 2008), observed a similar range of half-lives to ours for about 260 zygotic transcripts (Beadle et al. 2023). In older fly embryos (stage 12 to 15), which have longer cell cycles, the median half-life was longer at 73 min (Burow et al. 2015). This was longer than our C. elegans median half-life of 54 min, perhaps reflecting that older fly embryos no longer have the short mitotic cycles and rapid development characteristic of early fly embryos. We similarly found that mRNA half-lives increase over time during C. elegans embryogenesis. It is important to emphasize that some genes go against this trend, however, and have faster mRNA decay at later embryonic stages. The fact that these genes are enriched for specific categories of genes, such as transcription factor genes or genes involved in building the cilium, suggests that the rate of RNA turnover across time is heavily regulated.
Although these changes are consistent with a relationship between the cell cycle and RNA turnover, they could also reflect the need for progenitor cells to degrade mRNAs in a timely manner to allow cell differentiation to proceed appropriately, as has been observed in other systems (Soustelle et al. 2008; Abbadi et al. 2019). Consistent with this, we observed that mRNAs encoding transcription factors had faster mRNA degradation compared with other transcripts, and fast RNA turnover was significantly associated with transient transcription factor protein expression. These results highlight that mRNA degradation likely plays a key role in modulating the protein expression of developmental regulators.
We found that specific functional categories of genes had distinct mRNA decay dynamics in different cell types. For example, in neurons, transcripts encoding neural fate-specifying transcription factors had relatively short half-lives, whereas those associated with synapses, dendrites, or axons had relatively long half-lives. Similar findings were observed for neuron-specific mRNA half-lives in the fly embryo (Burow et al. 2015). Perhaps the unique architecture of neuronal cells can help explain these differences in the regulation of mRNA degradation. Neurons are known to localize some transcripts distant from the cell body such as at synapses; their stabilization may facilitate regulated local translation. We also found that transcripts vary substantially in their germline mRNA stability, and this is correlated with changes in germline expression over time. This suggests that the regulation of mRNA degradation plays an important role in shaping the germline transcriptome. The relatively rapid decay of transcripts encoding mRNA 3′ UTR-binding proteins in the germline further points to the importance of post-transcriptional regulation in this cell type.
From yeast to human cells, greater translational efficiency is associated with greater transcript stability (Presnyak et al. 2015; Wu et al. 2019). Our observation that stable transcripts had significantly higher optimal codon usage compared with unstable transcripts in C. elegans embryos is consistent with this. We also found that stable transcripts tend to have more introns than unstable transcripts, in line with the presence of introns having a stabilizing effect in mammals and plants (Shaul 2017) and increasing protein production in C. elegans (Crane et al. 2019). The relationship between 3′ UTR length and mRNA stability appears to differ depending on the context. In zebrafish, longer 3′ UTRs conferred resistance to codon-mediated deadenylation for maternal transcripts (Mishima and Tomari 2016). Our data align with this finding, as stable transcripts had significantly longer 3′ UTRs compared with unstable transcripts. Future studies examining the effect of altering such features will test the potentially combinatorial causal relationships between these features and mRNA stability.
Examining the mRNA half-lives of genes that accumulate to high transcript levels revealed that such genes undergo overall slower transcript turnover compared with genes that accumulate to low transcript levels. Some genes accumulate so rapidly in early embryogenesis that their transcription rates must be high, irrespective of their mRNA turnover rates (Sivaramakrishnan et al. 2023). This emphasizes that the regulation of both transcription and mRNA degradation can work together to achieve high gene expression. The 3′ UTRs of genes with high transcript accumulation were significantly enriched for several motifs, including poly(C) motifs. In mammals, PCBPs have been implicated in the stabilization of transcripts (Makeyev and Liebhaber 2002), and transcripts that contain a poly(C) motif may be stabilized in C. elegans oocytes (Stoeckius et al. 2014). Future studies will establish whether PCBP homologs act as transcript stabilizers in the C. elegans embryo.
Differential isoform usage, particularly among 3′ UTR isoforms, could be another potential contributor to differential mRNA decay of a gene across developmental stages or cell types. In C. elegans, hundreds of genes have been identified to change whether the proximal or distal 3′ UTR isoform is more dominantly expressed throughout various stages of embryogenesis (Levin et al. 2020). Alternative polyadenylation has also been probed across the intestine, muscles, neurons, arcade and intestinal valve cells, seam cells, and hypodermal tissues in mixed-stage cultures of C. elegans, with an average of 31% of genes found to express 3′ UTR isoforms in a tissue-specific manner (Blazie et al. 2015, 2017). Technical obstacles, such as incomplete isoform annotations and ambiguous mapping of RNA sequencing reads, currently make the measurement of isoform-specific half-life measurements difficult; overcoming these challenges to add additional context to mRNA decay rates would be an intriguing future direction.
Methods
Embryonic cell isolation and culturing
C. elegans adults (N2 strain) were grown on large, enriched peptone plates seeded with NA22 bacteria. Mixed-stage embryos were released from adult worms using hypochlorite treatment followed by two washes with previously described complete L-15 cell culture media (Bianchi and Driscoll 2006). To obtain cell suspensions, embryos were treated with 0.5 mg/mL chitinase in egg buffer on ice until the eggshells were dissolved (∼10 min). Then embryonic cells were dissociated using a 3 mL syringe fitted with 21.5 gauge needle until >80% of embryos were disrupted. The cell suspension was passed through a 10 μM filter before being washed and then resuspended in complete L-15.
Bulk RNA sequencing transcription inhibition time course
Following embryonic cell isolation, cells were treated with trypan blue and counted using a hemocytometer. Cell cultures at a concentration of ∼1 million cells/mL were treated with 2 μg/mL of the transcription inhibitor actD over a time course of 0, 10, 20, 40, and 60 min at 20°C. Each time point had 2 mL of cell suspension (about 2 million cells). After each time point, RNA was isolated from cells using QIAzol that contained a 1:350,000 dilution of the ERCC spike-in kit (Baker et al. 2005). Libraries were prepared using the Illumina Ribo-Zero Plus rRNA Depletion Kit and sequenced on an Illumina NextSeq 500.
Bulk RNA sequencing computational analysis
The RNA sequencing data were processed using the spliced transcripts alignment to a reference (STAR) alignment tool (Dobin et al. 2013) against the WormBase WS277 reference transcriptome with 3′ UTR extensions as previously carried out (Packer et al. 2019). VERSE was used to get read counts for genes (Zhu et al. 2016). Genes had to have a count greater than 30 at the 0 min time point across all biological replicates for their expression data to be used in downstream analyses. For half-life comparisons between biological replicates, only well-measured genes were examined. These were genes with a count greater than 30 at the 0 min time point whose expression throughout the transcription inhibition time course fit an exponential decay model R2 ≥ 0.75 within each replicate.
Single-cell RNA sequencing transcription inhibition time course
Cell cultures at a concentration of ∼1 million cells/mL were treated with 2 μg/mL of the transcription inhibitor actD over a time course of 0, 20, and 40 min at 20°C. Each time point had 2 mL of cell suspension (about 2 million cells). After each time point, cells were immediately put on ice and kept on ice, until the remaining time points were collected, and processed on ice in all subsequent steps until 10x collection. At the end of the 40 min time course, cells were washed with complete L-15, washed with 1× PBS with 0.4% BSA, and then resuspended in 1× PBS with 0.4% BSA. Single-cell capture and library preparation followed the 10x Genomics published protocols for the Chromium Next GEM Single Cell 3′ reagent kit v3.1, with the objective of recovering around 10,000 cells for each time point. Libraries were sequenced on an Illumina NextSeq 500.
Single-cell RNA sequencing computational analysis
The data were processed with the 10x Genomics CellRanger pipeline, aligning reads to the WormBase WS277 reference transcriptome with 3′ UTR extensions as previously carried out (Packer et al. 2019). Ambient RNA contamination was removed using the SoupX algorithm (Young and Behjati 2020) with default settings. The data were then visualized using dimensionality reduction methods for either merged data sets within each biological replicate or an integrated data set of all biological replicates. Single-cell transcriptomes were first projected into 150 dimensions using PCA and then projected into two dimensions using the UMAP algorithm. For each cell, the age of the embryo from which it came from was estimated by correlating its transcriptome with a whole-embryo bulk RNA-seq time series as previously described (Hashimshony et al. 2015). Cells were then automatically annotated with their corresponding cell type or manually annotated based on cell type–specific marker genes reported in WormBase (Lee et al. 2018). Automated annotation was done using Seurat (Stuart et al. 2019) to project the PCA structure of a reference, an existing C. elegans embryo single-cell atlas (Packer et al. 2019), to our single-cell data. Epidermal cells were manually annotated to better include progenitor cells that the reference data set did not include. Within the cell types we examined, doublets were manually removed. This was done by identifying clusters of doublets in iterated UMAP projections of the data on the basis of coexpression of cell type–specific marker genes. Genes with a UMI count greater than 30 at the 0 min time point combined across all biological replicates were used in downstream analyses. For half-life comparisons between biological replicates, only well-measured genes were examined. These were genes with a UMI greater than 30 at the 0 min time point whose expression throughout the transcription inhibition time course fit an exponential decay model R2 ≥ 0.75 within each replicate.
mRNA half-life calculations
For the bulk RNA sequencing data, gene counts were normalized to the sum of counts corresponding to ERCC RNAs. Normalized gene counts were then fit to an exponential decay model (A(t) = A0e−kt) using nonlinear least squares regression in R (R Core Team 2021), allowing a decay rate constant (kdecay) to be calculated for each gene. A half-life was then calculated for each gene using the following equation: t1/2 = ln(2)/kdecay. The 95% confidence intervals for kdecay were determined using the R package nlstools (Baty et al. 2015). For the single-cell RNA sequencing data, half-lives were calculated in a similar manner with the number of UMIs per gene. Transcript abundance was normalized to the sum of UMIs corresponding to ribosomal protein genes, because their transcripts were highly stable in our spike-in control-normalized bulk data (median t1/2 = 295.5 min). This normalized expression was further adjusted by a correction factor to account for the decay of ribosomal protein genes over time (0.5time point/295.5 min).
A combination of coefficient of variation and confidence intervals was used to filter out poorly measured half-lives. Gene half-lives were included in downstream analyses if their coefficient of variation (standard deviation/mean × 100) across biological replicates was ≤50% or the fold-change between the upper limit of its 95% confidence interval and measured half-life was three or less. To better include high-stability mRNAs, genes with half-lives >100 min were allowed a looser filtering strategy. Such genes were kept if their half-lives had a coefficient of variation ≤75% or the fold-change between the upper limit of its 95% confidence interval and measured half-life was four or less.
Gene Ontology analysis
Gene Ontology analysis was performed using the WormBase enrichment analysis tool (Angeles-Albores et al. 2018) against relevant background sets of genes. For example, for the bulk RNA sequencing data examining the top 15% slow- and fast-decaying genes, the background set of genes used was all the genes that met our moderate half-life filtering metric.
Transcript sequence feature analysis
3′ UTRs across the C. elegans transcriptome were retrieved from the WormBase ParaSite database (Howe et al. 2017) on March 9, 2023. Only 3′ UTRs with ≥8 nucleotides (nt) were used for downstream analyses. To examine the percentage of GC content, the longest 3′ UTR was used for genes with multiple isoforms. To examine the correlation between 3′ UTR length and half-life, the mean UTR length across isoforms for each gene was calculated.
Coding sequences across the C. elegans transcriptome were retrieved from the WormBase ParaSite database (Howe et al. 2017) on March 4, 2023. To examine the percentage of GC content, the longest coding sequence was used for genes with multiple isoforms. To examine the correlation between coding sequence length and half-life, the mean coding sequence length across isoforms for each gene was calculated. C. elegans–specific tRNA adaptation index (tAI) weights were used from stAIcalc (Sabi et al. 2017; http://tau-tai.azurewebsites.net/) to calculate a tAI value for each gene, with the longest coding sequence being used for genes with multiple isoforms. The tAI value for each gene was the geometric mean of tAI weights for all codons in the coding sequence (Yoon et al. 2018).
Introns were identified using the TxDb.Celegans.UCSC.ce11.refGene annotation package (https://bioconductor.org/packages/TxDb.Hsapiens.UCSC.hg38.knownGene/) for TxDb objects. To examine the correlation between the number of introns and half-life, the mean number of introns across isoforms for each gene was calculated.
The “regsubsets” function from the R package “leaps” (https://cran.r-project.org/web/packages/leaps/index.html) was used for linear multiple regression analysis. The mRNA half-life was the modeled variable, and the sequence features of interest (tAI, mean intron, mean 3′ UTR length, mean coding sequence length, GC content in 3′ UTR, GC content in coding sequence) the predictors. The individual contribution of each sequence feature to variation in half-lives was evaluated by having each feature as the only predictor in the model.
Motif analysis
MEME differential enrichment was used for motif analysis (https://meme-suite.org/meme/tools/meme). The minimum width for motifs was set to five, and the maximum width for motifs was set to 15. For motif analysis in 3′ UTRs, the longest 3′ UTR was used for genes with multiple isoforms. To identify motifs in the 3′ UTRs of the top 15% slow-decaying genes in our bulk data, the 3′ UTRs of the top 15% fast-decaying genes were used as control sequences (and vice versa).
Query motifs were compared with a database of RNA-binding protein motifs, which were identified from in vitro assays (Ray et al. 2013). The Tomtom motif comparison tool was used to search each query motif against this database of target motifs, with an alignment produced for each significant match (Gupta et al. 2007). The E-value compensates for multiple testing by multiplying each P-value by twice the number of target motifs, and the default Tomtom significance threshold for the E-value is less than 10, although we only highlighted those with an E-value less than three. The Pearson correlation coefficient was used for the motif comparison function. For more details on the workings of the motif comparison tool, see the Tomtom manual (https://meme-suite.org/meme/doc/tomtom.html?man_type=web).
Determining transient or persistent gene expression
We identified genes with highly transient, transient, persistent, or highly persistent expression over time using our previously published C. elegans embryo single-cell atlas (Packer et al. 2019). Highly transient genes were defined as genes whose expression was lost from parent-to-daughter cell in 30 or more comparisons or ≥30% of comparisons. Transient genes were defined as genes whose expression was lost from parent-to-daughter cell in 10 or more comparisons or ≥10% of comparisons, excluding highly transient genes. Highly persistent genes were defined as genes whose expression was maintained from parent-to-daughter cell in 30 or more comparisons or ≥30% of comparisons. Furthermore, these genes had to have one or fewer comparisons and ≤1% of comparisons with a loss in expression from parent to daughter. Persistent genes were defined as genes whose expression was maintained from parent-to-daughter cell in 10 or more comparisons or ≥10% of comparisons, excluding highly persistent genes. Furthermore, these genes had to have five or fewer comparisons and ≤5% of comparisons with a loss in expression from parent to daughter.
Determining transient or persistent protein expression for transcription factors
To characterize transcription factor protein expression dynamics, we used expression data provided from a single-cell transcription factor protein expression atlas of the C. elegans embryo (Ma et al. 2021). A transcription factor was considered expressed in a cell if its expression was >1.5 × IQR below the first quartile for protein expression across all cells. Transient proteins were defined as proteins whose expression was lost from parent-to-daughter cell in 10 or more comparisons or ≥10% of comparisons. Persistent proteins were defined as proteins whose expression was maintained from parent-to-daughter cell in 10 or more comparisons or ≥10% of comparisons. Proteins with data from more than one reporter strain were only characterized as transient or persistent if the strains agreed with one another.
Hierarchical clustering of gene expression patterns
Gene expression was first scaled using the “scale” function in R (R Core Team 2021), and then a distance metric was calculated between each pair of genes using the “dist” function. Hierarchical clustering was performed on these distance values using the “hclust” function. The cluster number was chosen based on visualization with a dendrogram.
Determining genes with high transcript accumulation
To characterize genes in the staged embryo RNA sequencing data (Hashimshony et al. 2015) whose expression peaks ∼200 min past the four-cell embryo stage as having high, medium, or low transcript accumulation, we calculated a slope value for each gene. This slope = (maximum expression from embryo stages 10 to 200 min − minimum expression from embryo stages 10 to 200 min)/maximum stage in minutes − minimum stage in minutes. Slopes from zero to the 0.2 quantile were considered low-accumulating genes, slopes from the 0.4 to 0.6 quantile were considered medium-accumulating genes, and slopes from the 0.8 to 1.0 quantile were considered high-accumulating genes. A similar analysis was done for genes whose expression peaks ∼350 min past the four-cell embryo stage.
Determining cell type–specific and broadly expressed genes
Data from our C. elegans embryo single-cell atlas (Packer et al. 2019) were used to determine gene expression within muscle, germline, epidermis, neuron, and pharynx cells in TPM. A gene was considered to be specific to a given cell type if it was expressed more than twofold greater in that cell type compared with all other cell types. For example, a gene was considered muscle specific if it was expressed more than twofold greater in muscle than it was in germline, epidermis, neuron, and pharynx. All other genes were considered broadly expressed. Ten TPM was added to each gene to avoid issues in comparing unexpressed or poorly measured genes.
Examining gene-specific differential mRNA decay between cell types
To examine mRNA half-lives between muscle, epidermis, and neuron, we used middle-stage-specific half-lives. To avoid comparing high-stability transcripts that could have artificially large differences in half-lives between cell types, we only considered genes with an mRNA half-life <75 min in at least one cell type. A gene was considered to have differential mRNA decay between cell types if it had an mRNA half-life at least 1.5-fold longer in one cell type compared with the other.
Defining maternal and zygotic-only genes
Zygotic-only genes were determined using a whole-embryo time series RNA sequencing data set of the C. elegans embryo (Hashimshony et al. 2015). Genes with TPM > 5 within one-cell embryos were considered to be maternally expressed genes. All other genes were considered to be zygotic-only genes.
miRNA analyses
Predicted mRNA targets of the miR-35 and miR-51 families were identified from TargetScanWorm, release 6.2 (https://www.targetscan.org/worm_52/). Half-lives for these predicted targets were calculated separately for early-, middle-, and late-stage cells. Half-lives for the V-ATPase complex transcripts were calculated from middle- and late-stage cells.
Data access
All RNA sequencing data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject) under accession number PRJNA1061171. All R codes regarding data visualization and analyses are available at GitHub (https://github.com/fe-peng/celegans_embryo_mRNA_decay) and as Supplemental Code. A VisCello object for visualization of the three single-cell RNA sequencing biological replicates in an integrated data set is available at Zenodo (https://doi.org/10.5281/zenodo.10499334).
Competing interest statement
The authors declare no competing interests.
Acknowledgments
We thank members of the Murray laboratory, the Penn Worm Group, Naveen Jain (University of Pennsylvania, Arjun Raj laboratory), Calvin Huang (University of California, Davis, Bo Liu laboratory), and Chenxin Li (University of Georgia, Robin Buell laboratory) for providing valuable discussion and comments on the manuscript. We also thank Jean Rosario and Catherine Lucey (University of Pennsylvania, Junhyong Kim laboratory) for their help in sequencing RNA libraries. This work was funded by National Institute of General Medical Sciences (NIGMS) grant R35GM127093, Eunice Kennedy Shriver National Institute of Child Health and Human Development grant F31HD10785, and NIGMS grant T32GM008216.
Author contributions: F.P. and J.I.M. conceived and designed the study; F.P. performed the experiments; C.E.N. created new genome and transcriptome references and processed the bulk and single-cell RNA sequencing data; F.P. performed analyses and visualization; J.I.M. supervised the analyses; F.P. wrote the original draft; and F.P., C.E.N., and J.I.M. reviewed and edited the manuscript.
Footnotes
-
[Supplemental material is available for this article.]
-
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278980.124.
- Received January 17, 2024.
- Accepted August 8, 2024.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.



















