Antagonistic regulation of mRNA expression and splicing by CELF and MBNL proteins

RNA binding proteins of the conserved CUGBP1, Elav-like factor (CELF) family contribute to heart and skeletal muscle development and are implicated in myotonic dystrophy (DM). To understand their genome-wide functions, we analyzed the transcriptome dynamics following induction of CELF1 or CELF2 in adult mouse heart and of CELF1 in muscle by RNA-seq, complemented by crosslinking/immunoprecipitation-sequencing (CLIP-seq) analysis of mouse cells and tissues to distinguish direct from indirect regulatory targets. We identified hundreds of mRNAs bound in their 3′ UTRs by both CELF1 and the developmentally induced MBNL1 protein, a threefold greater overlap in target messages than expected, including messages involved in development and cell differentiation. The extent of 3′ UTR binding by CELF1 and MBNL1 predicted the degree of mRNA repression or stabilization, respectively, following CELF1 induction. However, CELF1's RNA binding specificity in vitro was not detectably altered by coincubation with recombinant MBNL1. These findings support a model in which CELF and MBNL proteins bind independently to mRNAs but functionally compete to specify down-regulation or localization/stabilization, respectively, of hundreds of mRNA targets. Expression of many alternative 3′ UTR isoforms was altered following CELF1 induction, with 3′ UTR binding associated with down-regulation of isoforms and genes. The splicing of hundreds of alternative exons was oppositely regulated by these proteins, confirming an additional layer of regulatory antagonism previously observed in a handful of cases. The regulatory relationships between CELFs and MBNLs in control of both mRNA abundance and splicing appear to have evolved to enhance developmental transitions in major classes of heart and muscle genes.


RNA-seq and CLIP-seq read mapping, gene expression estimation
All read mapping was performed using Bowtie/Tophat (Langmead et al. 2009;Trapnell et al. 2009), mapping to mm9. Only uniquely mapping reads were used. CLIP reads were collapsed to remove identical sequences, adapter sequences were removed, and processed reads were then mapped separately for each CLIP read length. To estimate gene expression levels, the number of reads mapping to each kilobase of constitutive coding sequence of RefSeq/Locuslink genes was counted, and divided by the number of reads (in millions) mapping uniquely to non-ribosomal and non-mitochondrial sequence,

Estimation of isoform frequencies, calculation of MZ score
MISO (version 0.4.8) was used to estimate isoform frequencies for splicing events and alternative 3' UTR events, using a minimum of 20 reads per event and parameters of burn_in=500, lag=10, num_iters=5000, and num_chains=6. To identify splicing events that change monotonically over time, we ordered samples chronologically, and for each event compared all pairs of samples from different time points, tallying the number of comparisons representing significant increase or significant decrease in Ψ (at BF > 5).
We calculated a quantity called δ, the number of significant positive ΔΨ values (increases over time) minus the number of significant negative ΔΨ values. To assess statistical significance, we recalculated δ after randomly permuting the sample labels.
Repeating this process 100 times, we generated a null distribution, and derived the 3 "monotonicity Z-score" (MZ = (δ -µ) / σ), where µ and σ are the mean and standard deviation of the null distribution, respectively.

Analysis of antagonistically regulated splicing events (Figs. 1G, 1H)
Splicing events regulated in response to each perturbation (CELF over-expression, Mbnl1 KO, or heart development) were enumerated, and the number of events regulated among each pair of perturbations was counted (this was the "observed" overlap). To compute the "expected" overlap, we assumed independence, e.g. the fraction of events regulated in both perturbations equals the fraction of events regulated in the first perturbation multiplied by the fraction of events regulated in the second perturbation. Significance of the bias in direction of regulation ( Figure 1h) was assessed by binomial test, assuming a null hypothesis frequency of 0.5.

Correlation of CLIP binding density across the transcriptome (Fig. 2B)
CLIP read density was computed in 5 nt windows across 3' UTRs in the transcriptome for genes highly expressed in whole brain, heart, muscle, and C2C12 mouse myoblasts (>100 RPKM) exhibiting high CLIP coverage (>100 tags per UTR). Pearson correlation coefficients were computed for these densities between CLIP libraries performed in different samples and for different proteins.

Identification of CLIP clusters
CLIP tags were first collapsed to remove redundant sequences, and trimmed of adapters. These sequences were mapped to genome and a database of splice junctions using Bowtie. To identify CLIP clusters lying within genic regions, gene boundaries were first defined using RefSeq, Ensembl, and UCSC tables. For each window of 30 nucleotides covered by at least one CLIP tag, a test was performed to assess CLIP density in the window exceeded that which is predicted by a simple Poisson model which accounts for gene expression and pre-mRNA length (Yeo et al. 2009).
CLIP cluster motif analysis (Fig. 2C, Fig. S3B) Pentamers occurring in CELF1 CLIP clusters from heart were counted and compared to those found in randomly selected clusters within the same 3' UTRs ( Fig. 2C) or whole genes (Fig. S3B). This procedure was repeated 100 times to derive a Z-score for each motif, where the Z-score was defined as the number of standard deviations away from the mean. The 5 most highly enriched 5mers are highlighted.

Conservation analysis for cassette exons and 3' UTRs (Figs. 2D, 3B)
Conservation (30-way phastCons, UCSC genome browser) of cassette exons with CLIP clusters within 1 kilobase of each splice site or within the exon itself was compared to a control set of cassette exons found in similarly expressed genes. Conservation (30-way phastCons) of 3' UTRs with CLIP clusters less than 1 kilobase upstream and less than 500 bases downstream of constitutive transcript ends was compared to a control set of 3' UTRs with similar length, in similarly expressed genes. For these analyses, CLIP clusters and gene expression values from heart were used.

Celf1 induction in heart
Celf1 induction in muscle Celf2 induction in heart Figure S1. Gene expression levels as estimated by RNA-seq, for genes induced during CELF1 or CELF2 time courses. A) CELF1 mRNA expression in heart, at various time points following CELF1 induction in heart. B) CELF1 mRNA expression in muscle, at various time points following CELF1 induction in muscle. C) CELF2 mRNA expression in heart, at various time points following CELF2 induction in heart. D) Mbnl1 exon 5 Ψ, at various time points following CELF1 induction in heart. E) Mbnl1 exon 5 Ψ, at various time points following CELF1 induction in muscle. F) CELF2-Flag fusion protein expression by Western (HRP-conjugated anti-Flag antibody, Sigma A8592), and total protein stain by Ponceau S for each heart sample. 90 ug of total protein from the apex of each heart was loaded. G) CELF1 (mouse monoclonal 3B1 anti-CELF1 antibody, Millipore) and MBNL1 (mouse monoclonal MB1a 4A8 anti-MBNL1 antibody, Glenn Morris -CIND) protein expression by Western analysis, with Calnexin loading control (rabbit polyclonal anti-Calnexin antibody, Abcam ab22595) in response to CELF1 over-expression in C2C12 myoblasts. MBNL1 protein levels remain unperturbed following CELF1 OE.   Figure S3. CELF1 CLIP-seq in heart, muscle, and myoblasts. A) The proportion of reads mapping to 5' UTRs, coding sequence, introns, and 3' UTRs is shown as pie charts for CLIP-seq and RNA-seq from heart, muscle, and myoblasts. The ratio between these values is shown as bar plots, illustrating enrichment for introns and 3' UTRs in CLIP reads relative to RNA-seq reads. B) Motif enrichment Z-scores for all pentamers occurring in CELF1 CLIP-seq clusters, relative to randomly selected genic regions, are shown for CELF1 CLIP clusters identified in heart, muscle, and myoblasts. C) Information content of nucleotides adjacent to sites with >30% crosslink-induced substitutions is illustrated.  Figure S4. CELF1 binding is moderately associated with splicing regulation. Splicing events were binned by gene expression level (RPKM), and the observed fraction of splicing events with both binding and splicing regulation was plotted in black dots. Splicing regulation was defined using a minimum MZ of 2, and binding between 1 kb upstream of the 3' splice site and 1 kb downstream of the 5' splice site was counted. The expected fraction of events with both binding and regulation was estimated by assuming independence, and plotted in gray dots. A Fisher's exact test was performed to assess whether the observed fractions were different than expected, at each gene expression level bin. The analysis was performed separately in heart and muscle, for splicing events with at least 1, 2, or 3 CELF1 CLIP clusters. There was a modest (~10-40%) enrichment for splicing regulation of bound splicing events.  Figure S5. CELF1 induction in heart leads to de-repression of microRNA targets. A) Mean log expression change following CELF1 induction (7 d over control) for transcripts grouped by number of number of conserved target sites to conserved miRNAs in their 3' UTRs. Transcripts with greater numbers of conserved miRNA seed matches are de-repressed in heart but not in muscle. Significance was assessed by rank-sum test, where each microRNA seed bin was compared to the zero microRNA seed bin. B) The expression changes of miRNA target gene sets was analyzed, and the number of sets with significant derepression (P < 1e-4 by rank-sum test) is listed above for each of four cohorts of miRNAs, grouped by miRNA abundance in heart, from lowest to highest. Target sets corresponding to highly expressed miRNAs are more often derepressed. Figure S6. Alternative last exon abundance as a function of difference in number of CELF1 CLIP clusters between each 3' UTR. The fraction of ALEs that are significantly repressed following CELF1 over-expression (MZ > 1.5), grouped by density in repressed 3' UTR minus density in enhanced 3' UTR. Analyses were performed using UTR pairs in which CELF1 binding is at least 100 nt (A), 300 nt (B), or 500 nt (C) away from both 3' splice sites of both alternative last exons.

A B
Nucleotides between closest MBNL1 (C2C12) and CELF1 (heart) clusters Nucleotides between closest MBNL1 (C2C12) and CELF1 (muscle) clusters Nucleotides between closest MBNL1 (C2C12) and CELF1 (heart) clusters Nucleotides between closest MBNL1 (C2C12) and CELF1 (muscle) clusters Figure S7. CELF1 and MBNL1 binding sites are closer than expected. A) The distribution of observed and expected distances between the centers of CELF1 and MBNL1 CLIP clusters. Plots on the left were generated using CELF1 CLIP clusters from heart, and on the right, CELF1 CLIP clusters from muscle. All MBNL1 CLIP clusters were derived from myoblasts. Expected distances for plots in the top row were computed by reassigning the clusters to random locations within 3' UTRs. For plots on the bottom, expected distances were computed by keeping the clusters in the same location, but randomly reassigning clusters as being bound by CELF1 or MBNL1. B) Expression change following CELF1 induction in muscle (7 d versus control) for genes with 1 MBNL1 CLIP cluster and more than 1 CELF1 CLIP cluster, grouped by the number of CELF1 sites < 50 nt away from the MBNL1 site and normalized to genes with no CELF1 sites < 50 nt away from the Mbnl1 site.