Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data

Figure 2.

Transcript assembly improves scRNA-seq TE expression analysis. Data sets used in this figure are summarized in Supplemental Table S1. (A) Flowchart of the scRNA-seq TE quantification pipeline. In short, transcript assembly was performed with bulk RNA-seq data, and transcripts that overlap with TEs but not with protein-coding exons were used for expression quantification in scRNA-seq data. (B) Transcript assembly using three mESC bulk RNA-seq data sets (Wang laboratory) yielded 692 TE transcripts. Among these TE transcripts, 179 overlap with ncRNAs annotated by RefSeq. (C) FANTOM5 CAGE peaks, ATAC-seq signals, and CpG methylation signals at the promoter regions of TE transcripts with RPKM (reads per kilobase per million mapped reads) ≥ 1. (D) Correlation between mESC bulk RNA-seq and averaged Smart-seq (Teichmann laboratory) signals at TE transcripts. Color scale represents the number of candidates. (E) TE-family enrichment analysis using expressed TE transcripts. Enrichment of ERV elements was observed with both bulk RNA-seq and Smart-seq samples. (F) Examples of TE transcripts. Shown are assembled TE transcripts, uniquely mapped reads of mESC bulk RNA-seq, Smart-seq, and merged Smart-seq, ATAC-seq signals, and CpG methylation. (Left) A TE transcript that initiates from RLTR16b_MM. This TE transcript overlaps Platr14, a long ncRNA known to affect mESC differentiation-associated genes. (Right) A TE transcript that initiates from RLTRETN_Mm. This transcript is largely composed of TEs and reflects the transcription unit of the ERV.
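The selection rule in panel A (keep transcripts that overlap TEs but not protein-coding exons) and the RPKM threshold in panel C can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the half-open interval representation, and the toy coordinates are all assumptions made for illustration, and a real analysis would use an interval-tree or BED-based tool rather than the nested loops shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def select_te_transcripts(
    transcripts: Dict[str, List[Interval]],
    te_intervals: List[Interval],
    coding_exons: List[Interval],
) -> List[str]:
    """Keep transcripts whose exons overlap a TE but no protein-coding exon."""
    selected = []
    for name, exons in transcripts.items():
        hits_te = any(overlaps(e, t) for e in exons for t in te_intervals)
        hits_coding = any(overlaps(e, c) for e in exons for c in coding_exons)
        if hits_te and not hits_coding:
            selected.append(name)
    return selected


def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Toy data (invented for illustration only).
transcripts = {
    "tx1": [("chr1", 100, 200)],  # overlaps a TE only
    "tx2": [("chr1", 500, 600)],  # overlaps a TE and a coding exon
    "tx3": [("chr2", 100, 200)],  # overlaps neither
}
te_intervals = [("chr1", 150, 250), ("chr1", 550, 650)]
coding_exons = [("chr1", 580, 700)]

te_transcripts = select_te_transcripts(transcripts, te_intervals, coding_exons)
expressed = rpkm(read_count=10, length_bp=1000, total_mapped_reads=1_000_000) >= 1
```

In this toy example only `tx1` passes the filter, and a 1-kb transcript with 10 reads out of one million mapped reads has an RPKM of 10, which clears the RPKM ≥ 1 cutoff.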

This Article

  1. Genome Res. 31: 88-100
