Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome

Characterizing transcriptome complexity is crucial for understanding human biology and disease. Technologies such as synthetic long-read RNA sequencing (SLR-RNA-seq) delivered 5 million isoforms and allowed the assessment of splicing coordination. Pacific Biosciences and Oxford Nanopore platforms also increase throughput but require high input amounts or amplification. Our new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k–200k partitions of 10–200 molecules at a time, enabling analysis of 10–100 million RNA molecules. SpISO-seq requires less than 1 ng of input cDNA, limiting or removing the need for prior amplification with its associated biases. Adjusting the number of reads devoted to each molecule reduces the number of sequencing lanes and the cost, with little loss in detection power. The increased number of molecules expands our understanding of isoform complexity. In addition to confirming our previously published cases of splicing coordination (e.g., BIN1), the greater depth reveals many new cases, such as MAPT. Coordination of internal exons is found to be extensive among protein-coding genes: 23.5%–59.3% (95% confidence interval) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for noncoding sequences, suggesting a larger role of splicing coordination in shaping proteins. Groups of genes with coordination are involved in protein–protein interactions with each other, raising the possibility that coordination facilitates complex formation and/or function. We also find new types of splicing coordination, involving initial and terminal exons. Our results provide a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.


Supplementary figures
(averaged over all GENCODE transcripts that start before the upstream exon and end after the downstream alternative exon). (E) Percent of purely protein-coding exon pairs that are coordinated and of exon pairs that contain non-coding sequence. In this analysis, we chose exon pairs in both distributions so that the underlying distributions of informative reads in the two categories are identical.

and used as input for spISO-seq library generation on the 10x Genomics instrument.

Experimental recommendations for the Chromium system
Please note that all experiments that were analyzed for biological results were obtained on the 10x Genomics GemCode system. While we were analyzing these data, 10x Genomics released the updated Chromium system (version 2 being the current up-to-date version). In house, we noticed different behavior, which we detail here. Please also note that some of the experimental procedure was changed for cost efficiency. In principle, the changed steps are equivalent to the above procedure.

We used a total RNA input of 100 ng and followed the SmartSeq2 (1) protocol with the following modifications. The Oligo(dT) primer was diluted to 2.5 nM and 1 µl was used for the priming; for the RT, 0.2 µl of LNA TSO Oligo was used. Second strand synthesis was performed using Kapa HiFi HotStart Ready Mix (Kapa Biosystems Cat#KK2600) primed with ISPCR primers, using the following thermal conditions: 98 °C for 3 min, then 6 cycles of 60 °C for 5 min and 72 °C for 20 min, followed by a final extension at 72 °C for 30 min and a hold at 10 °C. RNase H digestion was performed after second strand synthesis to remove any RNA-DNA hybrid molecules by addition of 1 U of RNase H (Thermo Scientific Cat#EN0201) and incubation for 30 min at 37 °C. The final dscDNA library was quantified on Qubit, its size distribution was assessed using the Fragment Analyzer, and it was diluted in 10 µl to 1 ng/µl, 500 pg/µl, 250 pg/µl and 125 pg/µl, respectively, for input into the 10x Genomics Genome protocol.

We recorded the number of uniquely mapping read pairs and compared them to the ones obtained from the GemCode system. Note that here we use very stringent mapping parameters to determine exact intron positions, not the more relaxed ones used in Figure 1 for molecule identification.
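For bench planning, the arithmetic implied by the protocol above can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the published procedure: the helper names (`Step`, `program_minutes`, `dilution_volumes`) and the stock concentration of 2 ng/µl are hypothetical, the cycling time ignores ramping and the final 10 °C hold, and the dilutions assume simple C1·V1 = C2·V2 mixing into a 10 µl final volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermal step: temperature in degrees Celsius and duration in minutes."""
    temp_c: float
    minutes: float


def program_minutes(initial, cycle, n_cycles, final):
    """Total programmed time in minutes (ramping and the final hold are ignored)."""
    return initial.minutes + n_cycles * sum(s.minutes for s in cycle) + final.minutes


def dilution_volumes(stock_pg_per_ul, target_pg_per_ul, final_volume_ul):
    """Return (stock volume, diluent volume) in microliters using C1*V1 = C2*V2."""
    if target_pg_per_ul > stock_pg_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_pg_per_ul * final_volume_ul / stock_pg_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Second strand synthesis program as described in the text.
total_minutes = program_minutes(
    initial=Step(98, 3),
    cycle=[Step(60, 5), Step(72, 20)],
    n_cycles=6,
    final=Step(72, 30),
)

# Hypothetical stock concentration (assumption, not from the protocol): 2 ng/ul.
STOCK_PG_PER_UL = 2000.0
# Target concentrations from the text, in pg/ul, each in a 10 ul final volume.
TARGETS_PG_PER_UL = [1000.0, 500.0, 250.0, 125.0]
plan = {t: dilution_volumes(STOCK_PG_PER_UL, t, 10.0) for t in TARGETS_PG_PER_UL}

for target, (stock_ul, diluent_ul) in plan.items():
    print(f"{target:7.1f} pg/ul: {stock_ul:.3f} ul stock + {diluent_ul:.3f} ul diluent")
print(f"Programmed cycling time: {total_minutes} min")
```

With a measured Qubit concentration in place of the hypothetical stock value, the same helper gives the pipetting volumes for each of the four input amounts.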

Re-mapping of previously published long read
Re-mapping of SLR-RNA-seq (2) data was performed using GMAP (3) as described

Figure S1: MiSeq exploration of GemCode behavior with low inputs. (A) Density plots of FPKMs with six input amounts to the system. (B) Heatmap of gene expression values (FPKM) across all six samples.

Figure S2: Collision fraction by barcode. (A) Fraction of genes for each barcode that showed a collision. Barcodes are ordered by collision fraction. Gray area is enriched in false positive barcodes. (B) Number of genes for many barcodes without collisions, many barcodes with few collisions and for very few barcodes with many collisions.

Figure S4: EXOC7 example of coordination. Single-gene view of the EXOC7 gene. Bottom, black track: GENCODE annotation. Middle, colored track: spISO-seq data, with each line representing one molecule. Top, red-brown track: SLR-RNA-seq data, with each line representing one molecule.

1st and 2nd strand cDNA synthesis
Total RNA was extracted using Trizol LS (Invitrogen Cat#10296028) and chloroform to obtain phase separation. RNA was extracted from the aqueous phase with RNA Clean and Concentrator (Zymo Cat#R1015) following the manufacturer's protocol. Total RNA was quantified by Qubit (Invitrogen Cat#Q32852) and run on the Fragment Analyzer (AATI Cat#DNF-472) for quality assessment, then diluted to an input concentration of 40 ng/µl. To synthesize first strand cDNA