Large-scale analysis of branchpoint usage across species and cell lines
Allison J Taggart, Chien-Ling Lin, Barsha Shrestha, Claire Heintzelman, Seongwon Kim and William G Fairbrother
Corresponding author; email: william_fairbrother{at}brown.edu
Abstract
The coding sequence of each human pre-mRNA is interrupted, on average, by eleven introns that must be spliced out for proper gene expression. Each intron contains three obligate signals: a 5' splice site, a branch site and a 3' splice site. Splice site usage has been mapped exhaustively across different species, cell types and cellular states. In contrast, only a small fraction of branch sites have been identified even once. The few reported branch site annotations are imprecise because reverse transcriptase skips several nucleotides while traversing a 2'-5' linkage. Here, we report large-scale mapping of branchpoints from deep sequencing data in three different species and in the SF3B1 K700E oncogenic mutant background. We have developed a novel method whereby raw lariat reads are refined by U2snRNP/pre-mRNA basepairing models to return the largest current dataset of branchpoint sequences with quality metrics. This analysis reveals novel modes of U2snRNA:pre-mRNA basepairing conserved in yeast and provides insight into the biogenesis of intron circles. Finally, matching branch site usage with isoform selection across the extensive panel of ENCODE RNA-seq datasets offers insight into the mechanisms by which branchpoint usage drives alternative splicing.
- Received December 21, 2015.
- Accepted January 18, 2017.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











