Supp_II

1 Summary

The purpose of this document is to provide a reproducible workflow for the analysis and comparisons made in the paper "Studying differential usage of exons using RNA-seq data". The starting point of the workflow are the files with the read alignments. All the data files and all scripts needed to run this analysis are provided in http://www-huber.embl.de/pub/DEXSeq/analysis/. Environment variables that point to the location of these files are defined in the code. In order to reproduce the code, set these variables to reflect the location their local copies on your system.

2 System requirements to run all code

a) Unix environment with bash shell b) R and attached packages (check below) c) python 2.6.6 or higher e) htseq-0.5.3 python library f) cuffdiff 1.3.0, 1.1.0, 1.2.0

> sessionInfo()
R Under development (unstable) (2012-03-13 r58728)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] gplots_2.10.1          KernSmooth_2.23-7      caTools_1.12          
 [4] bitops_1.0-4.1         gdata_2.8.2            gtools_2.6.2          
 [7] lattice_0.20-6         GenomicFeatures_1.7.31 AnnotationDbi_1.17.27 
[10] BSgenome_1.23.4        GenomicRanges_1.7.36   Biostrings_2.23.6     
[13] IRanges_1.13.30        multicore_0.1-7        DEXSeq_1.1.10         
[16] Biobase_2.15.4         BiocGenerics_0.1.14   

loaded via a namespace (and not attached):
 [1] biomaRt_2.11.1     DBI_0.2-5          hwriter_1.3        plyr_1.7.1        
 [5] RCurl_1.91-1       Rsamtools_1.7.40   RSQLite_0.11.1     rtracklayer_1.15.7
 [9] statmod_1.4.14     stats4_2.16.0      stringr_0.6        tools_2.16.0      
[13] XML_3.9-4          zlibbioc_1.1.1

3 Setting the enviroment variables

exportANALYSIS=/g/huber/users/reyes/Graveley/doc
exportSCRIPTS=$ANALYSIS/scripts

The variable "ANALYSIS" should point to a local copy of the directory http://www-huber.embl.de/pub/DEXSeq/analysis/.

4 Brooks et al

Brooks et al generated data using S2 cell lines from 3 biological replicates of a siRNA knockdown of the splicing factor pasilla and 4 biological replicates of an untreated control. These reads were downloaded from the SRA archive accession number GSM18508 and aligned using Tophat version 1.2.0 against the Drosophila Melanogaster genome.

4.1 DEXSeq analysis

Brooks et al annotation

We start by downloading the annotation file from the supplementary material of the Brooks et al paper (this step is not shown here). The script finaledit.pl adds columns with empty values to this gtf file in order to make it compatible with the standard gtf format. Then, we define the exon counting bins, as described in the paper, using the script dexseq_prepare_annotation.py

cd $ANALYSIS/brooksetal
perl $SCRIPTS/finaledit.pl SG_r5_11_MB5_annotations.gtf \
   > graveley_annotation.gtf
python $SCRIPTS/dexseq_prepare_annotation.py graveley_annotation.gtf \
    graveley_DEXSeq.gtf

Now we can count the number of reads whose alingnment overlaps with any of these defined exon bins.

python $SCRIPTS/dexseq_count.py -p no -s no -a 10 \
   graveley_DEXSeq.gtf bam/treated1.sam treated1gr.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   graveley_DEXSeq.gtf bam/treated2.sam treated2gr.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   graveley_DEXSeq.gtf bam/treated3.sam treated3gr.counts
python $SCRIPTS/dexseq_count.py -p no -s no -a 10 \
   graveley_DEXSeq.gtf bam/untreated1.sam untreated1gr.counts
python $SCRIPTS/dexseq_count.py -p no -s no -a 10 \
   graveley_DEXSeq.gtf bam/untreated2.sam untreated2gr.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   graveley_DEXSeq.gtf bam/untreated3.sam untreated3gr.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   graveley_DEXSeq.gtf bam/untreated4.sam untreated4gr.counts

We perform differential exon usage analysis using DEXSeq.

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")
countfiles <- list.files("../brooksetal", full.names=TRUE, pattern="en.counts")
samplenames <- sapply(
      strsplit(sapply(
         strsplit(countfiles, "gr"), 
         "[[", 1), 
      "\\/"), 
   "[[", 3)
conditions <- substr(samplenames, 1, c(7, 7, 7, 9,9,9,9))
type <- c("single", "paired", "paired", "single", "single", "paired", "paired")
des <- data.frame(condition=conditions, type=type)
library(DEXSeq)
ecs <- read.HTSeqCounts(countfiles, design=des, 
   flattenedfile="../brooksetal/ENSEMBLannotation.DEXSeq.gtf")
sampleNames(ecs) <- samplenames

library(multicore)
formuladispersion <- count ~ sample + (condition + type) * exon
formula0 <- count ~ sample + type * exon + condition
formula1 <- count ~ sample + type * exon + condition * I(exon == exonID)
ecs <- makeCompleteDEUAnalysis(ecs, minCount=0, nCores=10, 
   formulaDispersion=formuladispersion, 
   formula0=formula0, 
   formula1=formula1, 
   maxExon=10000)

save(ecs, file="brooksannotationecs.RData")

We observe 259 exon usage differences in 159 genes.

suppressMessages(library(DEXSeq))
setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")
load("brooksannotationecs.RData")
table( fData(ecs)$padjust < 0.1 )
length(unique(geneIDs(ecs)[which( fData(ecs)$padjust < 0.1 )]))

 
FALSE  TRUE 
41914   259
[1] 159

ENSEMBL annotation (used for comparisons with cuffdiff)

The differential exon usage analysis in the publication was also done using a different annotation, ENSEMBL (Flybase). To do this, we begin by downloading the annotation files from ENSEMBL from the release 62.

wget ftp://ftp.ensembl.org/pub/release-62/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.25.62.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.25.62.gtf.gz

We run a preprocesing script edit_gtf_forcuff.pl to remove the "Het" chromosomes from the annotation file, and change the chromosome names to match them with the chromosome names from the alignment indexes. (eg. change 2L to chr2L).

perl $SCRIPTS/edit_gtf_forcuff.pl Drosophila_melanogaster.BDGP5.25.62.gtf \
     > Drosophila_melanogaster.BDGP5.25.62.mychr.gtf

We run the preprocessing scripts for DEXSeq.

python $SCRIPTS/dexseq_prepare_annotation.py \
     Drosophila_melanogaster.BDGP5.25.62.mychr.gtf \
     ENSEMBLannotation.DEXSeq.gtf

We proceed to count how many read alignments overlap with our defined counting bins, using the script dexseq_count.py:

python $SCRIPTS/dexseq_count.py -p no -s no -a 10 \ 
   ENSEMBLannotation.DEXSeq.gtf bam/treated1.sam treated1en.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   ENSEMBLannotation.DEXSeq.gtf bam/treated2.sam treated2en.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   ENSEMBLannotation.DEXSeq.gtf bam/treated3.sam treated3en.counts
python $SCRIPTS/dexseq_count.py -p no -s no -a 10 \
   ENSEMBLannotation.DEXSeq.gtf bam/untreated1.sam untreated1en.counts
python $SCRIPTS/dexseq_count.py -p no -s no -a 10 \
   ENSEMBLannotation.DEXSeq.gtf bam/untreated2.sam untreated2en.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   ENSEMBLannotation.DEXSeq.gtf bam/untreated3.sam untreated3en.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 10 \
   ENSEMBLannotation.DEXSeq.gtf bam/untreated4.sam untreated4en.counts

Once finished with these preprocessing steps, we can continue our analysis using the Bioconductor package DEXSeq. The code is displayed below. If you would like to know details of the code, check the DEXSeq vignette and user manual.

We begin by generating an ExonCountSet object.

setwd(Sys.getenv("ANALYSIS"))
setwd("org")
countfiles <- list.files("../brooksetal", full.names=TRUE, pattern="en.counts")
samplenames <- sapply(
      strsplit(sapply(
         strsplit(countfiles, "en"), 
         "[[", 1), 
      "\\/"), 
   "[[", 3)
conditions <- substr(samplenames, 1, c(7, 7, 7, 9,9,9,9))
type <- c("single", "paired", "paired", "single", "single", "paired", "paired")
des <- data.frame(condition=conditions, type=type)
library(DEXSeq)
ecs <- read.HTSeqCounts(countfiles, design=des, 
   flattenedfile="../brooksetal/ENSEMBLannotation.DEXSeq.gtf")
sampleNames(ecs) <- samplenames

We define the formulas used for the analysis and make a complete differential exon usage analysis using 10 CPU cores, taking all the samples and testing treated vs untreated. Comparison Number 1

library(multicore)
formuladispersion <- count ~ sample + (condition + type) * exon
formula0 <- count ~ sample + type * exon + condition
formula1 <- count ~ sample + type * exon + condition * I(exon == exonID)
allENSecs <- list()
allENSecs[[1]] <- makeCompleteDEUAnalysis(ecs, minCount=0, nCores=10, 
   formulaDispersion=formuladispersion, 
   formula0=formula0, 
   formula1=formula1, 
   maxExon=10000)

Testing just in the paired end samples, taking the library type as a covariate, e.g. using the default formulas. Comparison number 2.

allENSecs[[2]] <- makeCompleteDEUAnalysis(ecs[,design(ecs)$type == "paired"],
   minCount=0,
   nCores=10,
   maxExon=100000)

Mixing paired-end and single-end libraries and testing treated vs untreated. Comparison number 3, treated 1-2 vs untreated 1-3.

allENSecs[[3]] <- makeCompleteDEUAnalysis(ecs[,c(1,2,4,6)],
   minCount=0,
   nCores=10,
   maxExon=100000)

Making the first "two-by-two" comparison, untreated 1-3 vs untreated 2-4, comparison number 4:

ecsun <- ecs[,design(ecs)$condition == "untreated"]
design(ecsun)$condition <- factor(c("un-A", "un-B", "un-A", "un-B"))
allENSecs[[4]] <- makeCompleteDEUAnalysis(ecsun,
   minCount=0,
   nCores=10,
   maxExon=100000)

Making the second "two-by-two" comparison untreated 1-4 vs untreated 2-3, comparison number 5:

design(ecsun)$condition <- factor(c("un-A", "un-B", "un-B", "un-A"))
allENSecs[[5]] <- makeCompleteDEUAnalysis(ecsun,
   minCount=0,
   nCores=10,
   maxExon=100000)

save(allENSecs, file="../brooksetal/allENSecs.RData")

Cuffdiff 1.3.0 analysis

The exact same tests done with DEXSeq are also performed with cuffdiff. We begin by modifying the annotation files to add the attributes that cuffdiff requires to test for splicing regulation as its manual indicates. According to this manual, the output file with name cuffcmp.combined.gtf should have now the appropiate attributes for cuffdiff.

cd $ANALYSIS/brooksetal
exportCUFFDIFF=/g/huber/users/reyes/software/opt/cufflinks-1.3.0.Linux_x86_64/
$CUFFDIFF/cuffcompare -s d_melanogaster_dm3.fa -CG -p Drosophilacuffdiffattr -r \
   Drosophila_melanogaster.BDGP5.25.62.mychr.gtf \
   Drosophila_melanogaster.BDGP5.25.62.mychr.gtf

Now we run cuffdiff, testing all treated vs untreated samples, comparison number 1.

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/all-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated1.bam,bam/treated2.bam,bam/treated3.bam \
   bam/untreated1.bam,bam/untreated2.bam,bam/untreated3.bam,bam/untreated4.bam

Run cuffdiff comparing treated vs untreated just with the paired samples, comparison number 2:

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/paired-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated2.bam,bam/treated3.bam \
   bam/untreated3.bam,bam/untreated4.bam

Run cuffdiff for comparison number 3:

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/mixed-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated1.bam,bam/treated2.bam \
   bam/untreated1.bam,bam/untreated3.bam

Run cuffdiff testing untreated 1-3 vs untreated 2-4, comparison number 4:

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/untreated13-vs-untreated24 \
   cuffcmp.combined.gtf \
   bam/untreated1.bam,bam/untreated3.bam \
   bam/untreated2.bam,bam/untreated4.bam

Run cuffdiff testing untreated 1-4 vs untreated 2-3, comparison number 5:

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/untreated14-vs-untreated23 \
   cuffcmp.combined.gtf \
   bam/untreated1.bam,bam/untreated4.bam \
   bam/untreated2.bam,bam/untreated3.bam

Cuffdiff 1.2.0 analysis

We perform exactly the same comparisons, now using cuffdiff version 1.2.0:

exportCUFFDIFF102=/g/huber/users/reyes/software/opt/cufflinks-1.2.0.Linux_x86_64/

$CUFFDIFF102/cuffdiff --FDR 0.1 -o cuffdiff-outputs-102/all-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated1.bam,bam/treated2.bam,bam/treated3.bam \
   bam/untreated1.bam,bam/untreated2.bam,bam/untreated3.bam,bam/untreated4.bam

$CUFFDIFF102/cuffdiff --FDR 0.1 -o cuffdiff-outputs-102/paired-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated2.bam,bam/treated3.bam \
   bam/untreated3.bam,bam/untreated4.bam

$CUFFDIFF102/cuffdiff --FDR 0.1 -o cuffdiff-outputs-102/mixed-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated1.bam,bam/treated2.bam \
   bam/untreated1.bam,bam/untreated3.bam

$CUFFDIFF102/cuffdiff --FDR 0.1 -o cuffdiff-outputs-102/untreated13-vs-untreated24 \
   cuffcmp.combined.gtf \
   bam/untreated1.bam,bam/untreated3.bam \
   bam/untreated2.bam,bam/untreated4.bam

$CUFFDIFF102/cuffdiff --FDR 0.1 -o cuffdiff-outputs-102/untreated14-vs-untreated23 \
   cuffcmp.combined.gtf \
   bam/untreated1.bam,bam/untreated4.bam \
   bam/untreated2.bam,bam/untreated3.bam

Cuffdiff 1.1.0 analysis

Now using cuffdiff version 1.1.0:

exportCUFFDIFF110=/g/huber/users/reyes/software/opt/cufflinks-1.1.0.Linux_x86_64

$CUFFDIFF110/cuffdiff --FDR 0.1 -o cuffdiff-outputs-110/all-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated1.bam,bam/treated2.bam,bam/treated3.bam \
   bam/untreated1.bam,bam/untreated2.bam,bam/untreated3.bam,bam/untreated4.bam

$CUFFDIFF110/cuffdiff --FDR 0.1 -o cuffdiff-outputs-110/paired-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated2.bam,bam/treated3.bam \
   bam/untreated3.bam,bam/untreated4.bam

$CUFFDIFF110/cuffdiff --FDR 0.1 -o cuffdiff-outputs-110/mixed-treated-vs-untreated \
   cuffcmp.combined.gtf \
   bam/treated1.bam,bam/treated2.bam \
   bam/untreated1.bam,bam/untreated3.bam

$CUFFDIFF110/cuffdiff --FDR 0.1 -o cuffdiff-outputs-110/untreated13-vs-untreated24 \
   cuffcmp.combined.gtf \
   bam/untreated1.bam,bam/untreated3.bam \
   bam/untreated2.bam,bam/untreated4.bam

$CUFFDIFF110/cuffdiff --FDR 0.1 -o cuffdiff-outputs-110/untreated14-vs-untreated23 \
   cuffcmp.combined.gtf \
   bam/untreated1.bam,bam/untreated4.bam \
   bam/untreated2.bam,bam/untreated3.bam

DEXSeq vs cuffdiff

These are the numbers summarizing the comparisons which each software. In comparison 1, we made a test, treated vs untreated, using all the 7 samples.

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("allENSecs.RData")

sapply(allENSecs, 
   function(x){
      length(unique(
         geneIDs(x)[which(fData(x)$padjust < 0.1)]
      ))
    }
)

system("grep -c yes cuffdiff-outputs*/*/splicing*")

[1] 159 154  52   8   7
 cuffdiff-outputs-102/all-treated-vs-untreated/splicing.diff:69
cuffdiff-outputs-102/mixed-treated-vs-untreated/splicing.diff:120
cuffdiff-outputs-102/paired-treated-vs-untreated/splicing.diff:735
cuffdiff-outputs-102/untreated13-vs-untreated24/splicing.diff:650
cuffdiff-outputs-102/untreated14-vs-untreated23/splicing.diff:724
cuffdiff-outputs-110/all-treated-vs-untreated/splicing.diff:145
cuffdiff-outputs-110/mixed-treated-vs-untreated/splicing.diff:323
cuffdiff-outputs-110/paired-treated-vs-untreated/splicing.diff:310
cuffdiff-outputs-110/untreated13-vs-untreated24/splicing.diff:314
cuffdiff-outputs-110/untreated14-vs-untreated23/splicing.diff:392
cuffdiff-outputs/all-treated-vs-untreated/splicing.diff:50
cuffdiff-outputs/mixed-treated-vs-untreated/splicing.diff:578
cuffdiff-outputs/paired-treated-vs-untreated/splicing.diff:754
cuffdiff-outputs/untreated13-vs-untreated24/splicing.diff:639
cuffdiff-outputs/untreated14-vs-untreated23/splicing.diff:728

Making the HTML

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("allENSecs.RData")
ecs <- allENSecs[[1]]
plus <- fData(ecs)$strand == "+"
minus <- unique(geneIDs(ecs)[which(!plus)])
exonID <- exonIDs(ecs)
geneID <- geneIDs(ecs)
for(gn in minus){
   exids <- exonID[geneID %in% gn]
   exonID[geneID %in% gn] <- exids[length(exids):1]
}

exonIDs(ecs) <- exonID
save(ecs, file="ecs-plots.RData")
color.samples=c("red1", "red4", "red4", "blue1", "blue1", "#2171B5", "#2171B5")
mart <- useMart("ensembl", dataset="dmelanogaster_gene_ensembl")
attributes <- c("external_gene_id", "description")
filter="ensembl_gene_id"
DEXSeqHTML(ecs, FDR=0.1, color.samples=color.samples, path="ens-rep", filter=filter, mart=mart, attributes=attributes)

5 Braward et al

To demonstrate that our method is also applicable to other types of experimental designs and different library preparations, we performed additional analysis taking the chimpanzee brain and cerebellum samples from the dataset by Braward et al. In this case, our aim was to find exon usage differences between this tissues. The dataset contains 5 brain samples, one female and 4 males, and 2 cerebellum samples from one female and one male. As all the samples from the male brain samples are paired-end and the rest are single read, we decided to trim all the reads to make them look the same, e.g. single read 76 bp, to avoid confounding factors. The short reads can be downloaded from the Gene Expression Omnibus under the accession code GSE30352.

These alignments were done using GSNAP 2012-01-11 against Pan troglodytes genomic DNA of the ENSEMBL release 64.

DEXSeq analysis

We download the chimp annotation from the ENSEMBL release 64 and define our counting bins.

cd $ANALYSIS/brawardetal/
exportSCRIPTS=../scripts
wget ftp://ftp.ensembl.org/pub/release-64/gtf/pan_troglodytes/Pan_troglodytes.CHIMP2.1.64.gtf.gz
gunzip Pan_troglodytes.CHIMP2.1.64.gtf.gz
python $SCRIPTS/dexseq_prepare_annotation.py Pan_troglodytes.CHIMP2.1.64.gtf \
    ptr_DEXSeq.gtf

Count the number of reads falling into our defined exon bins.

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-br-F-1-ptr.unpaired_uniq\
   ptr-br-F-1-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-br-M-1-ptr.unpaired_uniq\
   ptr-br-M-1-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-br-M-2-ptr.unpaired_uniq\
   ptr-br-M-2-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-br-M-3-ptr.unpaired_uniq\
   ptr-br-M-3-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-br-M-4-ptr.unpaired_uniq\
   ptr-br-M-4-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-br-M-5-ptr.unpaired_uniq\
   ptr-br-M-5-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-cb-F-1-ptr.unpaired_uniq\
   ptr-cb-F-1-ptr.counts

python $SCRIPTS/dexseq_count.py -p no -s no -a 0\
   ptr_DEXSeq.gtf\
   aln/ptr-cb-M-1-ptr.unpaired_uniq\
   ptr-cb-M-1-ptr.counts

We remove the sex chromosomes to avoid biases due to the sex of the individuals. DEXSeq analysis testing brain against cerebellum:

setwd(Sys.getenv("ANALYSIS"))
setwd("brawardetal")
library(DEXSeq)
library(multicore)
filecounts <- dir(".", pattern="counts")
des <- sapply(strsplit(filecounts, "-"), "[[", 2)
ecs <- read.HTSeqCounts(filecounts, design=des, flattenedfile="ptr_DEXSeq.gtf")
ecs <- ecs[-which(fData(ecs)$chr == "X"),]
ecs <- makeCompleteDEUAnalysis( ecs, minCount=0, maxExon=150, nCores=10)

plus <- fData(ecs)$strand == "+"
minus <- unique(geneIDs(ecs)[which(!plus)])
exonID <- exonIDs(ecs)
geneID <- geneIDs(ecs)
for(gn in minus){
   exids <- exonID[geneID %in% gn]
   exonID[geneID %in% gn] <- exids[length(exids):1]
}
##
exonIDs(ecs) <- exonID

mart <- useMart("ensembl", dataset="ptroglodytes_gene_ensembl")
attributes <- c("external_gene_id", "description")
filter="ensembl_gene_id"

DEXSeqHTML(ecs, FDR=0.1, path="brawardetal-rep", mart=mart, attributes=attributes, filter=filter)
save(ecs, file="br-vs-cb.RData")

We take all the male brain samples, make all possible tests of 2 male brains against 2 male brains.

ecsMbr <- ecs[,c(2, 3, 4, 5, 6)]
posconditions <- cbind(expand.grid(c("br-a", "br-b"), c("br-a", "br-b")), 
                    Var3=c("br-a", "br-b", "br-b", "br-a"))
comboffor <- combn(1:5, 4)
listecs <- list()
cont=1
for(i in 1:ncol(comboffor)){
   for(j in 1:ncol(posconditions)){
      ecsn <- newExonCountSet(counts(ecsMbr[,comboffor[,i]]), 
                 design=posconditions[,j], 
                 geneIDs=as.character(geneIDs(ecs)), 
                 exonIDs=exonIDs(ecs))
      ecsn <- estimateSizeFactors(ecsn)
      ecsn <- estimateDispersions(ecsn, minCount=0, maxExon=150, nCores=10)
      ecsn <- fitDispersionFunction(ecsn)
      ecsn <- testForDEU(ecsn)
      listecs[cont] <- ecsn
      cont=cont+1
   }
}

save(listecs, file="br-ecs.RData")

Now we make all possible comparisons of 2 brains vs 2 cerebellums. Always taking one male and one female from each tissue. (Note that in the Brawand et al. data, samples from primates denoted brain are actually only prefrontal cortex.)

ecscbbr <- list()
for(i in 2:6){
   ecsn <- newExonCountSet(counts(ecs)[,c(1, i, 7, 8)], 
              design(ecs)[c(1, i, 7, 8)], 
              geneIDs=as.character(geneIDs(ecs)), 
              exonIDs=exonIDs(ecs))
   ecsn <- estimateSizeFactors(ecsn)
   ecsn <- estimateDispersions(ecsn, minCount=0, maxExon=150, nCores=10)
   ecsn <- fitDispersionFunction(ecsn)
   ecsn <- testForDEU(ecsn)
   ecscbbr[i-1] <- ecsn
}

save(ecscbbr, file="ecscbbr.RData")

Cuffdiff

We perform the same comparisons using cuffdiff version 1.3.0.

exportCUFFDIFF=/g/huber/users/reyes/software/opt/cufflinks-1.3.0.Linux_x86_64
cd $ANALYSIS/brawardetal

$CUFFDIFF/cuffcompare -s Pan_troglodytes.CHIMP2.1.64.dna.toplevel.fa -CG -r\
   Pan_troglodytes.CHIMP2.1.64.gtf\
   Pan_troglodytes.CHIMP2.1.64.gtf

grep -v ^X cuffcmp.combined.gtf > cuffcmp.combined.noX.gtf

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/all-br-vs-cb \
   cuffcmp.combined.noX.gtf \
   aln/ptr-cb-F-1-ptr-sorted.bam,aln/ptr-cb-M-1-ptr-sorted.bam \
   aln/ptr-br-F-1-ptr-sorted.bam,aln/ptr-br-M-1-ptr-sorted.bam,aln/ptr-br-M-2-ptr-sorted.bam,aln/ptr-br-M-3-ptr-sorted.bam,aln/ptr-br-M-4-ptr-sorted.bam,aln/ptr-br-M-5-ptr-sorted.bam

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/two-1-br-vs-cb \
   cuffcmp.combined.noX.gtf \
   aln/ptr-cb-F-1-ptr-sorted.bam,aln/ptr-cb-M-1-ptr-sorted.bam \
   aln/ptr-br-F-1-ptr-sorted.bam,aln/ptr-br-M-1-ptr-sorted.bam

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/two-2-br-vs-cb \
   cuffcmp.combined.noX.gtf \
   aln/ptr-cb-F-1-ptr-sorted.bam,aln/ptr-cb-M-1-ptr-sorted.bam \
   aln/ptr-br-F-1-ptr-sorted.bam,aln/ptr-br-M-2-ptr-sorted.bam

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/two-3-br-vs-cb \
   cuffcmp.combined.noX.gtf \
   aln/ptr-cb-F-1-ptr-sorted.bam,aln/ptr-cb-M-1-ptr-sorted.bam \
   aln/ptr-br-F-1-ptr-sorted.bam,aln/ptr-br-M-3-ptr-sorted.bam

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/two-4-br-vs-cb \
   cuffcmp.combined.noX.gtf \
   aln/ptr-cb-F-1-ptr-sorted.bam,aln/ptr-cb-M-1-ptr-sorted.bam \
   aln/ptr-br-F-1-ptr-sorted.bam,aln/ptr-br-M-4-ptr-sorted.bam

$CUFFDIFF/cuffdiff --FDR 0.1 -o cuffdiff-outputs/two-5-br-vs-cb \
   cuffcmp.combined.noX.gtf \
   aln/ptr-cb-F-1-ptr-sorted.bam,aln/ptr-cb-M-1-ptr-sorted.bam \
   aln/ptr-br-F-1-ptr-sorted.bam,aln/ptr-br-M-5-ptr-sorted.bam

For the mock comparisons we take all male brain samples (5), and make all the possible 2 vs 2 comparisons, calling them from R using system calls.

setwd(Sys.getenv("ANALYSIS"))
setwd("brawardetal")

comboffor <- combn(1:5, 4)
posconditions <- cbind(expand.grid(c("br-a", "br-b"), c("br-a", "br-b")), 
      Var3=c("br-a", "br-b", "br-b", "br-a"))

bamfiles <- dir("aln", pattern="*sorted.bam")[c(2, 3, 4, 5, 6)]
cont <- .01
for( i in 1:ncol(comboffor)){
   for( j in 1:ncol(posconditions)){
      totest <- bamfiles[comboffor[,i]]
      names(totest) <- posconditions[,j]
      gr1 <- paste("aln/", totest[which(names(totest) == "br-a")], sep="")
      gr2 <- paste("aln/", totest[which(names(totest) == "br-b")], sep="")
      gr1 <- paste(gr1[1], gr1[2], sep=",")
      gr2 <- paste(gr2[1], gr2[2], sep=",")
      nm1 <- sapply(strsplit(gr1, "-"), function(x){x[c(4,9)]})
      nm2 <- sapply(strsplit(gr2, "-"), function(x){x[c(4,9)]})
      nm1 <- paste(nm1[1,], nm1[2,], sep="")
      nm2 <- paste(nm2[1,], nm2[2,], sep="")
      dirname <- paste("cuffdiff-outputs/br", cont, nm1, nm2, sep="-")
      system(paste("mkdir", dirname))
      que <- paste("$CUFFDIFF/cuffdiff --FDR 0.1 -o", 
         dirname, "cuffcmp.combined.noX.gtf", gr1, gr2, sep=" ")
      print(que)
      system(que)
      cont=cont+.01
    }
}

DEXSeq vs cuffdiff

Summary of the results of the comparison brain vs cerebellum from the chimpanzee data from Braward et al:

suppressMessages(library(DEXSeq))
setwd(Sys.getenv("ANALYSIS"))
setwd("brawardetal")

load("br-vs-cb.RData")
table( fData(ecs)$padjust < 0.1 )
length(unique(geneIDs(ecs)[which( fData(ecs)$padjust < 0.1 )]))

system("grep -c yes cuffdiff-outputs/all-br-vs-cb/splicing.diff")

 
 FALSE   TRUE 
153103    866
[1] 650
114

Results of doing the mock comparisons 2 brains vs 2 brains,

load("br-ecs.RData")
sapply(listecs, 
   function(x){
      length(unique(
         geneIDs(x)[which(fData(x)$padjust < 0.1)]
      ))
   }
)

system("grep -c yes cuffdiff-outputs/br-0.*/splicing*")

 [1]   3   0 244   2   1   2   2   2   2  10   5   0   1  19   0
 cuffdiff-outputs/br-0.01-13-24/splicing.diff:405
cuffdiff-outputs/br-0.02-12-34/splicing.diff:399
cuffdiff-outputs/br-0.03-14-23/splicing.diff:590
cuffdiff-outputs/br-0.04-13-25/splicing.diff:628
cuffdiff-outputs/br-0.05-12-35/splicing.diff:499
cuffdiff-outputs/br-0.06-15-23/splicing.diff:555
cuffdiff-outputs/br-0.07-14-25/splicing.diff:460
cuffdiff-outputs/br-0.08-12-45/splicing.diff:504
cuffdiff-outputs/br-0.09-15-24/splicing.diff:308
cuffdiff-outputs/br-0.1-14-35/splicing.diff:497
cuffdiff-outputs/br-0.11-13-45/splicing.diff:554
cuffdiff-outputs/br-0.12-15-34/splicing.diff:353
cuffdiff-outputs/br-0.13-24-35/splicing.diff:476
cuffdiff-outputs/br-0.14-23-45/splicing.diff:823
cuffdiff-outputs/br-0.15-25-34/splicing.diff:526

Results of doing all possible comparisons, 2 cerebellums (male and female) vs 2 brains (male and female).

load("ecscbbr.RData")
sapply(ecscbbr, 
   function(x){
      length(unique(
         geneIDs(x)[which(fData(x)$padjust < 0.1)]
      ))
   }
)

system("grep -c yes cuffdiff-outputs/two-*/splicing*")

[1] 56 18 26 32 27
 cuffdiff-outputs/two-1-br-vs-cb/splicing.diff:230
cuffdiff-outputs/two-2-br-vs-cb/splicing.diff:361
cuffdiff-outputs/two-3-br-vs-cb/splicing.diff:370
cuffdiff-outputs/two-4-br-vs-cb/splicing.diff:215
cuffdiff-outputs/two-5-br-vs-cb/splicing.diff:380

6 ENCODE dataset

We now apply DEXSeq to an additional dataset from the ENCODE project. We chose the h1-hesc and Huvec cell lines because they had 2 biological replicates each one. They are 76 bp paired end reads.

exportENCODE=$ANALYSIS/encode
cd $ENCODE/fastq
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd1Rep1.fastq.gz
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd2Rep1.fastq.gz
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd1Rep2.fastq.gz
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd2Rep2.fastq.gz


wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd1Rep1.fastq.gz
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd2Rep1.fastq.gz
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd1Rep2.fastq.gz
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd2Rep2.fastq.gz

gunzip *gz

exportGSNAP=/g/huber/users/reyes/software/opt/gmap-2012-01-11-schroedinger/bin1/
exportFASTQ=$ANALYSIS/encode/fastq
cd $ENCODE/aln/

$GSNAP/gsnap -D /g/huber/users/reyes/Mammals/genome-indexes-sch-2012-11-01/ -d hsa --split-output=HuvecR1 -N 1 -A sam -t 4 $FASTQ/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd1Rep1.fastq $FASTQ/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd2Rep1.fastq
$GSNAP/gsnap -D /g/huber/users/reyes/Mammals/genome-indexes-sch-2012-11-01/ -d hsa --split-output=HuvecR2 -N 1 -A sam -t 4 $FASTQ/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd1Rep2.fastq $FASTQ/wgEncodeCshlLongRnaSeqHuvecCellLongnonpolyaFastqRd2Rep2.fastq

$GSNAP/gsnap -D /g/huber/users/reyes/Mammals/genome-indexes-sch-2012-11-01/ -d hsa --split-output=H1hescR1 -N 1 -A sam -t 4 $FASTQ/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd1Rep1.fastq $FASTQ/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd2Rep1.fastq
$GSNAP/gsnap -D /g/huber/users/reyes/Mammals/genome-indexes-sch-2012-11-01/ -d hsa --split-output=H1hescR2 -N 1 -A sam -t 4 $FASTQ/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd1Rep2.fastq $FASTQ/wgEncodeCshlLongRnaSeqH1hescCellLongnonpolyaFastqRd2Rep2.fastq

python $SCRIPTS/dexseq_count.py -p yes -s no -a 0 hsa.DEXSeq.gtf aln/HuvecR1.concordant_uniq HuvecR1.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 0 hsa.DEXSeq.gtf aln/HuvecR2.concordant_uniq HuvecR2.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 0 hsa.DEXSeq.gtf aln/H1hescR1.concordant_uniq H1hescR1.counts
python $SCRIPTS/dexseq_count.py -p yes -s no -a 0 hsa.DEXSeq.gtf aln/H1hescR2.concordant_uniq H1hescR2.counts

Run a standard differential exon usage analysis,

library(DEXSeq)
library(multicore)
load("ecs.RData")
ecs <- ecs[,3:6]
ecs <- estimateSizeFactors( ecs )
ecs <- estimateDispersions( ecs, nCores=15, maxExon=150, minCount=30)
ecs <- fitDispersionFunction( ecs )
ecs <- testForDEU( ecs, nCores=15 )
save(ecs, file="ecs-sub.RData")

### flip exon labels of the minus strand
plus <- fData(ecs)$strand == "+"
minus <- unique(geneIDs(ecs)[which(!plus)])
exonID <- exonIDs(ecs)
geneID <- geneIDs(ecs)
for(gn in minus){
   exids <- exonID[geneID %in% gn]
   exonID[geneID %in% gn] <- exids[length(exids):1]
}
##
exonIDs(ecs) <- exonID

ecs <- estimatelogfold2changes(ecs, nCores=35)

save(ecs, file="ecs-sub-plots.RData")
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
attributes <- c("external_gene_id", "description")
filter="ensembl_gene_id"
DEXSeqHTML(ecs, FDR=0.1, path="cell-lines-report", mart=mart, attributes=attributes, filter=filter)

setwd(Sys.getenv("ANALYSIS"))
setwd("encode/")

load("ecs-sub.RData")
table(fData(ecs)$padjust < 0.1)
length(unique(geneIDs(ecs)[which(fData(ecs)$padjust < 0.1)]))

 
 FALSE   TRUE 
253369  30339
[1] 7795

7 Figures and numbers of the paper

Numbers in the first paragraph

tx <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene")
pr2 <- exonsBy(tx, "tx")
sums <- sapply(pr2, function(x){rn <- ranges(x);sum(abs(start(rn)-end(rn)))})
median(sums)
max(sums)

Figure 2

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("allENSecs.RData")
ecs <- allENSecs[[1]]


png( "dispests.png", width=1400, height=1400, pointsize=30 )


plot( 
   rowMeans( counts(ecs,normalized=TRUE) ), 
   fData(ecs)$dispBeforeSharing, 
   log="xy", pch=20, cex=.2, col="#000000B0",
   xlim=c( 1, 3e5), ylim=c(1e-6,1e3), xaxs="i", yaxs="i",
   xlab="average normalized count value",
   ylab="Cox-Reid dispersion estimate",
   cex.axis=1.4,
   cex.lab=1.4)

segments(
   rowMeans( counts(ecs,normalized=TRUE) )[ fData(ecs)$dispBeforeSharing < 1e-6 ], 
   1.5e-6, 
   rowMeans( counts(ecs,normalized=TRUE) )[ fData(ecs)$dispBeforeSharing < 1e-6 ], 
   1e-6,
   col="#00000040" )

xg <- 10^seq( 0, 6, length.out=300 )
yg <- ecs@dispFitCoefs[1] + ecs@dispFitCoefs[2] / xg

lines( xg, yg, col="red", lwd=3 )
for( q in c( .01, .05, .95, .99 ) )
   lines( xg, yg * qchisq( q, 4 )/4, col="red", lwd=3, lty="dashed" )

dev.off()

Figure 3

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("ecs-plots.RData")
pdf("Ten-m.pdf")
plotDEXSeq(ecs, "FBgn0004449", legend=TRUE, norCounts=TRUE, color=c("red1", "blue1"), lwd=1.8)
dev.off()

Figure 4

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("ecs-plots.RData")

png( "MvA.png", width=1400, height=1400, pointsize=30 )
ylim <- 5
means <- rowMeans(counts(ecs,normalized=TRUE))
lfc <- fData(ecs)$`log2fold(untreated/treated)`
cols <- ifelse( fData(ecs)$padj<.1, "#A00000A0", "#00000040" )
plot( 
    means, lfc, col=cols,
    las=1, log="x", pch=20, cex=.4, 
    ylim=c(-ylim,ylim), yaxs="i", yaxt="n", xlim = range( means[fData(ecs)$testable] ),
    xlab="averaged normalized count value",
    ylab="fold change in exon usage",
    cex.axis=1.4,
    cex.lab=1.4  )
axis( 2, -4:4, 
   c( "1/16", "1/8", "1/4", "1/2", "1", "2", "4", "8", "16" ), las=1, cex.axis=1.4)
segments( means[ lfc < -ylim ], -ylim*.97, means[ lfc < -ylim ], -ylim,    
   col = cols[ lfc < -ylim ], lwd=3 )
segments( means[ lfc > ylim ], ylim*.97, means[ lfc > ylim ], ylim,    
   col = cols[ lfc > ylim ], lwd=3 )
dev.off()

Figure 5

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("ecs-plots.RData")
pdf("Ten-m.pdf")
plotDEXSeq(ecs, "FBgn0004449", legend=TRUE, norCounts=TRUE, color=c("red1", "blue1"), lwd=1.8)
dev.off()

Figure 6

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("ecs-plots.RData")
pdf("RpS14b.pdf")
plotDEXSeq(ecs, "FBgn0004404", legend=TRUE, norCounts=TRUE, color=c("red1", "blue1"), lwd=1.8)
dev.off()

Figure S1

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")
load("ecs-plots.RData")
pdf("ps.pdf")
plotDEXSeq(ecs, "FBgn0261552", legend=TRUE, norCounts=TRUE, color=c("red1", "blue1"), splicing=TRUE, lwd=1.8)
dev.off()

Figure S2

  setwd(Sys.getenv("ANALYSIS"))
  setwd("brooksetal/")

  load("ecs-plots.RData")
ecsPois <- ecs
fData(ecsPois)$dispersion <- 1e-10
ecsPois <- testForDEU( ecsPois, formula0 = count ~ sample + type * exon + condition, 
   formula1 = count ~ sample + type * exon + condition * I(exon == exonID), nCores = 30 )
ecsPois <- estimatelog2FoldChanges( ecsPois, nCores=40 )

png( "MvAZero.png", width=1400, height=1400, pointsize=30 )
ylim <- 5
means <- rowMeans(counts(ecsPois,normalized=TRUE))
lfc <- fData(ecsPois)$`log2fold(untreated/treated)`
cols <- ifelse( fData(ecsPois)$padj<.1, "#A00000A0", "#00000040" )
plot( 
    means, lfc, col=cols,
    las=1, log="x", pch=20, cex=.4, 
    ylim=c(-ylim,ylim), yaxs="i", yaxt="n", xlim = range( means[fData(ecsPois)$testable] ),
    xlab="averaged normalized count value",
    ylab="fold change in exon usage",
    cex.axis=1.4,
    cex.lab=1.4 )
axis( 2, -4:4, 
   c( "1/16", "1/8", "1/4", "1/2", "1", "2", "4", "8", "16" ), las=1, cex.axis=1.4)
segments( means[ lfc < -ylim ], -ylim*.97, means[ lfc < -ylim ], -ylim,    
   col = cols[ lfc < -ylim ], lwd=3 )
segments( means[ lfc > ylim ], ylim*.97, means[ lfc > ylim ], ylim,    
   col = cols[ lfc > ylim ], lwd=3 )
dev.off()

Figure S3

setwd(Sys.getenv("ANALYSIS"))
setwd("brooksetal/")

load("ecs-plots.RData")
pdf("FBgn0017581.pdf")
plotDEXSeq(ecs, "FBgn0017581", legend=TRUE, norCounts=TRUE, color=c("red1", "blue1"), splicing=TRUE, cex.axis=1, lwd=1.8)
dev.off()

Figure S4

setwd(Sys.getenv("ANALYSIS"))
setwd("brawardetal")
load("br-vs-cb.RData")
gene="ENSPTRG00000000042"  
fData(ecs)[geneIDs(ecs) %in% gene,]$transcripts <- c("PB1", "PB1", "PB1", "PB1", "PB1", "", "", "", "", "Y-kinase;S/T-kinase", "Y-kinase;S/T-kinase","Y-kinase;S/T-kinase","Y-kinase;S/T-kinase","Y-kinase;S/T-kinase","Y-kinase;S/T-kinase","Y-kinase;S/T-kinase;AGC-kinase", "AGC-kinase", "AGC-kinase", "AGC-kinase")
fData(ecs)[geneIDs(ecs) %in% gene,]$transcript
pdf("PRKCZ.pdf", width=8)
plotDEXSeq(ecs, gene, displayTranscripts=TRUE, names=TRUE, legend=TRUE, color=c("red1", "blue1"), lwd=1.8)
dev.off()

Figure S5

  setwd(Sys.getenv("ANALYSIS"))
  setwd("encode")
  load("ecs-sub-plots.RData")
  head(fData(ecs))
  ylim <- 5
  means <- rowMeans(counts(ecs,normalized=TRUE))
  lfc <- fData(ecs)$`log2fold(HuvecR/H1hescR)`
  cols <- ifelse( fData(ecs)$padj<.1, "#A0000010", "#00000010" )

png( "MvAencode.png", width=1400, height=1400, pointsize=30 )

  plot( 
      means, lfc, col=cols,
      las=1, log="x", pch=20, cex=.4, 
      ylim=c(-ylim,ylim), yaxs="i", yaxt="n", xlim = range( means[fData(ecs)$testable] ),
      xlab="averaged normalized count value",
      ylab="fold change in exon usage",
      cex.axis=1.4,
      cex.lab=1.4  )
  axis( 2, -4:4, 
     c( "1/16", "1/8", "1/4", "1/2", "1", "2", "4", "8", "16" ), las=1, cex.axis=1.4)
  segments( means[ lfc < -ylim ], -ylim*.97, means[ lfc < -ylim ], -ylim,    
     col = cols[ lfc < -ylim ], lwd=3 )
  segments( means[ lfc > ylim ], ylim*.97, means[ lfc > ylim ], ylim,    
     col = cols[ lfc > ylim ], lwd=3 )

dev.off()

Figure S6

setwd(Sys.getenv("ANALYSIS"))
setwd("encode")
load("ecs-sub-plots.RData")

design(ecs) <- as.factor(gsub("R", "", design(ecs)))
 pdf("NFYA.pdf", width=8)
  plotDEXSeq(ecs, "ENSG00000001167", displayTranscripts=TRUE, names=FALSE, legend=TRUE, color=c("red1", "blue1"), lwd=1.8, norCounts=TRUE)
  dev.off()

Detecting differential usage of exons from RNA-seq data

Supplement II: Studying differential usage of exons using RNA-seq data

Table of Contents

1 Summary

2 System requirements to run all code

3 Setting the enviroment variables

4 Brooks et al

4.1 DEXSeq analysis

Brooks et al annotation

ENSEMBL annotation (used for comparisons with cuffdiff)

Cuffdiff 1.3.0 analysis

Cuffdiff 1.2.0 analysis

Cuffdiff 1.1.0 analysis

DEXSeq vs cuffdiff

Making the HTML

5 Braward et al

DEXSeq analysis

Cuffdiff

DEXSeq vs cuffdiff

6 ENCODE dataset

7 Figures and numbers of the paper

This Article

Preprint Server

Current Issue

In This Issue