Synthetic spike-in standards for RNA-seq experiments


Figure 5.

Sequence patterns predictive of overrepresentation in RNA-seq. Patterns in the single-end 100% ERCC library (library 6) and the ENCODE strand-specific paired-end libraries (libraries 7–50), based on coefficients from the GLM (Li et al. 2010) (see Methods). (A,B) Regression coefficients for each base at positions around the start of reads mapped to the forward (A) and reverse (B) strands of ERCC transcripts in the unstranded 100% ERCC library (library 6). (C,D) Regression coefficients for each nucleotide at positions relative to the upstream (C) or downstream (D) read of read pairs mapped to ERCC transcripts in the stranded ENCODE libraries. Adenosine is treated as the baseline level in the regression model; i.e., the coefficient for “A” is always 0, while the other coefficients represent the predicted overrepresentation due to the presence of the given nucleotide at that position, relative to an adenosine.
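
The baseline coding described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the function names, the two-position window, and the coefficient values are all hypothetical and chosen only to show why the coefficient for "A" is fixed at 0 when adenosine is the reference level.

```python
# Illustrative sketch only: names and coefficient values are hypothetical,
# not taken from the paper or from Li et al. (2010).
NON_REFERENCE = ("C", "G", "T")  # adenosine ("A") is the reference level


def encode_window(window):
    """Treatment-code a nucleotide window, with A as the reference level.

    Each position contributes three 0/1 indicators (C, G, T). An A at a
    position yields all zeros, so its implied coefficient is 0.
    """
    row = []
    for base in window.upper():
        if base not in "ACGT":
            raise ValueError(f"unexpected base: {base!r}")
        row.extend(1 if base == other else 0 for other in NON_REFERENCE)
    return row


def linear_predictor(window, intercept, coefficients):
    """Return the intercept plus the coefficients of the non-A bases present."""
    row = encode_window(window)
    if len(row) != len(coefficients):
        raise ValueError("coefficient vector does not match window length")
    return intercept + sum(x * b for x, b in zip(row, coefficients))


if __name__ == "__main__":
    # Made-up coefficients for a 2-position window: (C, G, T) per position.
    coefs = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]
    print(encode_window("AA"))                  # all-A window: every indicator is 0
    print(linear_predictor("AA", 1.0, coefs))   # reduces to the intercept
    print(linear_predictor("CG", 1.0, coefs))   # adds the C (pos 1) and G (pos 2) terms
```

Under this coding, each plotted coefficient reads as a shift in the model's linear predictor when the given nucleotide replaces an adenosine at that position.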

This Article

  1. Genome Res. 21: 1543-1551
