
Browser-based refinement of the transcriptome by RNA-seq. We used the UCSC Genome Browser custom tracks with the WS190 version of the C. elegans genome for viewing the following data sets: (1) “Potential exon-junctions” track (blue), which displays potential nonannotated exon–exon junctions that are supported by RNA-seq reads. Each junction is drawn as two bars, one for the 23 bp of each adjacent exon, with a connecting arrow that indicates the exon-junction directionality. The bar shade indicates a strength score, calculated from the number of tags aligned to the exon junction and the number of bases from each side of the junction that are included in the sequence tag, with darker shades representing higher scores (score = 100 × number of alignments × coverage score). The coverage score equals 1 when the smallest base coverage of the exon is 9 or 10 bases; 1.2 when the smallest base coverage of the exon is 11, 12, or 13; or 1.5 when the smallest base coverage of the exon is 14, 15, or 16 (Supplemental Fig. S2). (2) RNA sequences from regions with no existing gene predictions [“additional (nonannotated) transcript regions,” orange]. In this custom track the bar height represents the number of sequences that align to each position. (3) A poly(A) tags track (green) that displays polyadenylation junctions identified by the RNA-seq. The arrow in each bar points to the start position of the putative poly(A) tail. (4) SL1 tags track (purple) and (5) SL2 tags track (blue-gray) that display trans-splice leader sites identified by the RNA-seq. The arrow in each bar indicates splice leader directionality. (6) A polysome tags track (pink) that displays observed tags from a polysome-enriched RNA pool. The browser shots exemplify the discovery of nonannotated transcribed regions from RNA-seq data. For chrX:17480500–17842000 (A), nonannotated genomic tags with a darkly shaded splice junction suggest a transcript. This transcript and splice were validated by RT-PCR and sequencing (B; PCR Sanger-sequence data not shown).
The SL1 tags suggest the presence of a transcript distinct from the proximate R106.1 transcript. Polysome sedimentation (pink track in A) suggests that the transcript is present in polysome fractions. The lightly shaded exon junction and the poly(A) site do not have significant nonannotated genomic tag coverage at the same position, so this junction could be considered “provisional.” dcr-1 is an example of a gene that is studied by many research groups (e.g., Knight and Bass 2001; Duchaine et al. 2006; Pavelec et al. 2009) and that we found to contain a predicted nonannotated exon. The exon is 195 bp long and is in-frame with the adjacent exon. This exon appears uniformly incorporated into the transcript; we see no evidence for differential splicing (C). The existence of the exon was confirmed by PCR, RT-PCR, and Sanger sequencing (D). The arrow in D indicates the size of the expected RT-PCR band from the WS190-annotated dcr-1. EST additions to GenBank (June 2010) further support the structures shown in this figure and Supplemental Figure 4SA.
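
The strength score described for the “Potential exon-junctions” track can be sketched as a small calculation. This is only an illustration of the stated formula (score = 100 × number of alignments × coverage score), not the code used to build the track. The function and parameter names are hypothetical, and because the caption does not say how a smallest base coverage outside the 9 to 16 range is handled, the sketch assumes such values are rejected.

```python
def coverage_score(min_exon_coverage: int) -> float:
    """Coverage score from the smallest number of bases covering either exon.

    Thresholds follow the caption; the name `min_exon_coverage` is hypothetical.
    """
    if 9 <= min_exon_coverage <= 10:
        return 1.0
    if 11 <= min_exon_coverage <= 13:
        return 1.2
    if 14 <= min_exon_coverage <= 16:
        return 1.5
    # Assumption: the caption defines no score outside 9-16 bases,
    # so this sketch refuses to guess one.
    raise ValueError(f"no coverage score defined for {min_exon_coverage} bases")


def junction_score(num_alignments: int, min_exon_coverage: int) -> float:
    """Strength score = 100 x number of alignments x coverage score."""
    return 100 * num_alignments * coverage_score(min_exon_coverage)


# Example: two tags aligned to a junction, smallest exon coverage of 15 bases.
print(junction_score(2, 15))  # 300.0
```

Under this reading, a junction supported by many tags that each reach well into both exons scores highest and is drawn in the darkest shade, whereas a junction with few tags or short overhangs is drawn lighter.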











