
Identification of TSSs from alignments of 5′-end reads. (A) Process for 5′-end library construction for Illumina GA sequencing. See details in Methods. (B) Frequency distribution of reads aligned to genomic locations upstream of NM_058956 (phosphatase). Three peaks were calculated as the best-fitting set of Gaussian distributions according to the Bayesian information criterion. (C) Three classes of genes with different patterns of Gaussian TSS distributions in the regions of 3000 bp upstream of the translation initiation sites. (Top figure) Case in which both Gaussian outron-TSSs and exon-TSSs occur because of the presence of an SL1 site between the two classes of Gaussian TSSs. (Middle and bottom figures) Cases in which all Gaussian TSSs are exon-TSSs and outron-TSSs, respectively. (Vertical bars with asterisk) Representative TSSs with the maximum number of aligned 5′-end tags. (Right bars) The numbers of genes in the three classes. As shown at the bottom, alignments of paired-end reads were useful in linking representative TSSs to their gene bodies over SL1 sites. (D) Frequency of Gaussian exon/outron-TSSs. (E) 5′-End reads, excluding rRNA reads, were categorized into four groups: reads mapped onto promoters of outron/exon-TSSs, exons, introns, and intergenic regions, respectively. The Venn diagram illustrates the relationship among the four groups. (F) Frequency distribution of distances between 5′-exon boundaries from WS220 (Hillier et al. 2009; Harris et al. 2010) and proximal exon-TSSs identified in our data. (G) Examples of 5′ capture data for genes that have previously been subjects of TSS mapping. (Above) Exon 5′ end for the gene myo-1 with original 5′ region having four candidate 5′ ends (from S1 nuclease mapping with accuracy to within a few bp) (Okkema et al. 1993) marked with green arrows. (Below) Outron 5′ region for rpl-2 with original 5′ end (from 5′ RACE mapping with accuracy to within 1–2 bp) (Sleumer et al. 2012) highlighted by the green arrow.
(H) Frequency distribution of the distances between SL1 sites and the most abundant representative outron-TSSs supported by 10 or more single-end reads. (Left inset) Magnified view of the region from 10 to 100 bp; (right inset) the 90th, 75th, 25th, and 10th percentiles and the median of outron lengths. (I) Numbers of TSS clusters categorized into NP, BP, and WP types for five values of the minimum threshold on the size of TSS clusters. Because multiple TSS clusters may be associated with each gene, the second vertical axis shows the numbers of genes involved. Observe that TSS clusters of WP type are dominant for thresholds ≥25.
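
The model selection described for panel B can be illustrated with a minimal sketch. It assumes, for illustration only, that each aligned 5′-end read is reduced to a single position relative to the translation initiation site, that mixtures are fitted by a plain expectation-maximization loop, and that the number of Gaussian peaks is chosen as the one with the lowest Bayesian information criterion. The function names (`fit_gmm_1d`, `bic`, `select_by_bic`), the quantile-based initialization, the variance floor, and the synthetic positions are hypothetical and are not taken from the Methods.

```python
import math
import random


def _log_joint(x, weights, means, variances):
    """Log of (weight * Gaussian density) for each component at position x."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]


def _logsumexp(vals):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def fit_gmm_1d(xs, k, n_iter=200, var_floor=1.0):
    """Fit a k-component 1D Gaussian mixture by EM; return its log-likelihood."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumed deterministic start: means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    var_all = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [max(var_all / k ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each read position.
        resp = []
        for x in xs:
            logs = _log_joint(x, weights, means, variances)
            total = _logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)  # floor avoids collapsed peaks
            weights[j] = nj / n
    return sum(_logsumexp(_log_joint(x, weights, means, variances)) for x in xs)


def bic(log_lik, k, n):
    """BIC for a 1D mixture: k means, k variances, k - 1 free weights."""
    n_params = 3 * k - 1
    return n_params * math.log(n) - 2.0 * log_lik


def select_by_bic(xs, max_k=4):
    """Return (best number of peaks, {k: BIC}) for k = 1..max_k."""
    bics = {k: bic(fit_gmm_1d(xs, k), k, len(xs)) for k in range(1, max_k + 1)}
    return min(bics, key=bics.get), bics


if __name__ == "__main__":
    # Synthetic read positions (bp relative to the start codon), not real data.
    random.seed(0)
    positions = [random.gauss(m, 30) for m in (-2000, -1000, -200) for _ in range(100)]
    best_k, scores = select_by_bic(positions)
    print(best_k, scores)
```

In this sketch, a lower BIC indicates a better trade-off between fit and the number of free parameters, so adding a peak is accepted only when the gain in likelihood outweighs the penalty of three extra parameters.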











