
Intrinsic and extrinsic factors contributing to the origin of new TSSs. (A) Composition of major repeat families in four TSS groups. We considered the nearest repeat element within TSS ± 100 bp. (B) Distribution of young TSSs plotted against the consensus LTR/THE1B element. Schematic of THE1B indicates the original TSS and the U3, R, and U5 regions of the element. (C) Distribution of young TSSs plotted against the consensus LINE/L1 element. Schematic of the L1 structure indicates the original sense and antisense TSSs at the 5′ end. (D) Comparison of distances from TSS-associated and non-TSS-associated LTRs to the closest old TSSs. Distances from random intervals to the closest old TSSs are also provided for comparison. Inset shows a box plot of the same distribution. (E) Comparison of distances from TSS-associated and non-TSS-associated LTRs to the closest CTCF or RAD21 ChIA-PET peaks (from GM12878; only mammalian-conserved peaks were used). Distances from random intervals were calculated as in panel D. Inset shows a box plot of the same distribution. (F) Exponential approximation of the relationship between the number of TSSs per gene and the number of genes with that many TSSs, based on data for all TSSs. R² is the coefficient of determination for the linear regression. Gray shading indicates the 95% confidence interval. (G) Exponential approximation of the relationship between the number of newly gained TSSs per gene and the number of genes, based on data for newly emerged TSSs in three periods. Statistical significance in D and E was calculated by one-tailed Wilcoxon rank-sum tests; (***) P < 0.001.
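
The exponential approximation in panels F and G can be illustrated with a minimal sketch. It assumes the fit is an ordinary least-squares regression of log-transformed gene counts on the number of TSSs per gene, which is one common way to obtain an exponential fit with an R² from a linear regression; the caption does not confirm this exact procedure. The function name `fit_exponential` and the input values are hypothetical and synthetic, not data from the study.

```python
import math


def fit_exponential(tss_counts, gene_counts):
    """Fit gene_counts ~ a * exp(b * tss_counts).

    Assumption: the fit is done by ordinary least squares on
    log(gene_counts), so R^2 refers to the log-linear regression.
    Every gene count must be strictly positive.
    """
    xs = [float(x) for x in tss_counts]
    ys = [math.log(y) for y in gene_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Closed-form least-squares slope and intercept on the log scale.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination of the log-linear regression.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot

    return math.exp(intercept), slope, r_squared


# Synthetic, illustrative values only (not taken from the study).
tss_per_gene = [1, 2, 3, 4, 5, 6]
n_genes = [8000, 4100, 1900, 1050, 480, 260]

a, b, r2 = fit_exponential(tss_per_gene, n_genes)
print(f"a = {a:.1f}, b = {b:.3f}, R^2 = {r2:.4f}")
```

A negative `b` corresponds to a decaying number of genes as the number of TSSs per gene increases. The 95% confidence band shown in the figure is not computed in this sketch.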











