Josh T. Cuperus; Benjamin Groves; Anna Kuchina; Alexander B. Rosenberg; Nebojsa Jojic; Stanley Fields; Georg Seelig

Figure 2.

A convolutional neural network (CNN) approach to model random 5′ UTR sequences. (A) A three-layer convolutional neural network model trained on random 5′ UTRs was tested on a held-out test set consisting of the top 5% of sequences by input read depth. Tested 5′ UTRs are colored according to whether or not they contain an upstream open reading frame. (B) Four hundred eighty-eight thousand random 13-mers were scored for each filter in layer 1 of the CNN. The top 1000 13-mers for each filter were used to create a position weight matrix (PWM) for that filter. These PWMs include motifs of start codons, stop codons, and guanine quadruplexes. Positive Pearson correlations indicate a positive effect on enrichment, while negative correlations indicate a negative effect on enrichment. (C) The effect of each motif per position was measured by assessing the Pearson correlation of motif score and enrichment at each position. Heat maps of all 5′ UTRs (left) and those lacking upstream AUGs (right) are shown, including specific examples highlighting filters with different positional patterns.
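The procedure in panel B (score random 13-mers with a filter, keep the top scorers, summarize them as a PWM) can be sketched as below. This is a minimal illustration, not the authors' code: the filter weights are a hypothetical hand-made filter that rewards AUG at positions 5 to 7, the score is assumed to be a plain dot product with the one-hot sequence, the PWM is assumed to be a simple per-position base frequency, and the sample sizes (20,000 k-mers, top 100) are scaled down from the 488,000 and 1000 named in the caption. All function names are made up for the example.

```python
import numpy as np

BASES = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a (len, 4) one-hot matrix in ACGU order."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]

def random_kmers(n, k, rng):
    """Draw n uniformly random k-mers."""
    idx = rng.integers(0, 4, size=(n, k))
    return ["".join(BASES[i] for i in row) for row in idx]

def filter_scores(kmers, weights):
    """Score each k-mer as the dot product of its one-hot encoding
    with a (k, 4) filter (assumed scoring rule)."""
    return np.array([(one_hot(s) * weights).sum() for s in kmers])

def pwm_from_top(kmers, scores, top_n):
    """Per-position base frequencies among the top_n highest-scoring k-mers."""
    top = np.argsort(scores)[::-1][:top_n]
    counts = sum(one_hot(kmers[i]) for i in top)
    return counts / top_n

rng = np.random.default_rng(0)
k = 13

# Hypothetical filter: rewards A, U, G at positions 5, 6, 7; neutral elsewhere.
weights = np.zeros((k, 4))
for offset, base in enumerate("AUG"):
    weights[5 + offset, BASES.index(base)] = 1.0

kmers = random_kmers(20000, k, rng)
scores = filter_scores(kmers, weights)
pwm = pwm_from_top(kmers, scores, top_n=100)

# Most frequent base at each position of the PWM.
consensus = "".join(BASES[i] for i in pwm.argmax(axis=1))
print(consensus[5:8])
```

With this toy filter, the positions it rewards come out as AUG in the consensus, while the remaining positions stay close to uniform because the filter ignores them. A learned filter would be substituted for `weights` to reproduce the kind of motif shown in the figure.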

Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences
