Figure 2.

Enformer poorly models individual TF recognition site deletions. (A) In silico scanning deletion analysis of Sox2 DHS23 and DHS24 accessibility. Enformer was used to predict accessibility (mESC_CJ7 DNase-seq) for a series of DHS23–DHS24 virtual payloads replacing the Sox2 LCR. Sixteen-basepair deletions were encoded by replacing the payload sequence with N's at positions sliding across the length of DHS23 and DHS24. Purple and green lines show the predicted accessibility of DHS23 and DHS24, respectively, as a function of deletion position (x-axis). Horizontal dotted lines indicate baseline accessibility. Boxes above the plot indicate relevant TF recognition sequences. (B) Comparison of experimentally measured and predicted accessibility for payloads containing DHS23 and/or DHS24 with TF recognition sequences deleted (Δ) or mutated (mut), delivered in place of the Sox2 LCR and profiled for expression (Brosh et al. 2023). Lines connect measured (closed circles) and predicted (open circles) accessibility and are colored by the direction of the difference. Predicted expression was scaled to WT using a linear regression fitted to all payload examples in Figure 1B.
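The scanning deletion design in (A) and the linear scaling in (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a 1-bp sliding step (the caption does not state the stride), models each deletion as an in-place replacement with 16 N's so payload length is unchanged, and omits the Enformer prediction step itself. The function names `scanning_deletions`, `fit_scaling`, and `scale` are hypothetical.

```python
import numpy as np


def scanning_deletions(seq: str, width: int = 16, step: int = 1):
    """Yield (start, masked_seq) pairs with `width` bases replaced by 'N'.

    Assumption: a 1-bp stride; the masked sequence keeps the original length.
    """
    for start in range(0, len(seq) - width + 1, step):
        masked = seq[:start] + "N" * width + seq[start + width:]
        yield start, masked


def fit_scaling(predicted, measured):
    """Least-squares fit of measured = slope * predicted + intercept."""
    slope, intercept = np.polyfit(
        np.asarray(predicted, dtype=float), np.asarray(measured, dtype=float), 1
    )
    return slope, intercept


def scale(predicted, slope, intercept):
    """Map raw model predictions onto the measured scale."""
    return slope * np.asarray(predicted, dtype=float) + intercept


if __name__ == "__main__":
    # Toy 32-bp sequence standing in for a DHS payload (hypothetical).
    toy = "ACGT" * 8
    for start, masked in scanning_deletions(toy):
        print(start, masked)
```

Each masked sequence would then be scored by the model, and the resulting predictions mapped onto the measured scale with `fit_scaling` and `scale`, fitted on a reference set of payloads.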
