High-resolution simulations of chromatin folding at genomic rearrangements in malignant B cells provide mechanistic insights into proto-oncogene deregulation

Genomic rearrangements are known to result in proto-oncogene deregulation in many cancers, but the link to 3D genome structure remains poorly understood. Here, we used the highly predictive heteromorphic polymer (HiP-HoP) model to predict chromatin conformations at the proto-oncogene CCND1 in healthy and malignant B cells. After confirming that the model gives good predictions of Hi-C data for the nonmalignant human B cell–derived cell line GM12878, we generated predictions for two cancer cell lines, U266 and Z-138. These possess genome rearrangements involving CCND1 and the immunoglobulin heavy locus (IGH), which we mapped using targeted genome sequencing. Our simulations showed that a rearrangement in U266 cells where a single IGH super-enhancer is inserted next to CCND1 leaves the local topologically associated domain (TAD) structure intact. We also observed extensive changes in enhancer-promoter interactions within the TAD, suggesting that it is the downstream chromatin remodeling which gives rise to the oncogene activation, rather than the presence of the inserted super-enhancer DNA sequence per se. Simulations of the IGH-CCND1 reciprocal translocation in Z-138 cells revealed that an oncogenic fusion TAD is created, encompassing CCND1 and the IGH super-enhancers. We predicted how the structure and expression of CCND1 changes in these different cell lines, validating this using qPCR and fluorescence in situ hybridization microscopy. Our work demonstrates the power of polymer simulations to predict differences in chromatin interactions and gene expression for different translocation breakpoints.

[Supplemental material is available for this article.] Chromatin structure, nuclear organization, and the epigenome are intimately linked to gene regulation and cell function and are tightly controlled during differentiation. Genome rearrangements leading to structural variants (including deletions, insertions, duplications, and translocations) can disturb these, leading to gene misregulation and cancer (Norton and Phillips-Cremins 2017). An important example occurs during B cell differentiation, where programmed breakage and recombination of the genome takes place in order to generate the broad heterogeneity of immunoglobulins (Igs) required for immune system function (Jung et al. 2006). Errors in this process can lead to repositioning of Ig regulatory elements which then drives proto-oncogene activation (Kontomanolis et al. 2020). The diversity in translocations and the difficulties in accessing and handling patient samples means that detection, accurate mapping, and characterization of their functional consequences are inherently problematic.
Advances in molecular probes of genomic and epigenomic structure, such as ChIP-seq and chromosome-conformation-capture methods like Hi-C, have transformed our understanding of the regulatory link between structure and function. For example, Hi-C, which probes chromosome interactions genome-wide, has revealed that the genome can be partitioned into topologically associated domains (TADs) (Dixon et al. 2012). These are contiguous chromosome regions which show enriched self-interactions and are thought to be associated with cis-regulatory mechanisms; they are often bounded by binding sites for the CCCTC-binding factor CTCF (Rao et al. 2014). ChIP-seq profiling of histone modifications and protein binding has identified super-enhancers, clusters of enhancer elements thought to drive expression patterns responsible for cell identity (Hnisz et al. 2013). Super-enhancers have been found to preferentially sit within TADs which are highly insulated from their surroundings (Gong et al. 2018) and have been identified as drivers of oncogene expression in many tumors (Thandapani 2019;Mikulasova et al. 2022).
A full epigenetic and three-dimensional (3D) structural characterization of genome rearrangements is crucial for our understanding of proto-oncogene activation. However, the small size of patient samples and their inherent variability means that routinely applying methods like Hi-C to primary cancer samples remains challenging, despite recent progress in reducing the amount of material required (Díaz et al. 2018). This is why much work to date has focused on the biogenesis of chromosome translocations (Engreitz et al. 2012;Roukos and Misteli 2014), but comparatively less attention has been paid to the structural consequences of translocations and other genome rearrangements (Bianco et al. 2018;Akdemir et al. 2020).
In this work, our aim was to show how computer simulations can help us understand the effects of common genome rearrangements in B cell malignancies. Specifically, we used our "highly predictive heteromorphic polymer" (HiP-HoP) model (Buckle et al. 2018;Brackey et al. 2020) to study genomic rearrangements involving the immunoglobulin heavy locus (IGH) (Watson and Breden 2012) and the CCND1 proto-oncogene encoding the cell cycle protein cyclin D1. IGH-CCND1 is a common translocation observed in mantle cell lymphoma (MCL) and multiple myeloma (MM); when translocated together, the IGH super-enhancers drive CCND1 overexpression. The goal was to use the model to shed light on how such rearrangements lead to changes in the 3D structure of gene loci which in turn leads to deregulation. If these simulations allow us to infer the pathway through which the CCND1 proto-oncogene becomes activated, they could be used to generate testable hypotheses to direct new experiments suggesting targets for therapeutic intervention. In the longer term, this points to a future role for the method in a clinical setting, for example, in personalized medicine, where structural predictions based on available or easily obtainable data could be extremely valuable. Figure 1, A-C, shows cartoons of the IGH and CCND1 loci in healthy cells, as well as two different rearrangements which are studied here. Specifically, we consider an MM cell line, U266, where a single IGH super-enhancer is inserted into the CCND1-TAD, and an MCL cell line, Z-138, possessing a translocation which repositions CCND1 next to the IGH joining region (Watson and Breden 2012). In another study published in this issue (Mikulasova et al. 2022), we analyzed the changes in patterns of histone modifications associated with super-enhancer translocation events, finding that relocation of the IGH super-enhancer results in local chromatin remodeling and the emergence of new patterns of histone modifications including H3K4me3 broad domains (Park et al. 2020). Here, we study how these epigenetic changes alter 3D chromosome structure.

Results
HiP-HoP uses data on DNA accessibility, histone modifications, and protein binding to generate an ensemble of 3D chromosome structures. We previously showed that the method gives good predictions of both population level data (e.g., Hi-C or 4C) and single-cell measurements (e.g., from fluorescence microscopy) (Buckle et al. 2018). Crucially, HiP-HoP does not use Hi-C (or any other 3D genome data) as an input (unlike other popular models which use fitting-based methods) (Serra et al. 2015). This means that we can make predictions about the 3D structure of cell types or tissues where Hi-C data is not available.

HiP-HoP predicts the domain structure at CCND1 in a B cell-derived cell line
We previously showed that, in malignant B cells, genome rearrangements involving CCND1 lead to local changes in histone modifications (Mikulasova et al. 2022). Here, we hypothesized that this in turn leads to changes in 3D structure and chromatin interactions which drive CCND1 expression. To understand these malignant structural changes, it is first necessary to study the healthy case. To this end, we considered the human B cell-derived lymphoblastoid cell line GM12878, adapting our HiP-HoP method to study chromatin structure around CCND1. This is an ideal cell line due to the abundance of publicly available data.
In the model, a 3-Mbp chromosome region was represented by a chain of beads (a common approach in polymer physics) (Brackey et al. 2020). The region was chosen because it is large enough to encompass seven TADs surrounding CCND1 such that the local chromatin context is captured but small enough to give feasible simulation run times. The model combines three mechanisms which drive locus structure: (1) diffusing spheres representing complexes of chromatin binding proteins which can bind to multiple points at the same time to form molecular bridges (e.g., between promoters and enhancers) (Brackley et al. 2013(Brackley et al. , 2016bBrackley 2020); (2) a heteromorphic polymer structure, where different sections of the bead chain have different properties (thickness/compaction and flexibility); and (3) the loop extrusion mechanism (Sanborn et al. 2015;Fudenberg et al. 2016). Loop extrusion is a molecular mechanism where chromatin is pushed onto loops by factors (the cohesin complex) (Davidson et al. 2019), which are halted by CTCF proteins bound in a convergent orientation (Rao et al. 2014).
A way to identify different types of chromatin is via highthroughput analysis of histone modification data and hidden Markov modeling (HMM) (Ernst et al. 2011;Carrillo-de-Santa-Pau et al. 2017); this approach classifies chromosome regions into a number of states (e.g., promoter/enhancer associated, polycomb repressed, Healthy cells. For CCND1, we show the approximate positions and orientations of four genes within the locus (TPCN2 lies just outside of the TAD which encloses the other three). For IGH, we show four previously identified enhancer elements within the constant region. (B) A rearrangement which has been observed in multiple myeloma (and in the U266 cell line) where an IGH insert bringing an enhancer element into the vicinity of CCND1 is shown. (C ) A t(11;14) translocation often found in mantle cell lymphoma (and in the Z-138 cell line) which brings CCND1 next to the entire IGH constant region is shown. and heterochromatin states) (see Supplemental Table S1). In the model, we had two different types of chromatin structure (the heteromorphic polymer): more open regions (thinner, more flexible polymer) and more compact regions (thicker, "crumpled" polymer) (see Supplemental Fig. S1 for a schematic). We used the chromatin states to identify these regions: H3K27ac associated states have the more open structure (Risca et al. 2017). The model included three different species of bridge-forming protein: a general "active binder," and two types of repressive binders (this is an extension of the scheme in Buckle et al. [2018] which only included active proteins). To identify active protein binding sites, we used DNA accessibility measured via DNase-seq experiments, assuming that DNase-hypersensitive sites (DHSs) are binding sites (see Methods; Supplemental Methods). Repressive protein binding sites were identified using the chromatin states, with one species binding to H3K9me3 states, and one species representing polycomb repressive complexes and binding to H3K27me3 states. Finally, ChIP-seq for CTCF was used to identify "anchor" sites where loop extrusion is halted. Full details of the simulations are given in Supplemental Methods, and a schematic is shown in Supplemental Figure S1.
To verify that HiP-HoP gives good predictions for CCND1, we generated simulated Hi-C interaction maps and compared them with publicly available data ( Fig. 2A; Rao et al. 2014;Supplemental Fig. S1 shows a similar plot but for the entire 3-Mbp region which was simulated, along with the experimental data used as an input). The simulations gave a good prediction of the TAD pattern across the region (clear domains are visible in Fig. 2A; we used the TAD-calling algorithm from the HiCExplorer software [Ramírez et al. 2018] to identify boundaries; see Supplemental Fig. S2A,B; Supplemental Methods Section 4). CCND1 sits at the far right of a TAD which is bounded by convergent CTCF sites, starts just to the left of TPCN2, and ends between LTO1 and FGF19. We shall refer to the three neighboring domains as the TPCN2-, CCND1-, and FGF19-TADs. The gene MYEOV sits just left of the center of the CCND1-TAD. More broadly, the dependence of interactions with genomic separation was correctly captured by the model (Supplemental Fig.  S2C), and we found good correlation with experiments using several different metrics (e.g., a Pearson's correlation of 0.7 was observed for the directionality index profile) (see Supplemental Simulated Hi-C maps tend to show more structure within the domains than experiments (visible as dark spots and stripes; this was also observed in previous HiP-HoP studies). Typically, these features arise in the model due to protein-mediated enhancer-promoter interactions. It could be that these interactions are overesti-mated in the model, for example, because parameters such as the number of bridging proteins are not correct, or because such interactions are disrupted by a mechanism which is not present in the HiP-HoP framework. Another possibility is that the experimental data lacks sufficient resolution to reveal these features. Many of the predicted 'within-TAD' interactions are present in publicly available HiChIP and ChIA-PET data which have higher resolution (Supplemental Fig. S2G) (promoter-enhancer interactions are also more apparent in targeted 4C [Zhao et al. 2006], nucleosome resolution MicroC [Krietenstein et al. 2020], and recent Hi-C data which have been treated using a method for removing biases [Lu et al. 2020]). A quantitative analysis of within-domain loops found in HiChIP data shows that the simulations correctly predict an enrichment of these interactions (Supplemental Fig. S2G; see Supplemental Methods Section 4). In the simulated Hi-C, several smaller 'subTADs' are also visible; for example, one subTAD is generated by looping interactions between a CTCF site just upstream of MYEOV and a CTCF within the CCND1 gene body [green dashed lines in Fig. 2A(ii)]. A typical simulated structure is shown in Figure  2B; also shown are five other representative structures (from a population of 4400 generated).  A more focused view of chromatin interactions at gene promoters is shown in simulated 4C interaction profiles ( Fig. 2C; we take a 1-kbp region immediately upstream of a gene's transcription start site [TSS] to be its promoter). These reveal that the CCND1 promoter often interacts with polycomb-associated regions (arrow 'i' in the figure); this is because there is enrichment of H3K27me3 at the promoter. There are also 'active' H3K4me1/3 modifications, which could indicate variation between alleles or across the population, with the gene sometimes expressed and sometimes repressed. Consistent with this, publicly available RNA-seq data (The ENCODE Project Consortium 2012) show weak expression of CCND1 in GM12878 cells (see Supplemental Methods).
To more quantitatively assess the expression of four of the genes within the locus, we performed qPCR experiments (see Supplemental Methods; Supplemental Fig. S3). We found that, in these cells, TPCN2 and LTO1 are expressed at slightly lower levels than CCND1; in simulations their promoters interact with several nearby DHSs. MYEOV shows much lower expression than the other three genes (Supplemental Fig. S3A) and is covered by enhancer-associated H3K4me1 and H3K27ac states. Although there is no DHS at the MYEOV promoter to drive strong interactions, there is a general enrichment of interactions within a broad region downstream from the gene (arrow 'ii' in Fig. 2C), driven by the CTCF site immediately upstream of the gene (arrow 'iii'). There is a DHS just downstream from MYEOV (arrow 'iv') which interacts strongly with DHSs across the locus (including at several promoters) (see bottom track in Fig. 2C); this, together with the chromatin states, suggests that this region may have enhancer function.
A super-enhancer insertion near CCND1 drives changes in 3D structure within the TAD We next used HiP-HoP to study CCND1 in the MM cell line U266. These cells possess a genomic rearrangement where an enhancer region from IGH is inserted upstream of CCND1, within its TAD boundaries. A number of super-enhancers have previously been described in the nonvariable IGH region, including the long recognized Eα1, Eα2, and Eµ enhancers (Mills et al. 1983(Mills et al. , 1997Chen and Birshtein 1997). More recently, we proposed that a fourth element, Eδ, should be identified as distinct from Eµ, based on highquality chromatin state mapping (Mikulasova et al. 2022). The ∼100-kbp inserted region in U266 includes Eα1 (accurately mapped by Mikulasova et al. 2022). U266 is a useful model because it allows study of the effect of relocating a single IGH enhancer within an existing TAD. It leads to significant changes in expression which are similar to those resulting from a reciprocal translocation.
In order to simulate the 3D genome structure in U266 cells, we generated an alternative genome build which includes the insert, denoting this hg19_u266 (compared to the unaltered hg19 genome used for GM12878 above). We obtained U266 DNA accessibility and chromatin states from the BLUEPRINT project (Stunnenberg et al. 2016) and used CTCF binding site data for B cells (The ENCODE Project Consortium 2012). Example configurations are shown in Figure 3, A-C. Figure 4A shows simulated Hi-C results for U266 along with the input data; the insert region is marked in green. For comparison, Figure 4B shows the intact locus from the GM12878 simulations (no insert).
The overall TAD structure is unchanged by the presence of the insert; however, interactions within the domains show significant differences between the two cell lines. Particularly, in U266 there are several strongly interacting regions (dark spots in the Hi-C map, some highlighted with green circles), including close to CCND1 and within the IGH insert itself. Many of these are between DHSs, between regions with heterochromatin states, or between regions with polycomb states. DHSs often coincide with regulatory elements such as promoters and enhancers; particularly in U266 there is a cluster of DHSs near and within CCND1 which is not present in GM12878. Together, these observations suggest that loop extruders and CTCF sites drive TAD formation, whereas bridging proteins drive interactions within TADs. Extrusion is unchanged by the presence of the insert (which only contains one, low-occupancy CTCF site), whereas significant remodeling of chromatin in U266 compared to GM12878 leads to changes within the TAD. To validate this new TAD layout, we compared our simulated interaction map with Hi-C data for U266 cells obtained from Wu et al. (2017) (Fig. 4C). Although these data have much lower read coverage than the GM12878 data discussed above, it is sufficient to generate a map at 20-kbp resolution and call TADs (Supplemental Fig.  S4); however, there is a region with repetitive DNA sequence within the insert region to which short Hi-C reads cannot be mapped, and this leads to an incorrectly called TAD boundary. Nevertheless, the rest of the TADs in the data match those called from the simulated map, and there is good visual agreement (Supplemental Fig.  S4; see Supplemental Methods Section 4 for details of the analysis). It is clear that CCND1 is indeed within the domain which encompasses the IGH insert, and its promoter shows enriched upstream interactions, validating the simulation predictions. Broader comparison measures also showed significant correlations between the simulation and experimental Hi-C, though the agreement was reduced compared to the GM12878 case (likely due to the lower resolution of the data) (Supplemental Fig. S4; Supplemental Methods Section 4). Figure 4D shows simulated 4C with viewpoints positioned at gene promoters (top four rows). We found that there were new (compared to GM12878) strong interactions between the CCND1 promoter and several DHSs across the TAD, including three prominent peaks within the insert (arrow 'i' in Fig. 4D), several peaks between MYEOV and the insert (arrows 'ii'; some of these DHSs were not present in GM12878), and within and around the nearby genes (arrows 'iii', 'iv', and 'v'). were observed as expected (Fig. 4D, bottom four rows). Many of the regions around the new DHSs also gained enhancer or promoter chromatin states compared to GM12878. The new interactions arise because CCND1 gains DHSs and active promoter and enhancer chromatin states at the promoter and across its body (while losing H3K4me1 and H3K27me3 at the promoter). These features are consistent with H3K4me3 broad domains, which have previously been associated with super-enhancers (Thibodeau et al. 2017;Dhar et al. 2018). Their appearance has been implicated in cancer-specific super-enhancer hijacking at a number of oncogenes (including CCND1) (Mikulasova et al. 2022).
The promoter of MYEOV also gains a DHS in U266 compared to GM12878, and the chromatin state changes such that H3K27ac and H3K4me3 now cover the gene, extending several kbp downstream (another broad domain). It also shows a number of new interaction peaks, particularly in a region downstream where there is a cluster of new DHSs with active enhancer and promoter chromatin state (arrows 'vi' in Fig. 4D). The MYEOV broad domain region as a whole interacts with the CCND1 broad domain in ∼3% of configurations, compared to ∼2% in GM12878. MYEOV shows more frequent interactions with the CCND1 promoter than with the IGH insert, despite the latter being closer genomically (arrows 'vii'). Taken together, these observations suggest that this new MYEOV broad domain is active as an enhancer (as noted above, the MYEOV proximal region also displayed some enhancer-like features in the healthy cell line).
HiP-HoP generates a population of structures (representing a population of cells) and provides full three-dimensional details. We can also therefore measure, for example, separations between specific points or the overall 3D size of a given region. We find that the mean separation of TPCN2 and CCND1 tends to be larger in U266 where the insert is present. This is expected because not only is the genomic separation (along the chromosome) of the probes larger, but in the model, more of the region has the open (H3K27ac-associated) structure. The latter effect is highlighted by examining the 3D size of a region of fixed genomic length around  The reciprocal translocation t(11;14)(q13;q32) generates an oncogenic TAD fusion The MCL cell line Z-138 harbors a translocation that dramatically changes the local environment of CCND1: it relocates to Chromosome 14 where it becomes juxtaposed with the joining region of the IGH locus. Although the t(11;14) rearrangement is common, the breakpoint has not previously been mapped in this cell line. Using paired-end read targeted DNA sequencing, we precisely mapped the chromosomal changes (Supplemental Fig. S6). With this information, we could then perform simulations of the rearranged genome, which we denote hg19_z138. Figure 5 shows results from simulations using Z-138 epigenomic data (again obtained from BLUEPRINT [Stunnenberg et al. 2016] and ENCODE [The ENCODE Project Consortium 2012]). The simulated Hi-C map (Fig. 5A) predicts that a new TAD will form, encompassing CCND1, LTO1, and the IGH super-enhancers (Eα2, Eα1, Eµ, and Eδ). The right-hand boundary of the TAD is formed by a pair of divergent CTCF sites and is conserved from the Chromosome 11 boundary observed in GM12878 and other cell types. The left-hand boundary is formed by three inward pointing CTCF sites; a boundary is also seen at this position on Chromosome 14 in some other cell types which do not have a translocation (including human umbilical vein endothelial cells) (Rao et al. 2014). In summary, the simulations predict that these two boundaries are conserved after the translocation, and this leads to formation of a new IGH-CCND1 TAD. This new TAD has a length of about 500 kbp, about 20% smaller than the ∼640-kbp CCND1-TAD observed in GM12878 cells (the 3D volume of the TAD in Z-138 is also correspondingly smaller). One can also see a region within the TAD which shows stronger enrichment of iterations (a 'subTAD' of length ∼407 kbp) (Fig. 5B); this arises due to looping between a CTCF site just upstream of Eα2 and a CTCF within CCND1. The latter is the same CTCF site which forms a subTAD in GM12878. Figure 5C shows simulated 4C with viewpoints at CCND1 and several positions within the translocated IGH super-enhancers. CCND1 shows prominent interaction peaks in both Eδ and Eµ (arrows 'i' and 'ii' in the figure; the interaction is strongest where several DHSs are clustered together). There are also smaller interaction peaks within Eα1 and Eα2 and at a number of other DHSs (including a cluster to the left of Eα2, arrow 'iii'). Similar to U266, there are a number of DHSs at the CCND1 promoter and within the gene body which are not present in the GM12878 cell line. In the simulations, the interactions of the promoter are driven by binding (and molecular bridging) of active proteins; this bridging is likely further promoted by CTCF/cohesion-driven loop extrusion. Viewpoints positioned at DHSs within the enhancers show reciprocal interactions with CCND1.
Here, we have simulated the breakpoint and rearrangement as mapped for the Z-138 cell line (about 60 kbp upstream of CCND1). It is also possible to simulate alternative rearrangements to better understand the effect of breakpoint variation between patients. Up to 50% of t(11;14) MCL breakpoints are within the so-called 'major translocation cluster' about 120 kbp upstream of CCND1 (Jares and Campo 2008), with the rest scattered throughout a broader 380-kbp region around this. Supplemental Figure S7 shows simulated 4C from two different examples where the Chromosome 14 breakpoint is kept the same but we move the breakpoint on Chromosome 11 to different positions between 11 and 100 kbp upstream of CCND1. We found that, in general, interactions between the gene and the Eµ enhancer are weaker if their genomic separation is larger.

Chromatin remodeling and CCND1 3D structures differ in U266 and Z-138
The changes to chromatin states and DNA accessibility at CCND1 differ subtly between the two cancer cell lines. This is visible in Figures 4A and 5A but is more evident if the data are mapped back to the hg19 reference genome (Supplemental Fig. S8 shows the chromatin states and DHSs at three regions within the CCND1 locus where there is significant remodeling compared to GM12878). In both U266 and Z-138, the CCND1 gene body gains several DHSs and an H3K4me3 broad domain appears. From 10 distinct DHSs within CCND1, five are common to both cancer cell lines, three are specific to Z-138, and two are specific to U266. In other words, the different rearrangements lead to similar, but not identical, changes to the chromatin structure at CCND1. In U266 cells, two other regions within the TAD also show substantial changes: a region covering and extending downstream from MYEOV and a region between MYEOV and CCND1 (location of arrows 'ii' and 'vi' in Fig. 4D) both gain an H3K4me3 broad domain and several new DHSs. We note that the pattern of chromatin states and accessibility at the IGH enhancers also varies between the U266 and Z-138 cell lines (see Supplemental Fig. S9).
The changes to the chromatin states at CCND1 have a striking effect on the 3D structure of the gene: In Figure 6A, we plot the distribution of the simulated 3D size of the gene body (quantified by the radius of gyration, R g ) (see Supplemental Methods). In GM12878, the gene is on average more compact than in the other cell lines. The snapshots in Figure 6A show typical configurations: The differences in the size of the gene, the chromatin states, and the DHS pattern are clear. In GM12878, the gene has a crumpled structure, whereas in U266 it is more stretched out. Protein-mediated loops can form between the DHSs within the gene body which reduce its 3D size: Variation in the number of such loops present at any one time leads to the large variation in R g . The snapshot for the Z-138 case shows a configuration where a loop forms between a DHS within the gene body and a DHS in the upstream region. Because there are more DHSs across the gene in Z-138, there are more possibilities for loops to form: smaller R g values are therefore more likely and there is less variation.
To see how this large change in 3D structure correlates with gene expression, we performed additional qPCR experiments comparing the expression of the four genes in the locus between the cell lines. Figure 6B shows their fold change in expression in U266 compared to GM12878 (see also Supplemental Fig. S3B). Whereas all of the genes show an increase, that of CCND1 is by far the largest: more than 500-fold. The next largest change is for LTO1 which shows an ∼9.8-fold increase. We considered whether these changes could be predicted from the simulated structures. In our simulations, the active proteins represent complexes of transcription factors and polymerases, and one might therefore associate binding of these to a gene promoter site with transcription. Indeed, in previous work using a simpler model, we showed that the fraction of time a promoter is bound by an active protein during the simulation is correlated with gene expression measured experimentally via GRO-seq (Brackley et al. 2021). In the simulations, the active proteins can bind at promoters which overlap a DHS; they can also simultaneously bind multiple DHSs to form molecular bridges. An emerging feature of the model is that clusters of proteins form at positions where there are bridges between DHSs, and that in most cases where a promoter is bound by a protein, it belongs to one of these clusters. This means that the likelihood that a promoter is bound by a protein depends on the local 3D structure, consistent with a mechanism where enhancer-promoter interactions and/or formation of transcriptional protein condensates drive expression.
Here, we calculated the fraction of simulated configurations where a given promoter is bound by a protein. This measure does not take into account enhancer 'strength' or compatibility and can only have a value between 0 and 1; therefore, we only expect a correlation and not a quantitative mapping to expression. Nevertheless, the predictions are largely consistent with the qPCR data, even if this is not a direct validation of the predicted structures (see Supplemental Methods and Supplemental Fig. S3Ei for further details). An even better correlation between simulations and data is obtained if, rather than considering protein binding, we instead ask how often the promoter interacts with (forms loops to) chromatin which has an enhancer-associated state (Fig. 6C). We note that there are simpler models which aim to predict expression, for example, based on histone modification levels near gene promoters (Fulco et al. 2019), but these do not provide the same insight and context revealed by the predicted 3D structures. In Z-138 cells, CCND1 expression shows an even larger 1200fold increase compared to GM12878 (qPCR data shown in Supplemental Fig. S3C); LTO1 and TPCN2 also show increased expression in Z-138 compared to GM12878 cells, but we did not detect MYEOV expression. Only CCND1 and LTO1 were present within the region simulated, and simulations correctly predicted the former would display the greater fold change compared to GM12878 cells (Supplemental Fig. S3Eii).
To verify that the change in 3D gene size is observed in vivo, we performed fluorescence in situ hybridization (FISH) microscopy using fosmid probes ( Fig. 6D; see also Methods and Supplemental Methods). We note that the probes cover a region several times larger than the CCND1 gene body (42.8 kbp compared to the 13.4-kbp gene), so this is an indirect measure. Figure 6E shows a typical image from the Z-138 cells. First, we note that the positioning of the spots is consistent with the translocation. One of each of the red, green, and gold spots are together in a group (on the nontranslocated Chromosome 11); the other gold (CCND1) probe is alone, whereas the other red and green probes (TPCN2 and MYEOV, respectively) are together. From the images, we obtain the 3D volumes of the probes; a similar measure is obtained from the simulations. The simulations predict that compared to the GM12878 cell line, the volume occupied by the CCND1 probe region will be, on average, 30% larger in U266 cells and on average, 16% larger in Z-138 cells. They also predict that the MYEOV probe volume will be, on average, 36% larger in U266 cells compared to GM12878, whereas the TPCN2 probe will be the same volume in both of these cell lines. Experimentally, we obtained measurements from at least 20 cells for each cell line (with two spot measurements in each; the numbers of cells where spots were clearly visible was limited by some technical aspects) (see Supplemental Methods). The data showed trends which were generally consistent with the simulations: the CCND1 probe volume was, on average, 30% larger in U266 compared to GM12878, and the MYEOV probe was, on average, 23% larger. However, in the experiments the differences were not statistically significant; nevertheless, the simulations are still consistent with this, as taking a random subset of the simulation measurements such that there were the same number as in the experiments also did not show any significant differences between the probe volumes in the different cell lines (see Supplemental Methods Section 7 for further details). Plots showing the distributions of the CCND1 probe volumes for experiments and simulations are shown in Figure 6, F and G (Supplemental Fig. S10 shows similar plots for the other probes). We note that although the simulations are consistent with the in vivo trends, we would not expect the model to be able to predict quantitative differences in probe volumes; this is consistent with previous work where predictions for 3C-based data (interactions) were better than those for microscopy data (Buckle et al. 2018).
The oncogene structure of CCND1 is driven by chromatin remodeling We have observed that in these cell lines, the genome rearrangement is accompanied by extensive chromatin remodeling. In particular, in both U266 and Z-138, an H3K4me3 broad domain containing several DHSs appears over CCND1. A likely scenario is that after the rearrangement, the IGH super-enhancers recruit transcription factors, chromatin remodelers, and architectural proteins to the region. The resulting changes then in turn disrupt the wider 3D structure, leading to dysregulation of CCND1. The HiP-HoP computational framework provides a unique opportunity to examine these two effects in isolation: we can rearrange the genome without otherwise changing the chromatin states, or we can use chromatin states from a cancer cell line without including any genomic rearrangement. It is not possible to do this in vivo, and these in silico situations cannot be engineered in reality. Nevertheless, they provide insight. Figure 7A shows results from a simulation where chromatin state and DHS data for GM12878 cells were used as input, but the hg19_u266 genomic insert rearrangement was included (as depicted in the cartoon). In Figure 7C, we compare 4C results from GM12878 simulations with and without the insert; this shows that adding the insert has only a small effect on the interaction profile of the gene promoters. The profiles for TPCN2 and MYEOV are unchanged. Also, despite the insert containing a DHS within an H3K27ac region, there is little interaction with the nearby CCND1 promoter (which has a polycomb chromatin state). On the other hand, there is some interaction between the insert and LTO1. Using the simulations to predict gene expression as above, these results suggest that without any subsequent chromatin remodeling, the presence of the insert alone would not lead to CCND1 up-regulation. This is in contrast to the measured change in RNA levels via qPCR (Supplemental Fig. S3B,C), which show ∼500-fold and 1200-fold increases in CCND1 expression in U266 and Z-138, respectively, compared to GM12878. Figure 7, B and D, shows similar results from simulations where U266 epigenomic input data is used with the intact hg19 genome (no insert, as shown in the cartoon). Here we see the effect of the chromatin remodeling in the absence of the insert. We find that in this case, the CCND1 promoter still shows interaction peaks across the CCND1 gene body, at regions downstream (in and around LTO1) and to a lesser extent, the H3K4me3 broad domain regions upstream. Beyond the loss of interactions with the insert itself, the main effect of removing the insert is an increase in interactions between CCND1 and upstream DHSs, consistent with their reduced genomic separation. The simulations also predict that very little change in gene expression level would result from removing the insert while keeping the U266 chromatin states. Together these results suggest that it is chromatin remodeling which drives the changes in the 3D structure of the locus in terms of promoter-enhancer interactions, and this in turn drives up-regulation of cyclin D1 (Fig. 7E).
As well as simulating different rearranged genomes, it is also possible to directly edit the input data; that is, we can edit the epigenome in silico. For example, we found little change in CCND1 interactions if we take chromatin states from GM12878 cells for Chromosome 11 and add an insert from Chromosome 14 with states from the U266 cells (Supplemental Fig. S11A,B). In a simulation of the U266 cells with the insert where we 'switched off' protein binding at four DHSs within the promoter and gene body of CCND1, there was a loss of CCND1 interactions compared to the previous U266 simulations (Supplemental Fig. S11C-E).

Discussion
In this work, we have adapted the HiP-HoP simulation model to study chromatin 3D structure and the effect of genome rearrange-ments in malignant and nonmalignant B cells. HiP-HoP simulations predict 3D structures from DNA accessibility, chromatin states, and CTCF binding data. We used these structures to simulate Hi-C and 4C data and to make single cell-like measurements. Crucially, Hi-C data are not used as an input, so in this sense the model is truly predictive for 3D structures. Here, by "rearranging" the input data, we generated predictions for the effect of genome rearrangements which are found in MCL and MM cell lines.
We first confirmed that the HiP-HoP model gives good predictions for the CCND1 gene locus by performing simulations of the healthy B cell-derived GM12878 lymphoblastoid cell line. The TAD pattern observed in Hi-C (Rao et al. 2014 reproduced by the simulations, with boundaries at CTCF sites. In these cells, CCND1 is only expressed at very low levels; the promoter region has a polycomb-associated chromatin state, and our simulations predicted interactions between the promoter and other polycomb regions. There are several parameters in our model, and varying these could affect the quality of the predictions; however, our previous work (Buckle et al. 2018) showed that variation of most parameters has a small effect on the predicted interaction, and this is mostly quantitative rather than qualitative (e.g., increasing the number of simulated proteins tends to increase chromatin interactions, but without changing the positions of peaks). Our parameter choice here was based on those in Buckle et al. (2018) (though due to the different setup, we could not preserve all parameters) (see Supplemental Methods), and it may be that predictions could be improved by optimizing the parameters for this specific locus. Further simulations predicted that the TAD structure around CCND1 is preserved in the U266 MM cell line which possesses a rearrangement where a super-enhancer from the IGH locus is inserted upstream of CCND1. This is consistent with previously published low-resolution Hi-C data in that cell line (Wu et al. 2017). In these cells, the chromatin around CCND1 is remodeled: Several new DHSs are established, and a broad region gains an active promoter chromatin state (H3K4me3 and H3K27ac). The simulations predicted that the CCND1 promoter interacts with these new DHSs in the gene body, as well as with a number of DHSs within the inserted IGH region and the neighboring gene LTO1 (interactions which are not present in GM12878 cells).
We also performed simulations of the MCL cell line Z-138, where CCND1 translocates with Chromosome 14 becoming juxtaposed with the IGH Eµ enhancer. In this cell line, a TAD boundary formed by a cluster of CTCF binding sites downstream from CCND1 was preserved after the translocation, as was a boundary to the left of the IGH super-enhancers (presumably in healthy cells this isolates the super-enhancers from other nearby genes). In other words, the simulations predict a fusion of the CCND1 and IGH regions into a new TAD-an oncogenic TAD fusion (Valton and Dekker 2016). In this new arrangement, the promoters of CCND1 and LTO1 readily interact with each other and with the proximate Eµ and Eδ super-enhancers, whereas weaker interactions were observed with the more distant Eα1 and Eα2.
In vivo, genomic rearrangements are accompanied by chromatin remodeling; within our simulation scheme, it was also possible to examine the effect of a rearrangement in the absence of remodeling. Simulations of GM12878 cells with the Eα1 super-enhancer DNA inserted in silico upstream of CCND1 (with GM12878 chromatin states otherwise unchanged) showed only very minor changes in terms of 3D structure compared with the unaltered genome. We also performed a simulation using U266 chromatin states but without the insert (i.e., the epigenomic rearrangement is included, but not the genomic rearrangement, an experiment that is only possible to do in silico). This showed little change compared to the U266 case with the insert. Together, this suggests that it is the local chromatin remodeling, rather than the proximity of CCND1 to the IGH enhancers per se, which drives gene deregulation. Or, in other words, a genomic translocation leads to an epigenomic translocation, which drives cyclin D1 overexpression. The remodeling includes the appearance of an H3K4me3 broad domain over CCND1; the results here support our previous work suggesting that such broad domains are associated with hijacked super-enhancers (Mikulasova et al. 2022). In U266, two other nearby regions become H3K4me3 broad domains, one covering the gene MYEOV (this region also appears to have some enhancerlike properties in the healthy cell line). The chromosomal locations which are predicted to strongly interact with CCND1 could be used for targeting in future experiments which aim to uncover the mechanisms through which broad domains are generated. For example, one could ask whether using CRISPR-dCas9 to tether KRAB or another enzyme at the U266 insert is sufficient to reverse the nearby epigenetic changes.
In principle, our method can be applied to simulate any genome rearrangement for which input data is available. One current limitation is that the input data comes from both alleles of the locus; so, for example in the Z-138 case, the chromatin states are based on a combination of histone modifications from the translocated and non-translocated CCND1 loci. We note that this is a limitation of the input data rather than the simulation scheme; if the full sequence of a given cell line or patient sample were available, ChIP-seq reads could be aligned in an allele-specific way. It may also be possible to use the simulations to explore hypotheses about differences between alleles; for example, if one were to assume one copy of the locus has active chromatin states and the other inactive states, how does the 3D structure differ? A simplifying assumption in the model is that all DHSs are binding sites for a general active factor, and it would be interesting to incorporate more detail in the future, for example, if transcription factors specific to a particular oncogene are known (Brackley et al. 2016a). Other possible improvements to the model include adding further "compaction levels" in the heteromorphic polymer (which might improve microscopy predictions) or adding protein-protein interactions promoting larger liquid-like phase separated protein droplet formation (augmenting the small protein clusters which form via bridging-induced attraction in the present model) (Brackley et al. 2013). There are also some more general limitations to our approach. We have focused here on translocations which reposition super-enhancers; it is unclear what insight could be gained in other situations (e.g., gene fusions, or frameshift mutations). We have not yet considered copy number variations and gene dosage and have assumed that we are simulating an initial driver rearrangement; using the approach to understand the evolution of genomic abnormalities which arise sequentially would be more challenging.
It would be of interest in the future to also simulate the dynamics of histone modification and DNA accessibility; as noted above, our simulations imply that the genomic rearrangement and repositioning of super-enhancers drives local changes to these chromatin features. For example, including histone modifying enzymes explicitly in a model such as HiP-HoP would open the possibility to understand this process and to predict how a rearrangement changes the 1D chromatin properties as well as the 3D structure. In the long term, it may then be possible to predict changes to expression using only input data from healthy cells, plus the coordinates of the rearrangement.
In summary, our work strongly suggests that genome rearrangements drive a subsequent epigenomic rearrangement, which in turn leads to deregulation and proto-oncogene activation. We have demonstrated that polymer physics-based modeling can be useful for understanding the structural consequences of genome rearrangements and can help to focus future experimental work. It would be interesting to see if in the future, such simulations could also help us understand the mechanisms behind the epigenomic rearrangement. This would clearly be important for the development of any therapies which seek to interfere with that process.

HiP-HoP simulations
We used the HiP-HoP model as detailed in Buckle et al. (2018); full details are given in Supplemental Methods. In brief, chromatin is represented as a chain of beads (each representing a 1-kbp region), and we evolve the configuration of this chain using a molecular dynamics scheme and the LAMMPS simulation software (Plimpton 1995). To improve efficiency, in each simulation we include 40 Mbp of chromatin (40,000 beads) which includes 11 copies of the region of interest; for each region we perform two such simulations to generate 4400 individual configurations, each of which can be said to represent a single cell. The chromatin concentration is roughly matched to that of a typical nucleus. From the configurations we generate simulated Hi-C and 4C and single cell-like measurements. As detailed in the text, three different mechanisms are included to drive the chain into specific structures: diffusing bridge-forming proteins, loop extrusion, and a heteromorphic polymer structure. DNase hypersensitive sites are used to identify binding sites for an active protein (we use the simplifying assumption that all DHSs are the same and bind this protein, which represents a general complex of polymerase and transcription factors). We use chromatin state data to identify binding sites for two species of repressive protein (e.g., representing HP1 and polycomb repressive complexes) and to identify regions which have an open chromatin structure. ChIP-seq data for CTCF is used to identify direction-dependent anchor sites for loop extrusion. Full details of the input data treatment are given in the Supplemental Methods. Details of publicly available data used for simulation input and validation are given in Supplemental  Table S3 and Supplemental Methods. Note that the hg19 reference genome was used in this study, as some data sets were only available mapped to this reference; because our simulations do not depend on the precise underlying sequence, none of our conclusions would be altered by using a more recent genome build, such as GRCh38.

Quantification of gene expression
For gene expression measurements, total RNA was extracted using an RNeasy kit (QIAGEN) from between 5 × 10 6 and 10 × 10 6 Z-138, U266, and GM12878 cells. For qPCR, cDNA was synthesized from 1 µg of total RNA using random primers (Promega), then treated with M-MLV Reverse Transcriptase (Promega). Two micrograms of cDNA reaction were amplified with SYBR green master mix (Invitrogen), 10 µM forward and reverse primers for LTO1, TPCN2, CCND1, and MYEOV (Sigma-Aldrich) with 40 cycles of PCR (95°C for 15 sec, 60°C for 60 sec) after initial denaturation (95°C for 10 min). Cycle threshold values were normalized by comparison with GAPDH expression. See Supplemental Methods for further details (oligonucleotide sequences are shown in Supplemental Table S2).

Genome rearrangement breakpoint mapping
Breakpoint mapping in U266 and Z-138 cells was performed as described previously (Mikulasova et al. 2020(Mikulasova et al. , 2022; further details are given in Supplemental Methods.

Data access
All raw and processed sequencing data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi .nlm.nih.gov/bioproject) under accession number PRJNA635269. The FISH and simulation data generated in this study have been submitted to the Edinburgh DataShare repository (https:// datashare.ed.ac.uk/handle/10283/4433). See also Supplemental Table S4.