A genome-wide analysis of common fragile sites: What features determine chromosomal instability in the human genome?
- Arkarachai Fungtammasan1,2,3,
- Erin Walsh3,4,
- Francesca Chiaromonte3,5,7,8,
- Kristin A. Eckert3,6,7,8 and
- Kateryna D. Makova2,3,7,8
- 1The Integrative Biosciences Graduate Program, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- 2Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- 3Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- 4Cellular and Molecular Biology Graduate Program, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA;
- 5Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- 6Department of Pathology, Jake Gittlen Cancer Research Foundation, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- 7 These authors contributed equally to this work.
Abstract
Chromosomal common fragile sites (CFSs) are unstable genomic regions that break under replication stress and are involved in structural variation. They frequently are sites of chromosomal rearrangements in cancer and of viral integration. However, CFSs are undercharacterized at the molecular level and thus difficult to predict computationally. Newly available genome-wide profiling studies provide us with an unprecedented opportunity to associate CFSs with features of their local genomic contexts. Here, we contrasted the genomic landscape of cytogenetically defined aphidicolin-induced CFSs (aCFSs) to that of nonfragile sites, using multiple logistic regression. We also analyzed aCFS breakage frequencies as a function of their genomic landscape, using standard multiple regression. We show that local genomic features are effective predictors both of regions harboring aCFSs (explaining ∼77% of the deviance in logistic regression models) and of aCFS breakage frequencies (explaining ∼45% of the variance in standard regression models). In our optimal models (having highest explanatory power), aCFSs are predominantly located in G-negative chromosomal bands and away from centromeres, are enriched in Alu repeats, and have high DNA flexibility. In alternative models, CpG island density, transcription start site density, H3K4me1 coverage, and mononucleotide microsatellite coverage are significant predictors. Also, aCFSs have high fragility when colocated with evolutionarily conserved chromosomal breakpoints. Our models are predictive of the fragility of aCFSs mapped at a higher resolution. Importantly, the genomic features we identified here as significant predictors of fragility allow us to draw valuable inferences on the molecular mechanisms underlying aCFSs.
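To make the "deviance explained" figure concrete, here is a minimal sketch of how such a quantity can be computed for a multiple logistic regression. It is not the authors' pipeline. The data are synthetic, the three predictor columns are hypothetical stand-ins rather than the genomic features analyzed in the study, and the definition used (one minus the ratio of model deviance to null deviance) is a common convention that is assumed here, since the abstract does not spell out the formula.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit a multiple logistic regression by Newton-Raphson (IRLS).

    Returns the coefficient vector with the intercept first.
    The tiny ridge term only guards against a singular Hessian.
    """
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                        # IRLS weights
        grad = Xd.T @ (y - p)                    # score vector
        hess = (Xd * w[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:         # converged
            break
    return beta


def binomial_deviance(y, p):
    """Deviance of a 0/1 response given fitted probabilities p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def deviance_explained(X, y):
    """Fraction of null deviance removed by the fitted model."""
    beta = fit_logistic(X, y)
    Xd = np.column_stack([np.ones(len(X)), X])
    p_model = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
    p_null = np.full(len(y), y.mean())           # intercept-only model
    return 1.0 - binomial_deviance(y, p_model) / binomial_deviance(y, p_null)


# Synthetic example: 400 regions, 3 hypothetical features.
# Columns 0 and 1 influence the fragile/nonfragile label; column 2 is noise.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
true_logit = -0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

frac = deviance_explained(X, y)
frac_noise_only = deviance_explained(X[:, 2:3], y)
print(f"deviance explained, all features: {frac:.3f}")
print(f"deviance explained, noise feature only: {frac_noise_only:.3f}")
```

In this toy setting, the model containing the informative features removes a larger share of the null deviance than the model containing only the noise feature. That comparison is the kind of reading the percentage in the abstract supports.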
Footnotes
- 8 Corresponding authors.
E-mail chiaro@stat.psu.edu.
E-mail kae4@psu.edu.
E-mail kdm16@psu.edu.
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.134395.111.
- Received November 4, 2011.
- Accepted March 19, 2012.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.