Abundance and length of simple repeats in vertebrate genomes are determined by their structural properties

Albino Bacolla (1,5), Jacquelynn E Larson (1), Jack R Collins (2), Jian Li (3), Aleksandar Milosavljevic (3), Peter D Stenson (4), David N Cooper (4), and Robert D Wells (1)

(1) Institute of Biosciences and Technology; (2) National Cancer Institute/SAIC; (3) Baylor College of Medicine; (4) Cardiff University, UK

Abstract

Microsatellites are abundant in vertebrate genomes, but their sequence representation and length distributions vary greatly within each family of repeats (e.g., tetranucleotides). Biophysical studies of 82 synthetic single-stranded oligonucleotides comprising all tetra- and tri-nucleotide repeats revealed an inverse correlation between the stability of folded-back hairpin and quadruplex structures and the sequence representation for repeats ≥30 bp in length in 9 vertebrate genomes. By contrast, the predicted energies of base stacking interactions correlated directly with the longest length distributions in vertebrate genomes. Genome-wide analyses indicated that unstable sequences, such as CAG:CTG and CCG:CGG, were overrepresented in coding regions and that micro/minisatellites were recruited into genes involved in transcription and signaling pathways, particularly in the nervous system. Microsatellite instability (MSI) is a hallmark of cancer, and length polymorphism within genes can confer susceptibility to inherited disease. Sequences that manifest the highest MSI values also displayed the strongest base stacking interactions; analyses of 62 tri- and tetra-nucleotide repeat-containing genes associated with genetic disease revealed enrichments similar to those noted for micro/minisatellite-containing genes. We conclude that DNA structure and base stacking determined the number and length distributions of microsatellite repeats in vertebrate genomes over evolutionary time and that micro/minisatellites have been recruited to participate in both gene and protein function.
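
To make the "repeats ≥30 bp in length" criterion concrete, the following is a minimal Python sketch of how perfect tri- and tetra-nucleotide tracts of that size might be located in a sequence. It is an illustration only and not the authors' pipeline: the function names (`find_repeat_tracts`, `strand_pair`, `is_primitive`), the regular-expression approach, and the restriction to whole, uninterrupted copies are all assumptions made here. The `strand_pair` helper simply reproduces the duplex notation used above (e.g. CAG:CTG).

```python
import re

# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(unit):
    """Return the reverse complement of a DNA motif, e.g. CAG -> CTG."""
    return unit.translate(COMPLEMENT)[::-1]


def strand_pair(unit):
    """Name a repeat by both strands, e.g. 'CAG:CTG' (hypothetical helper)."""
    return unit + ":" + reverse_complement(unit)


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif.

    'ACAC' is really a dinucleotide repeat and 'AAA' a mononucleotide run,
    so neither should be counted as a tetra- or tri-nucleotide motif.
    """
    return (unit + unit).find(unit, 1) == len(unit)


def find_repeat_tracts(seq, unit_len, min_bp=30):
    """Return (start, end, unit) for perfect tandem repeats of unit_len bases.

    Only tracts made of whole copies and spanning at least min_bp bases are
    reported. The 30 bp default mirrors the threshold quoted in the abstract;
    everything else is an assumption of this sketch.
    """
    seq = seq.upper()
    min_copies = -(-min_bp // unit_len)  # ceiling division
    # One captured unit followed by at least (min_copies - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    tracts = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if is_primitive(unit):
            tracts.append((match.start(), match.end(), unit))
    return tracts


if __name__ == "__main__":
    toy = "TTA" + "CAG" * 10 + "TT" + "CAG" * 5 + "A"
    for start, end, unit in find_repeat_tracts(toy, 3):
        print(start, end, strand_pair(unit), end - start, "bp")
```

In the toy sequence, only the ten-copy CAG tract reaches 30 bp; the five-copy tract is ignored. The reported motif depends on the reading frame in which the scan first encounters the tract, so a real analysis would also need to collapse rotations of the same motif and decide how to treat interrupted repeats.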

Footnotes

    • Received March 10, 2008.
    • Accepted July 31, 2008.
ACCEPTED MANUSCRIPT

This Article

  1. Genome Res. gr.078303.108 Copyright © 2008, Cold Spring Harbor Laboratory Press
