TY  - JOUR
A1  - Gur-Arie, Riva
A1  - Cohen, Cyril J.
A1  - Eitan, Yuval
A1  - Shelef, Leora
A1  - Hallerman, Eric M.
A1  - Kashi, Yechezkel
T1  - Simple Sequence Repeats in Escherichia coli: Abundance, Distribution, Composition, and Polymorphism
Y1  - 2000/01/01
JF  - Genome Research
JO  - Genome Research
SP  - 62
EP  - 71
DO  - 10.1101/gr.10.1.62
VL  - 10
IS  - 1
UR  - http://genome.cshlp.org/content/10/1/62.abstract
N2  - Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AF209020–209030 and AF209508–209518.]
ER  - 