Design optimization methods for genomic DNA tiling arrays

Table 2.

Optimal and naive tiling of various sequenced genomes for tile sizes between 300 bp and 1.5 kb

| Organism | Genome size | Percent repeats | Optimal sequence tiling: tile quality | Optimal sequence tiling: percent non-repeat bp covered | Optimal sequence tiling: percent repeat bp included vs all non-repeat bp | Naive partitioning: tile quality | Comparison: percent improvement |
|---|---:|---:|---:|---:|---:|---:|---:|
| Pan troglodytes | 3,083,993,401 | 57.74 | 66.05 | 89.81 | 4.23 | 85.58 | 19.53 |
| Homo sapiens | 3,070,537,687 | 52.38 | 66.07 | 89.60 | 4.06 | 85.53 | 19.47 |
| Rattus norvegicus | 2,795,745,218 | 48.75 | 66.86 | 91.43 | 5.54 | 85.89 | 19.03 |
| Mus musculus | 2,638,213,512 | 45.62 | 66.18 | 91.09 | 5.51 | 85.58 | 19.41 |
| Caenorhabditis elegans | 100,277,879 | 11.26 | 84.29 | 98.54 | 3.10 | 95.44 | 11.16 |
| Drosophila melanogaster | 129,323,838 | 14.23 | 86.89 | 99.40 | 2.62 | 96.78 | 9.89 |
| Fugu rubripes | 349,519,338 | 15.06 | 87.97 | 99.07 | 2.13 | 96.94 | 8.97 |
| Arabidopsis thaliana | 119,186,497 | 0.16 | 99.97 | 100.00 | 0.00 | 100.00 | 0.02 |
  • Repetitive elements were identified using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) and Tandem Repeats Finder (Benson 1999). The genomes vary widely in repeat density, from mammalian genomes with roughly 50% repeat content to the nearly repeat-free Arabidopsis genome. Obtaining high non-repetitive sequence coverage is straightforward for genomes at the low-repeat end of this spectrum. For higher eukaryotes, however, the highly repetitive sequences cannot be tiled optimally without further processing.
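The printed values in Table 2 appear to be internally related, and a short check makes that easier to see. The Python sketch below is an illustration only: the two relationships it tests are inferred from the numbers themselves and are not stated in the source. It takes each row in the order printed and checks that (a) the second tile-quality value equals the percent of non-repeat bp covered minus the percent of repeat bp included, and (b) the percent improvement equals the difference between the two tile-quality values. The names `ROWS`, `TOL`, `check_row`, and `check_table` are hypothetical, and the tolerance is an assumption to allow for two-decimal rounding. No assumption is made about which tiling strategy each tile-quality column belongs to.

```python
# Rows of Table 2 in printed order:
# (organism, genome size, percent repeats, first tile quality,
#  percent non-repeat bp covered, percent repeat bp included,
#  second tile quality, percent improvement)
ROWS = [
    ("Pan troglodytes", 3_083_993_401, 57.74, 66.05, 89.81, 4.23, 85.58, 19.53),
    ("Homo sapiens", 3_070_537_687, 52.38, 66.07, 89.60, 4.06, 85.53, 19.47),
    ("Rattus norvegicus", 2_795_745_218, 48.75, 66.86, 91.43, 5.54, 85.89, 19.03),
    ("Mus musculus", 2_638_213_512, 45.62, 66.18, 91.09, 5.51, 85.58, 19.41),
    ("Caenorhabditis elegans", 100_277_879, 11.26, 84.29, 98.54, 3.10, 95.44, 11.16),
    ("Drosophila melanogaster", 129_323_838, 14.23, 86.89, 99.40, 2.62, 96.78, 9.89),
    ("Fugu rubripes", 349_519_338, 15.06, 87.97, 99.07, 2.13, 96.94, 8.97),
    ("Arabidopsis thaliana", 119_186_497, 0.16, 99.97, 100.00, 0.00, 100.00, 0.02),
]

# Assumed tolerance: the printed values are rounded to two decimals,
# so differences of about 0.01 are expected.
TOL = 0.015


def check_row(row, tol=TOL):
    """Return (quality_ok, improvement_ok) for one table row."""
    _, _, _, q_first, covered, repeat_incl, q_second, improvement = row
    # Inferred relation (a): second tile quality = covered - repeat included.
    quality_ok = abs((covered - repeat_incl) - q_second) <= tol
    # Inferred relation (b): improvement = second - first tile quality.
    improvement_ok = abs((q_second - q_first) - improvement) <= tol
    return quality_ok, improvement_ok


def check_table(rows=ROWS, tol=TOL):
    """Map each organism name to its (quality_ok, improvement_ok) pair."""
    return {row[0]: check_row(row, tol) for row in rows}


if __name__ == "__main__":
    for organism, (quality_ok, improvement_ok) in check_table().items():
        print(f"{organism}: quality {quality_ok}, improvement {improvement_ok}")
```

Under these assumptions, the improvement figures read as percentage-point differences between the two tile-quality values rather than relative gains.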

This Article

  1. Genome Res. 16: 271-281
