Design optimization methods for genomic DNA tiling arrays

Table 3.

Comparison of two simple tiling metrics that incorporate repetitive nucleotides to improve non-repetitive sequence coverage





Case 1: Threshold repeat inclusion (50 bp)
Case 2: Percentage repeat inclusion (25%)

| Organism | Genome size (bp) | Percent repeats | Case 1: Percent non-repeat bp covered | Case 1: Percent repeat bp included vs all non-repeat bp | Case 1: Tile quality | Case 2: Percent non-repeat bp covered | Case 2: Percent repeat bp included vs all non-repeat bp | Case 2: Tile quality |
|---|---|---|---|---|---|---|---|---|
| Pan troglodytes | 3,083,993,401 | 57.74 | 64.85 | 4.15 | 62.04 | 66.85 | 17.94 | 52.24 |
| Homo sapiens | 3,070,537,687 | 52.38 | 65.01 | 4.09 | 62.24 | 67.11 | 18.22 | 52.16 |
| Rattus norvegicus | 2,795,745,218 | 48.75 | 66.66 | 4.28 | 63.68 | 69.42 | 19.84 | 52.24 |
| Mus musculus | 2,638,213,512 | 45.62 | 77.56 | 4.30 | 74.07 | 80.82 | 20.15 | 60.43 |
| Caenorhabditis elegans | 100,277,879 | 11.26 | 89.71 | 2.18 | 96.68 | 99.84 | 11.12 | 87.47 |
| Drosophila melanogaster | 129,323,838 | 14.23 | 97.63 | 0.03 | 99.97 | 100 | 2.39 | 97.55 |
| Fugu rubripes | 349,519,338 | 15.06 | 95.09 | 1.86 | 97.74 | 100 | 6.33 | 93.24 |
| Arabidopsis thaliana | 119,186,497 | 0.16 | 99.51 | 1.29 | 98.22 | 100 | 13.29 | 84.68 |

  • In Case 1, repeat sequences of ≤50 bp are allowed within a tile; in Case 2, up to 25% of a tile may consist of repetitive nucleotides. As in Table 1, tile sizes range from 300 bp to 1.5 kb. Case 1 achieves only a marginal improvement in non-repetitive sequence coverage compared with the optimal tiling case at the same level of repeat nucleotide inclusion. In Case 2, non-repetitive sequence coverage in mammalian genomes falls sharply despite the inclusion of a high percentage of repetitive DNA. In both cases, performance on mammalian genomes is significantly lower than that of the optimal tiling algorithm (Table 2).
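
The two acceptance rules described in the footnote can be sketched as simple checks on a candidate tile. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the tile is assumed to be given as a per-base boolean repeat mask (for instance derived from a soft-masked sequence in which lowercase letters mark repeats), and Case 1 is read here as "every contiguous repeat run in the tile is at most 50 bp", which is one plausible interpretation of the threshold rule.

```python
from itertools import groupby

MIN_TILE = 300    # bp, lower tile-size bound stated in the footnote
MAX_TILE = 1500   # bp, upper tile-size bound stated in the footnote


def mask_from_softmasked(seq):
    """Assumed input convention: lowercase bases are repeat-masked."""
    return [base.islower() for base in seq]


def repeat_runs(mask):
    """Lengths of the contiguous repeat runs in a per-base boolean mask."""
    return [sum(1 for _ in run) for is_repeat, run in groupby(mask) if is_repeat]


def valid_length(mask):
    """Tile length must fall within the 300 bp to 1.5 kb range."""
    return MIN_TILE <= len(mask) <= MAX_TILE


def passes_threshold(mask, max_run=50):
    """Case 1 (assumed reading): no single repeat run exceeds max_run bp."""
    return all(run <= max_run for run in repeat_runs(mask))


def passes_percentage(mask, max_fraction=0.25):
    """Case 2: repeat bases make up at most max_fraction of the tile."""
    return sum(mask) <= max_fraction * len(mask)


# Hypothetical 400 bp tiles, built only to contrast the two rules.
# tile_a: repeat runs of 40 bp and 60 bp (100 repeat bp, 25% of the tile).
tile_a = [False] * 150 + [True] * 40 + [False] * 100 + [True] * 60 + [False] * 50
# tile_b: three 50 bp repeat runs (150 repeat bp, 37.5% of the tile).
tile_b = ([False] * 50 + [True] * 50) * 3 + [False] * 100
```

Under these assumptions, `tile_a` fails the threshold rule (one run is 60 bp) but passes the percentage rule, while `tile_b` passes the threshold rule but fails the percentage rule. The contrast shows that the two metrics admit different sets of tiles even when the tile length is identical.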

This Article

  1. Genome Res. 16: 271-281
