Comparison of two simple tiling metrics that incorporate repetitive nucleotides to improve non-repetitive sequence coverage

Case 1: Threshold repeat inclusion (50 bp). Case 2: Percentage repeat inclusion (25%).

| Organism | Genome size (bp) | Percent repeats | Case 1: Percent non-repeat bp covered | Case 1: Percent repeat bp included vs all non-repeat bp | Case 1: Tile quality | Case 2: Percent non-repeat bp covered | Case 2: Percent repeat bp included vs all non-repeat bp | Case 2: Tile quality |
|---|---|---|---|---|---|---|---|---|
| Pan troglodytes | 3,083,993,401 | 57.74 | 64.85 | 4.15 | 62.04 | 66.85 | 17.94 | 52.24 |
| Homo sapiens | 3,070,537,687 | 52.38 | 65.01 | 4.09 | 62.24 | 67.11 | 18.22 | 52.16 |
| Rattus norvegicus | 2,795,745,218 | 48.75 | 66.66 | 4.28 | 63.68 | 69.42 | 19.84 | 52.24 |
| Mus musculus | 2,638,213,512 | 45.62 | 77.56 | 4.30 | 74.07 | 80.82 | 20.15 | 60.43 |
| Caenorhabditis elegans | 100,277,879 | 11.26 | 89.71 | 2.18 | 96.68 | 99.84 | 11.12 | 87.47 |
| Drosophila melanogaster | 129,323,838 | 14.23 | 97.63 | 0.03 | 99.97 | 100 | 2.39 | 97.55 |
| Fugu rubripes | 349,519,338 | 15.06 | 95.09 | 1.86 | 97.74 | 100 | 6.33 | 93.24 |
| Arabidopsis thaliana | 119,186,497 | 0.16 | 99.51 | 1.29 | 98.22 | 100 | 13.29 | 84.68 |

In Case 1, repeat sequences of ≤50 bp were allowed within a tile; in Case 2, up to 25% of a tile's nucleotides could be repetitive. As in Table 1, tile sizes range from 300 bp to 1.5 kb. Case 1 achieves only a marginal improvement in non-repetitive sequence coverage compared with the optimal tiling case at the same level of repeat nucleotide inclusion. Non-repetitive sequence coverage in mammalian genomes falls sharply in Case 2 despite the inclusion of a high percentage of repetitive DNA. In both cases, performance on mammalian genomes is significantly lower than that of the optimal tiling algorithm (Table 2).
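
The two inclusion rules described above can be sketched as simple tests on a candidate tile. This is a minimal illustration and not the authors' implementation: the function names, the boolean repeat-mask representation, and the reading of Case 1 as "no single run of repeat bases longer than 50 bp" are all assumptions made for the example. Tile quality is not computed here because its definition is not given in this section.

```python
from typing import List, Tuple

# Tile size bounds taken from the table footnote (300 bp to 1.5 kb).
MIN_TILE_BP = 300
MAX_TILE_BP = 1500


def make_mask(length: int, repeat_runs: List[Tuple[int, int]]) -> List[bool]:
    """Hypothetical helper: build a repeat mask (True = repeat base)
    from a list of (start, run_length) pairs."""
    mask = [False] * length
    for start, run_length in repeat_runs:
        for i in range(start, min(start + run_length, length)):
            mask[i] = True
    return mask


def repeat_run_lengths(mask: List[bool]) -> List[int]:
    """Return the lengths of consecutive runs of repeat bases."""
    runs, current = [], 0
    for is_repeat in mask:
        if is_repeat:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def size_ok(mask: List[bool]) -> bool:
    """Check the tile length against the 300 bp to 1.5 kb range."""
    return MIN_TILE_BP <= len(mask) <= MAX_TILE_BP


def passes_threshold(mask: List[bool], max_run_bp: int = 50) -> bool:
    """Case 1 (assumed reading): every repeat run is at most max_run_bp."""
    return all(run <= max_run_bp for run in repeat_run_lengths(mask))


def passes_percentage(mask: List[bool], max_fraction: float = 0.25) -> bool:
    """Case 2: at most max_fraction of the tile's bases are repeats."""
    return sum(mask) <= max_fraction * len(mask)


if __name__ == "__main__":
    # A 400 bp tile with one 90 bp repeat run: 22.5% repeats overall.
    one_long_run = make_mask(400, [(100, 90)])
    # A 400 bp tile with three separate 40 bp runs: 30% repeats overall.
    three_short_runs = make_mask(400, [(0, 40), (100, 40), (200, 40)])

    for name, tile in [("one_long_run", one_long_run),
                       ("three_short_runs", three_short_runs)]:
        print(name, size_ok(tile), passes_threshold(tile), passes_percentage(tile))
```

Under these assumptions the two rules can disagree on the same tile: a single 90 bp repeat run fails the 50 bp threshold but stays under the 25% limit, while several short runs can each pass the threshold yet together exceed 25%.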











