Results of Two Cross-Validation Schemes for the Log-Odds Score Obtained From 5-Symbol (Match of A or T, Match of G or C, Transition, Transversion, Gap) Fifth-Order Markov Models
| Cross-validation scheme | Reg. correct (%) | Reg. ambiguous (%) | Reg. erroneous (%) | Anc. rep. correct (%) | Anc. rep. ambiguous (%) | Anc. rep. erroneous (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 5–5 (100) | 81.4 | 12.2 | 6.4 | 73 | 21 | 6 |
| Leave-one-out | 78.49 | 15.05 | 6.452 | 72.5 | 21 | 6.5 |
Because the score distributions for regulatory elements and ancestral repeats do not overlap, we do not report correct and erroneous classification rates relative to an arbitrary threshold. Instead, we list three percentages: correct classifications, ambiguous cases (scores falling between the two nonoverlapping distributions), and erroneous classifications. In the first cross-validation scheme, five regulatory elements and five ancestral repeat segments, selected at random, are withheld from training and then classified. This procedure is repeated 100 times independently, and the correct, ambiguous, and erroneous classification percentages are averaged over these replications. The second scheme is leave-one-out cross-validation, in which each data point in turn is withheld from training and then classified.
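
The two schemes described in this note can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used to produce the table. The `fit` callback is hypothetical: it stands in for training the Markov models and is assumed to return a function that maps a segment to its log-odds score. The three-way decision rule is also an assumption: a held-out score is called ambiguous when it lies strictly between the highest ancestral repeat training score and the lowest regulatory training score, and regulatory scores are assumed to be the higher of the two groups.

```python
import random
from collections import Counter

OUTCOMES = ("correct", "ambiguous", "erroneous")

def classify(score, true_label, reg_train_scores, ar_train_scores):
    """Three-way outcome for one held-out item (assumed decision rule)."""
    upper = min(reg_train_scores)  # lowest regulatory training score
    lower = max(ar_train_scores)   # highest ancestral repeat training score
    if lower < score < upper:
        return "ambiguous"         # falls in the gap between the two groups
    predicted = "reg" if score >= upper else "ar"
    return "correct" if predicted == true_label else "erroneous"

def evaluate_split(fit, reg, ar, held_reg, held_ar):
    """Train without the held-out indices, then classify the held-out items."""
    train_reg = [x for i, x in enumerate(reg) if i not in held_reg]
    train_ar = [x for i, x in enumerate(ar) if i not in held_ar]
    score = fit(train_reg, train_ar)  # hypothetical training callback
    reg_scores = [score(x) for x in train_reg]
    ar_scores = [score(x) for x in train_ar]
    counts = {"reg": Counter(), "ar": Counter()}
    for label, data, held in (("reg", reg, held_reg), ("ar", ar, held_ar)):
        for i in held:
            outcome = classify(score(data[i]), label, reg_scores, ar_scores)
            counts[label][outcome] += 1
    return counts

def to_percent(total):
    """Convert pooled outcome counts to percentages per class."""
    return {
        label: {k: 100.0 * c[k] / sum(c.values()) for k in OUTCOMES}
        for label, c in total.items()
    }

def repeated_holdout(fit, reg, ar, k=5, repeats=100, seed=0):
    """Withhold k items per class at random, repeated `repeats` times."""
    rng = random.Random(seed)
    total = {"reg": Counter(), "ar": Counter()}
    for _ in range(repeats):
        held_reg = set(rng.sample(range(len(reg)), k))
        held_ar = set(rng.sample(range(len(ar)), k))
        counts = evaluate_split(fit, reg, ar, held_reg, held_ar)
        for label in total:
            total[label].update(counts[label])
    return to_percent(total)

def leave_one_out(fit, reg, ar):
    """Withhold each data point in turn, retrain, and classify it."""
    total = {"reg": Counter(), "ar": Counter()}
    for i in range(len(reg)):
        total["reg"].update(evaluate_split(fit, reg, ar, {i}, set())["reg"])
    for j in range(len(ar)):
        total["ar"].update(evaluate_split(fit, reg, ar, set(), {j})["ar"])
    return to_percent(total)
```

Because every replication withholds the same number of items per class, pooling the counts in `repeated_holdout` gives the same percentages as averaging the per-replication percentages. With the defaults `k=5` and `repeats=100` the function corresponds to the "5–5 (100)" row of the table, assuming the decision rule above.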











