Accuracy evaluation for rat AIR predictions: splice junction (A) and coverage statistics (B), and selectivity and evidence retention (C)
A

| Transcript set | Transcripts | Spliced transcripts | ≥1 irregular splice junctions | ≥2 irregular splice junctions | Splice junctions | Irregular splice junctions |
|---|---|---|---|---|---|---|
| No selection | 126,440 | 110,425 | 24,756 (22.42%) | 9,182 (8.32%) | 1,629,754 | 36,096 (2.21%) |
| Score selection | 70,047 | 54,032 | 5,373 (9.94%) | 565 (1.05%) | 354,936 | 6,047 (1.70%) |
| Final | 60,683 | 45,348 | 4,935 (10.88%) | 539 (1.19%) | 339,302 | 5,579 (1.64%) |
| RGD, V = 12 | 3,642 | 3,452 | 837 (24.25%) | 249 (7.21%) | 33,028 | 1,228 (3.72%) |
| RGD, V = 30 | 3,642 | 3,444 | 788 (22.88%) | 226 (6.56%) | 32,880 | 1,124 (3.42%) |

B

| Transcript set | Transcripts | ≥50% coverage | ≥80% coverage | ≥90% coverage | ≥95% coverage | ≥98% coverage | 100% coverage |
|---|---|---|---|---|---|---|---|
| No selection | 126,440 | 98,005 (77.5%) | 74,148 (59.6%) | 68,352 (54.1%) | 65,043 (51.4%) | 62,431 (49.4%) | 59,670 (47.2%) |
| Score selection | 70,047 | 67,485 (96.3%) | 64,339 (91.9%) | 62,554 (89.3%) | 61,102 (87.2%) | 59,710 (85.2%) | 57,820 (82.5%) |
| Final | 60,683 | 59,572 (98.2%) | 57,606 (94.9%) | 56,293 (92.8%) | 55,153 (90.9%) | 53,983 (89.0%) | 52,468 (86.5%) |

C

| Stage | AIR genes | AIR transcripts | Rat RefSeq + GBmRNA | Lost (rat) | Mouse RefSeq + GBmRNA | Lost (mouse) |
|---|---|---|---|---|---|---|
| Total | N/A | N/A | 15,019 | N/A | 125,972 | N/A |
| Post mapping/tracking | N/A | N/A | 14,208 | 811 | 101,036 | 24,936 |
| Post alignment filtering | 45,040 | 126,440 | 13,890 | 318 | 100,673 | 611 |
| Post score-based selection | 45,040 | 70,047 | 13,663 | 227 | 98,802 | 1,871 |
| Final | 38,598 | 60,683 | 13,663 | N/A | 98,802 | N/A |

- (A) An irregular splice junction is one other than GT–AG, GC–AG, or AT–AC. Results for the set of RGD genes are included for comparison; in the RGD genes, spurious introns shorter than V bp were eliminated and their adjacent exons merged.
- (B) Coverage is measured as the percentage of transcript bases contained in at least one evidence alignment.
- (C) Retention rates for mRNA evidence in the AIR predictions at successive stages of transcript selection are shown. AIR selects and retains essential evidence (13,663 of the 13,890 rat mRNA sequences, 98.4%, and 98,802 of the 100,673 mouse mRNA transcripts, 98.1%) while efficiently filtering unlikely candidate transcripts: 65,757 of the 126,440 combinations encoded in the splice graphs are eliminated. N/A = not applicable.
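
The definitions in the table notes can be illustrated with a minimal Python sketch. It is not the AIR implementation: the function names, the half-open `(start, end)` interval layout, and the assumption that intron sequences are given on the transcript strand are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not taken from the AIR software.

# Donor/acceptor dinucleotide pairs treated as regular in the table notes.
REGULAR_SPLICE_SITES = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def is_irregular(intron_seq):
    """True if the intron's terminal dinucleotides are not GT-AG, GC-AG, or AT-AC.

    intron_seq is assumed to be the intron sequence on the transcript strand.
    """
    seq = intron_seq.upper()
    return (seq[:2], seq[-2:]) not in REGULAR_SPLICE_SITES


def merge_short_introns(exons, min_intron_len):
    """Merge adjacent exons separated by an intron shorter than min_intron_len (V) bp.

    exons is a sorted list of half-open (start, end) genomic intervals.
    """
    if not exons:
        return []
    merged = [exons[0]]
    for start, end in exons[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end < min_intron_len:
            # Intron too short: treat it as spurious and join the two exons.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage_percent(exons, alignments):
    """Percentage of transcript bases contained in at least one evidence alignment.

    alignments is a list of half-open (start, end) aligned blocks.
    """
    # Merge overlapping alignment blocks so no base is counted twice.
    blocks = []
    for start, end in sorted(alignments):
        if blocks and start <= blocks[-1][1]:
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))

    total = sum(end - start for start, end in exons)
    if total == 0:
        return 0.0
    covered = 0
    for ex_start, ex_end in exons:
        for start, end in blocks:
            covered += max(0, min(ex_end, end) - max(ex_start, start))
    return 100.0 * covered / total
```

With these definitions, a 20-bp intron would be merged away at V = 30 but kept at V = 12, which is the kind of difference the two RGD rows in panel A reflect.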











