Gene and alternative splicing annotation with AIR

Table 3.

Accuracy evaluation for rat AIR predictions: splice junction statistics (A), coverage statistics (B), and selectivity and evidence retention (C)


A

| Transcript set | Transcripts | Spliced transcripts | ≥1 irregular splice junction | ≥2 irregular splice junctions | Splice junctions | Irregular splice junctions |
|---|---|---|---|---|---|---|
| No selection | 126,440 | 110,425 | 24,756 (22.42%) | 9,182 (8.32%) | 1,629,754 | 36,096 (2.21%) |
| Score selection | 70,047 | 54,032 | 5,373 (9.94%) | 565 (1.05%) | 354,936 | 6,047 (1.70%) |
| Final | 60,683 | 45,348 | 4,935 (10.88%) | 539 (1.19%) | 339,302 | 5,579 (1.64%) |
| RGD, V = 12 | 3,642 | 3,452 | 837 (24.25%) | 249 (7.21%) | 33,028 | 1,228 (3.72%) |
| RGD, V = 30 | 3,642 | 3,444 | 788 (22.88%) | 226 (6.56%) | 32,880 | 1,124 (3.42%) |

B

| Transcript set | Transcripts | ≥50% coverage | ≥80% coverage | ≥90% coverage | ≥95% coverage | ≥98% coverage | 100% coverage |
|---|---|---|---|---|---|---|---|
| No selection | 126,440 | 98,005 (77.5%) | 74,148 (59.6%) | 68,352 (54.1%) | 65,043 (51.4%) | 62,431 (49.4%) | 59,670 (47.2%) |
| Score selection | 70,047 | 67,485 (96.3%) | 64,339 (91.9%) | 62,554 (89.3%) | 61,102 (87.2%) | 59,710 (85.2%) | 57,820 (82.5%) |
| Final | 60,683 | 59,572 (98.2%) | 57,606 (94.9%) | 56,293 (92.8%) | 55,153 (90.9%) | 53,983 (89.0%) | 52,468 (86.5%) |

C

| Stage | AIR genes | AIR transcripts | Rat RefSeq + GBmRNA | Lost (rat) | Mouse RefSeq + GBmRNA | Lost (mouse) |
|---|---|---|---|---|---|---|
| Total | N/A | N/A | 15,019 | N/A | 125,972 | N/A |
| Post mapping/tracking | N/A | N/A | 14,208 | 811 | 101,036 | 24,936 |
| Post alignment filtering | 45,040 | 126,440 | 13,890 | 318 | 100,673 | 611 |
| Post score-based selection | 45,040 | 70,047 | 13,663 | 227 | 98,802 | 1,871 |
| Final | 38,598 | 60,683 | 13,663 | N/A | 98,802 | N/A |
  • (A) An irregular splice junction is any junction other than GT–AG, GC–AG, or AT–AC. Results from evaluating the set of RGD genes are included for comparison. In the RGD genes, spurious introns shorter than V bp were eliminated and their adjacent exons merged.

  • (B) Coverage is measured as the number of transcript bases contained in at least one evidence alignment; the column thresholds express that number as a percentage of transcript length.

  • (C) Retention rates for mRNA evidence in the resulting AIR predictions are shown at various stages of transcript selection. AIR selects and retains essential evidence (13,663 of the 13,890 rat mRNA sequences, or 98.4%, and 98,802 of the 100,673 mouse mRNA transcripts, or 98.1%) while efficiently filtering unlikely candidate transcripts: 65,757 of the 126,440 combinations encoded in the splice graphs are eliminated. N/A = not applicable.
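The irregular-junction definition and the short-intron merging rule in the first note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AIR or RGD implementation: the function names, the half-open `(start, end)` exon coordinates, and the use of the intron's first and last two bases to identify the junction type are all hypothetical choices made for the example.

```python
# Illustrative sketch only; names and coordinate conventions are assumptions.

# Donor/acceptor dinucleotide pairs that the note treats as regular.
REGULAR_JUNCTIONS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def is_irregular(intron_seq):
    """Return True if the intron's terminal dinucleotides are not a regular pair."""
    seq = intron_seq.upper()
    return (seq[:2], seq[-2:]) not in REGULAR_JUNCTIONS


def merge_short_introns(exons, min_intron):
    """Merge adjacent exons separated by an intron shorter than min_intron bp.

    exons: list of (start, end) half-open intervals, sorted by start.
    min_intron: the threshold V from the note (e.g. 12 or 30).
    """
    if not exons:
        return []
    merged = [exons[0]]
    for start, end in exons[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end < min_intron:
            # Intron is shorter than V: treat it as spurious and join the exons.
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged


exons = [(0, 100), (110, 200), (500, 600)]
merged_v12 = merge_short_introns(exons, 12)   # the 10 bp intron is removed
gt_ag = is_irregular("GTAAGTCCAG")            # regular GT-AG intron
ct_ag = is_irregular("CTAAGTCCAG")            # irregular CT-AG intron
```

Raising V from 12 to 30 can only merge more exons, which is consistent with the RGD rows showing fewer splice junctions at V = 30 than at V = 12.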
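The coverage measure in the second note amounts to the size of the union of the evidence alignments on the transcript. A minimal sketch follows, assuming alignments are given as half-open `(start, end)` intervals in transcript coordinates; the function name and input format are hypothetical and are not taken from AIR.

```python
# Illustrative sketch only; the interval representation is an assumption.

def coverage_fraction(transcript_length, alignments):
    """Fraction of transcript bases contained in at least one alignment."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    covered = 0
    covered_up_to = 0  # right edge of the region already counted
    for start, end in sorted(alignments):
        # Clip to the transcript and skip bases that were already counted.
        start = max(start, covered_up_to, 0)
        end = min(end, transcript_length)
        if end > start:
            covered += end - start
            covered_up_to = end
    return covered / transcript_length


# Two overlapping alignments plus one separate alignment on a 1,000-base transcript.
fraction = coverage_fraction(1000, [(0, 400), (300, 700), (900, 1000)])
```

In this example 800 of 1,000 bases are covered, so the transcript would count toward the ≥50% and ≥80% columns but not toward ≥90%.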
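The retention and filtering figures quoted in the third note follow directly from the counts in panel C. The short check below recomputes them; the variable names are chosen for this example only.

```python
# Counts taken from panel C; variable names are illustrative.
rat_after_filtering, rat_final = 13_890, 13_663
mouse_after_filtering, mouse_final = 100_673, 98_802
candidate_transcripts, final_transcripts = 126_440, 60_683


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


rat_retention = percent(rat_final, rat_after_filtering)        # 98.4
mouse_retention = percent(mouse_final, mouse_after_filtering)  # 98.1
eliminated = candidate_transcripts - final_transcripts         # 65,757
```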

This Article

  1. Genome Res. 15: 54-66
