Gene and alternative splicing annotation with AIR

Table 4.

Structural comparison between the AIR-predicted transcripts and the 237 RGD-curated RefSeq genes on rat chromosome 4


(A) Transcript structure comparison

Category | Cases | Comments
Compatible exon structures
  Equivalent | 35 | Same exon-intron structure.
  Extended | 153 | Structure extended at the 5′ or 3′ end in AIR transcripts.
  Containing | 5 | Contains an AIR transcript as a substructure; the AIR match is short or misses marginal exons.
Strong structural differences
  Incomplete RGD gene mapping | 27 (+27/-0) | Missing single exons, parts of exons, or groups of exons.
  Incomplete AIR mapping | 3 (+0/-3) | Missing single (short) exons; missing alignment for the gene.
  Incomplete incompatible RGD and AIR mappings | 3 (+3/-3) | Different exons missed by the two methods.
  Different AIR and RGD mappings | 3 (+2/-1) | Chimeras and paralogs; multiple RGD mappings for a gene, of which AIR selects only the best.
  AIR transcript building errors | 2 (+0/-2) | Evidence bias in splice junction detection; extension into a cDNA gap.
No match | 3 (+3/-0) | Paralog, with the best match located elsewhere; spurious RGD mapping, not confirmed.

(B) Splice junction comparison

Window W (bp) | Introns | Exact | Partial | Weak | No match
W = 0 | 2118 | 2064 | 27 | 13 | 14
W = 5 | 2118 | 2090 | 13 | 1 | 14
W = 10 | 2118 | 2094 | 10 | 0 | 14

(C) ORF comparison

Category | Cases | Comments
Compatible ORFs
  Identity | 188 | Identical ORFs and protein products. The AIR transcript may be equal to, extend, complete, or be contained in the RGD gene.
  Near-identity | 5 | Minor (1-2 aa) differences in ORFs/proteins. The AIR transcript misses a short exon, or RGD and/or AIR choose alternative splice junctions.
  Extension | 10 | RGD ORF/protein extended at the 5′ (N-terminal) or 3′ (C-terminal) end in AIR transcripts. The AIR transcript extends or completes the RGD gene.
  Completion | 2 | AIR ORF/protein fills in internal gaps in (and possibly extends) the RGD ORF/protein. The RGD transcript missed (internal) exons.
  Truncation | 2 | AIR protein is a portion of the RGD protein. The AIR transcript is contained in the RGD gene.
Different ORFs
  Partial match | 4 (-4) | Frameshifts in AIR, caused by inaccurate splice junctions and/or exon ends.
  Partial match | 13 (+13) | Frameshifts in RGD, caused by missing exons and by inaccurate splice junctions and/or exon ends.
  Partial match | 4 (×4) | Different RGD and AIR ends for exons flanking alignment gaps caused by gaps in the genome.
No match | 2 (×2) | Non-overlapping ORFs, likely caused by frameshifts in both RGD and AIR; different exon ends flanking alignment gaps.

  • Numbers in parentheses indicate the number of cases favoring (+) or disfavoring (-) AIR against RGD genes.

  • Introns are “exact” if both of their ends (the flanking exon boundaries) agree between the AIR and RGD annotations to within a window of W bp; “partial” if only one end agrees; and “weak” if the introns overlap but neither end agrees.

  • For each AIR and RGD sequence, the ORF is the longest in-frame DNA stretch between a start codon (ATG) and a stop codon (TAA, TAG, or TGA) or, if no stop codon is encountered, the end of the sequence. (+) The AIR ORF is believed to be correct; (-) the AIR ORF is believed to be incorrect and the RGD ORF correct; (×) the case is ambiguous, or both ORFs are likely to be erroneous.
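The intron categories in panel (B) can be made concrete with a short sketch. It assumes, hypothetically, that each intron is a (start, end) pair of genomic coordinates on the same strand, and that each RGD intron is scored against its best-agreeing AIR intron; the function names and toy coordinates are illustrative and are not taken from the AIR implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an intron is (start, end) in genomic coordinates,
# with both annotations on the same chromosome and strand.
Intron = Tuple[int, int]

# Lower rank = better agreement; used to keep the best AIR hit per RGD intron.
RANK = {"exact": 0, "partial": 1, "weak": 2, "no match": 3}


def classify_intron(ref: Intron, pred: Intron, w: int) -> str:
    """Compare one RGD intron with one AIR intron using a w-bp tolerance."""
    start_ok = abs(ref[0] - pred[0]) <= w
    end_ok = abs(ref[1] - pred[1]) <= w
    if start_ok and end_ok:
        return "exact"
    if start_ok or end_ok:
        return "partial"
    # Neither end agrees: the pair is "weak" only if the intervals overlap.
    if ref[0] <= pred[1] and pred[0] <= ref[1]:
        return "weak"
    return "no match"


def best_match(ref: Intron, preds: List[Intron], w: int) -> str:
    """Best category for one RGD intron over all AIR introns (assumed rule)."""
    return min((classify_intron(ref, p, w) for p in preds),
               key=RANK.__getitem__, default="no match")


def tally(refs: List[Intron], preds: List[Intron], w: int) -> Dict[str, int]:
    """Count RGD introns per category for a single window size w."""
    counts = {category: 0 for category in RANK}
    for ref in refs:
        counts[best_match(ref, preds, w)] += 1
    return counts


# Toy coordinates (illustrative only, not data from the table).
rgd = [(100, 200), (300, 400), (500, 600), (700, 800)]
air = [(100, 200), (303, 390), (520, 580), (900, 950)]
for w in (0, 5, 10):
    print(w, tally(rgd, air, w))
```

On the toy data, the (303, 390) intron moves from weak (W = 0) to partial (W = 5) to exact (W = 10). This mirrors how, in panel (B), widening the window shifts introns from the partial and weak columns into the exact column while the total stays fixed.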
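The ORF rule in the last footnote can likewise be sketched. This version makes several assumptions that the table does not state: the transcript is already in sense orientation (only the three forward frames are scanned), the stop codon is counted as part of the ORF, and an ORF without a stop codon is trimmed to its last complete codon. `longest_orf` is a hypothetical helper, not code from AIR or RGD.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end), 0-based and half-open, of the longest ORF in seq.

    Assumptions (not from the source): only the three forward frames are
    scanned, the stop codon is counted as part of the ORF, and an ORF with
    no stop codon runs to the last complete codon. Returns (0, 0) if seq
    contains no ATG.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        # End coordinate of the last complete codon in this frame.
        last = frame + 3 * ((len(seq) - frame) // 3)
        start = None  # earliest ATG since the previous in-frame stop
        for i in range(frame, last, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
        # No stop codon after the last open ATG: extend to the sequence end.
        if start is not None and last - start > best[1] - best[0]:
            best = (start, last)
    return best


transcript = "ATGTAACATGAAACCCTGA"  # toy sequence, not from the study
s, e = longest_orf(transcript)
print(s, e, transcript[s:e])  # prints: 7 19 ATGAAACCCTGA
```

Keeping the earliest ATG after each in-frame stop is what makes each candidate the longest ORF ending at that stop; comparing candidates across all three frames then yields the overall longest one.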

Genome Res. 15: 54-66