Structural comparison between the AIR predicted transcripts and the 237 RGD curated RefSeq genes on rat chromosome 4
(A) Transcript structure comparison

| Categories | Cases | Comments |
|---|---|---|
| Compatible exon structures | | |
| Equivalent | 35 | Same exon-intron structure |
| Extended | 153 | Structure extended at 5′ or 3′ end in AIR transcripts |
| Containing | 5 | Contains an AIR transcript as a substructure. AIR Match is short or misses marginal exons. |
| Strong structural differences | | |
| Incomplete RGD gene mapping | 27 (+27/-0) | Missing single exons, parts of exons or groups of exons. |
| Incomplete AIR mapping | 3 (+0/-3) | Missing single (short) exons. Missing the alignment for the gene. |
| Incomplete incompatible RGD and AIR mappings | 3 (+3/-3) | Different missing exons produced by the two methods. |
| Different AIR and RGD mappings | 3 (+2/-1) | Chimeras and paralogs. Multiple RGD mappings for a gene; only best mapping selected by AIR. |
| AIR transcript building errors | 2 (+0/-2) | Evidence bias in splice junction detection. Extension into a cDNA gap. |
| No match | 3 (+3/-0) | Paralog; best match located elsewhere. Spurious RGD mapping, not confirmed. |

(B) Splice junction comparison

| Window (W) | Introns | Exact | Partial | Weak | No match |
|---|---|---|---|---|---|
| W = 0 | 2118 | 2064 | 27 | 13 | 14 |
| W = 5 | 2118 | 2090 | 13 | 1 | 14 |
| W = 10 | 2118 | 2094 | 10 | 0 | 14 |

(C) ORF comparison

| Categories | Cases | Comments |
|---|---|---|
| Compatible ORFs | | |
| Identity | 188 | Identical ORFs and protein products. AIR transcript may be equal to, extend, complete or be contained in the RGD gene. |
| Near-identity | 5 | Minor (1-2 aa) differences in ORFs/proteins. AIR transcript misses short exon, or RGD and/or AIR choose alternative splice junctions. |
| Extension | 10 | RGD ORF/protein extended at 5′ (N-terminal) or 3′ (C-terminal) end in AIR transcripts. AIR transcript extends or completes RGD gene. |
| Completion | 2 | AIR ORF/protein fills-in internal gaps in (and possibly extends) RGD ORF/protein. RGD transcript missed (internal) exons. |
| Truncation | 2 | AIR protein is a portion of the RGD protein. AIR transcript was contained in the RGD gene. |
| Different ORFs | | |
| Partial match | 4 (-4) | Frameshifts in AIR, caused by inaccurate splice junctions and/or exon ends. |
| | 13 (+13) | Frameshifts in RGD, caused by missing exons, and inaccurate splice junctions and/or exon ends. |
| | 4 (×4) | Different RGD and AIR ends for exons flanking alignment gaps caused by gaps in the genome |
| No match | 2 (×2) | Non-overlapping ORFs likely caused by frameshifts in both RGD and AIR; different exon ends flanking alignment gaps. |

- Numbers in parentheses indicate the number of cases favoring (+) or disfavoring (-) AIR against RGD genes.
- Introns are “exact” if both exon ends agree between the AIR and RGD annotations, within a W-bp window; “partial” if only one exon end agrees; and “weak” if the introns overlap strictly.
- For each AIR and RGD sequence, the ORF is determined as the longest in-frame DNA stretch between a start (ATG) and a stop (TAA, TAG, or TGA) codon or, if no stop codon is encountered, the end of the sequence. (+) The AIR ORF is believed to be correct; (-) the AIR ORF is believed incorrect, but the RGD ORF is deemed correct; (×) either an ambiguous case, or both ORFs are likely to be erroneous.
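
The intron categories used in the splice junction comparison (exact, partial, weak, no match) can be illustrated with a short sketch. This is not the authors' code. It assumes introns are given as inclusive `(start, end)` genomic coordinates, that "within a W-bp window" means an absolute coordinate difference of at most W, and that "weak" means the two introns overlap without either end agreeing. The names `classify_pair` and `classify_intron` are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed representation: (start, end), inclusive coordinates, start <= end.
Intron = Tuple[int, int]

# Ordering used to keep the strongest category found for a reference intron.
RANK = {"no match": 0, "weak": 1, "partial": 2, "exact": 3}


def classify_pair(a: Intron, b: Intron, w: int = 0) -> str:
    """Compare two introns using a tolerance of w bp on each end."""
    start_ok = abs(a[0] - b[0]) <= w
    end_ok = abs(a[1] - b[1]) <= w
    if start_ok and end_ok:
        return "exact"
    if start_ok or end_ok:
        return "partial"
    # Neither end agrees: the introns may still overlap.
    if a[0] <= b[1] and b[0] <= a[1]:
        return "weak"
    return "no match"


def classify_intron(ref: Intron, candidates: Iterable[Intron], w: int = 0) -> str:
    """Return the strongest category of ref against any candidate intron."""
    best = "no match"
    for cand in candidates:
        label = classify_pair(ref, cand, w)
        if RANK[label] > RANK[best]:
            best = label
    return best
```

Under these assumptions, widening the window can only move an intron toward a stronger category, which is the pattern seen when going from W = 0 to W = 10 in panel (B): for example, `classify_pair((100, 200), (103, 200), w=0)` is "partial", while the same pair with `w=5` is "exact".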
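
The ORF rule in the last note (longest in-frame stretch from an ATG to the next stop codon, or to the end of the sequence if no stop is found) can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the table. It assumes that only the three forward-strand frames of the transcript are scanned, that the reported interval includes the stop codon when one is present, and that ties are resolved in favor of the first ORF found. The function name `longest_orf` is hypothetical.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ORF, end exclusive, or None.

    Assumption: forward strand only, three reading frames.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None

    def consider(start: int, end: int) -> None:
        nonlocal best
        if best is None or end - start > best[1] - best[0]:
            best = (start, end)

    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOPS:
                consider(start, i + 3)  # include the stop codon
                start = None
        if start is not None:
            # No stop codon encountered: the ORF runs to the sequence end.
            consider(start, len(seq))
    return best
```

Taking the first ATG after the previous stop in each frame is sufficient, because any later ATG before the same stop would give a shorter ORF. Comparing the intervals (or their translations) from an AIR transcript and the matching RGD sequence would then give the raw material for the categories in panel (C); the (+), (-), and (×) judgments themselves rely on manual inspection and are not reproduced by this sketch.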











