Table 2.
The Number of Sequences Assigned to Each of Nineteen Categories Based on the Results of Our Automated Annotation Pipeline for the 60,770 FANTOM2 cDNA Sequences and the 33,409 Representative Sequences Determined to Be Unique Through Our Cluster Analysis
| Category | FANTOM2 60,770 all sequences | FANTOM2 33,409 rep. sequences |
|---|---|---|
| 1. MGI assigned | 5,109 | 2,044 |
| 2. DNA hit (complete) | 4,911 | 2,354 |
| 3. DNA hit (partial) | 5,387 | 2,063 |
| 4. Protein hit (≥98% ID, 100% length, mouse) | 1,431 | 650 |
| 5. Protein hit (≥85% ID, ≥90% length, complete) | 3,017 | 1,351 |
| 6. Protein hit (≥85% ID, ≥90% length, partial) | 1,245 | 519 |
| 7. Protein hit (≥70% ID, ≥70% length, complete) | 822 | 409 |
| 8. Protein hit (≥70% ID, ≥70% length, partial) | 1,760 | 719 |
| 9. Protein hit (≥50% ID, ≥50% length, complete) | 342 | 153 |
| 10. Protein hit (≥50% ID, ≥50% length, partial) | 2,610 | 1,166 |
| 11. TIGR/UniGene clusters | 195 | 38 |
| 12. UniGene clusters | 522 | 147 |
| 13. TIGR clusters | 738 | 297 |
| 14. InterPro domain/motifs | 3,637 | 1,858 |
| 15. MDS domain/motifs | 3 | 2 |
| 16. SCOP domain/motifs | 788 | 351 |
| 17. hypothetical protein | 5,906 | 3,113 |
| 18. unknown EST | 14,139 | 8,689 |
| 19. unclassifiable | 8,207 | 7,486 |