Development and Evaluation of an Automated Annotation Pipeline and cDNA Annotation System

Table 2.

The Number of Sequences Assigned to Each of Nineteen Categories Based on the Results of Our Automated Annotation Pipeline for the 60,770 FANTOM2 cDNA Sequences and the 33,409 Representative Sequences Determined to Be Unique Through Our Cluster Analysis


Category                                          FANTOM2 60,770 all sequences   FANTOM2 33,409 rep. sequences
1. MGI assigned                                   5,109                          2,044
2. DNA hit (complete)                             4,911                          2,354
3. DNA hit (partial)                              5,387                          2,063
4. Protein hit (≥98% ID, 100% length, mouse)      1,431                          650
5. Protein hit (≥85% ID, ≥90% length, complete)   3,017                          1,351
6. Protein hit (≥85% ID, ≥90% length, partial)    1,245                          519
7. Protein hit (≥70% ID, ≥70% length, complete)   822                            409
8. Protein hit (≥70% ID, ≥70% length, partial)    1,760                          719
9. Protein hit (≥50% ID, ≥50% length, complete)   342                            153
10. Protein hit (≥50% ID, ≥50% length, partial)   2,610                          1,166
11. TIGR/UniGene clusters                         195                            38
12. UniGene clusters                              522                            147
13. TIGR clusters                                 738                            297
14. InterPro domain/motifs                        3,637                          1,858
15. MDS domain/motifs                             3                              2
16. SCOP domain/motifs                            788                            351
17. hypothetical protein                          5,906                          3,113
18. unknown EST                                   14,139                         8,689
19. unclassifiable                                8,207                          7,486
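
The counts in Table 2 can be tallied and converted to per-category shares with a few lines of Python. The sketch below is illustrative only and is not part of the annotation pipeline: the names `TABLE_2`, `column_totals`, and `shares` are hypothetical, and the numbers are simply transcribed from the table as it appears on this page.

```python
# Counts transcribed from Table 2 as shown on this page:
# (category, all FANTOM2 sequences, representative sequences).
# All names in this sketch are hypothetical, chosen for illustration.
TABLE_2 = [
    ("1. MGI assigned", 5109, 2044),
    ("2. DNA hit (complete)", 4911, 2354),
    ("3. DNA hit (partial)", 5387, 2063),
    ("4. Protein hit (>=98% ID, 100% length, mouse)", 1431, 650),
    ("5. Protein hit (>=85% ID, >=90% length, complete)", 3017, 1351),
    ("6. Protein hit (>=85% ID, >=90% length, partial)", 1245, 519),
    ("7. Protein hit (>=70% ID, >=70% length, complete)", 822, 409),
    ("8. Protein hit (>=70% ID, >=70% length, partial)", 1760, 719),
    ("9. Protein hit (>=50% ID, >=50% length, complete)", 342, 153),
    ("10. Protein hit (>=50% ID, >=50% length, partial)", 2610, 1166),
    ("11. TIGR/UniGene clusters", 195, 38),
    ("12. UniGene clusters", 522, 147),
    ("13. TIGR clusters", 738, 297),
    ("14. InterPro domain/motifs", 3637, 1858),
    ("15. MDS domain/motifs", 3, 2),
    ("16. SCOP domain/motifs", 788, 351),
    ("17. hypothetical protein", 5906, 3113),
    ("18. unknown EST", 14139, 8689),
    ("19. unclassifiable", 8207, 7486),
]


def column_totals(rows):
    """Return (sum of the all-sequences column, sum of the rep. column)."""
    all_total = sum(row[1] for row in rows)
    rep_total = sum(row[2] for row in rows)
    return all_total, rep_total


def shares(rows, column):
    """Map each category to its fraction of the chosen column (1 or 2)."""
    total = sum(row[column] for row in rows)
    return {row[0]: row[column] / total for row in rows}


if __name__ == "__main__":
    all_total, rep_total = column_totals(TABLE_2)
    print(f"all sequences: {all_total:,}  representative: {rep_total:,}")
    for category, fraction in shares(TABLE_2, 2).items():
        print(f"{category:<52} {fraction:6.1%}")
```

As transcribed here, the representative-sequence column sums to 33,409, matching the caption, while the all-sequence column sums to 60,769, one fewer than the 60,770 stated in the caption. The reason for that one-sequence difference is not given on this page.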

Genome Res. 13: 1542-1551