ECgene: Genome-based EST clustering and gene modeling for alternative splicing

Table 1. Summary of input sequences

| | Human RefSeq | Human mRNA | Human EST | Mouse RefSeq | Mouse mRNA | Mouse EST | Rat RefSeq | Rat mRNA | Rat EST |
|---|---|---|---|---|---|---|---|---|---|
| Raw data from GenBank | 25,975 | 133,271 | 5,426,061 | 40,568 | 113,526 | 3,918,650 | 21,937 | 11,779 | 538,134 |
| No. of aligned sequences onto the genome after initial filtering | 25,665 | 118,034 | 4,836,878 | 38,137 | 101,645 | 3,467,066 | 20,867 | 9,996 | 487,771 |
| No. of sequences after removal of bad alignments (a) | 24,895 (96%) | 112,933 (84%) | 4,408,552 (81%) | 37,268 (92%) | 100,798 (89%) | 3,348,841 (85%) | 20,759 (94%) | 9,871 (84%) | 471,043 (88%) |
| No. of spliced sequences (b) | 22,649 (91%) | 86,897 (77%) | 2,076,217 (47%) | 29,948 (80%) | 68,912 (68%) | 1,315,511 (39%) | 18,289 (88%) | 8,404 (85%) | 169,604 (36%) |
  • a Sequences included in the final clustering of ECgene (percentage of aligned sequences)

  • b Input sequences for the transcript assembly procedure (percentage of multi-exon sequences among all sequences in ECgene)
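As a check on footnote b, the percentages in the spliced-sequence row can be recomputed from the counts in the table: each is the number of spliced sequences divided by the number of sequences remaining after removal of bad alignments. The sketch below is only an illustration of that arithmetic; the names `TABLE1` and `spliced_percent` are hypothetical and are not part of ECgene.

```python
# Counts copied from Table 1 as pairs:
# (sequences after removal of bad alignments, spliced sequences).
TABLE1 = {
    ("Human", "RefSeq"): (24_895, 22_649),
    ("Human", "mRNA"): (112_933, 86_897),
    ("Human", "EST"): (4_408_552, 2_076_217),
    ("Mouse", "RefSeq"): (37_268, 29_948),
    ("Mouse", "mRNA"): (100_798, 68_912),
    ("Mouse", "EST"): (3_348_841, 1_315_511),
    ("Rat", "RefSeq"): (20_759, 18_289),
    ("Rat", "mRNA"): (9_871, 8_404),
    ("Rat", "EST"): (471_043, 169_604),
}


def spliced_percent(clustered: int, spliced: int) -> int:
    """Share of multi-exon (spliced) sequences among the clustered
    sequences, rounded to a whole percent."""
    return round(100 * spliced / clustered)


# Print one line per species and sequence source.
for (species, source), (clustered, spliced) in TABLE1.items():
    print(f"{species:5s} {source:6s} {spliced_percent(clustered, spliced)}%")
```

Computed this way, the values agree with the percentages printed in the spliced-sequence row for all nine columns.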

This Article

  1. Genome Res. 15: 566-576
