Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs

Table 1.

Summary of Results From EST Assembly of Sponge, Dog, and Grapevine Sequences




                                     Sponge        Dog  Grapevine
Input sequences                        9747     10,863     32,776
Strains/cell types                        1          1         10
Step 1: transcript SNP separation assembly
Total transcripts                      4401       5921     12,380
    thereof singlets                   3151       4204       7904
    thereof contigs                    1250       1717       4476
        Max cov/occurred              145/1      106/1      812/1
        Min cov/occurred              2/637      2/885     2/2143
Total transcript len.             3,342,596  3,941,124  7,082,719
Step 3: transcript SNP classification assembly
Total unified transcr.                 4077       5901       8547
    thereof singlets                   3780       5811       6131
    thereof contigs                     297         90       2416
    thereof with SNPs                   285         81       2103
Total transcript len.             3,120,847  3,897,635  4,872,333
Transcript SNP types
    Intra strain/cell                  2158        461        959
    Inter strain/cell                                        1505
    Intra and Inter s./c.                                    7221
Total SNP sites                        4653        927       9685
  • Step 1: the resulting sequences are transcripts separated by SNPs, but not by strain. The number of contigs, the maximum and minimum coverage within the contigs (and the number of times each occurred), and the number of singlets give a rough idea of how asymmetrically the EST reads are distributed across the contigs.

  • Step 3: "assembly of pristine mRNA transcripts" to analyze SNP sites and types. The transcript sequences obtained here can be seen as a consensus of the (hopefully) pristine transcripts obtained in the previous steps of the assembly. Classification of SNPs (see also the subsection of the same name in the Methods section) is also performed in this step: Intra means that SNPs occur only within a strain or cell type, Inter means that SNPs occur only when comparing different strains or cell types, and Intra and Inter is a combination of the first two types. Intermediary results from step 2 are not shown, as sponge and dog do not use this step, and the grapevine results are too extensive.
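The three SNP types described in the Step 3 note can be illustrated with a minimal sketch. It is not the miraEST implementation, which is defined in the Methods section and not reproduced here. The function name `classify_snp_site`, the input layout (the set of bases seen per strain at one alignment column), and the exact rules are assumptions made for illustration: a site is treated as Intra when at least one strain shows more than one base, and as Inter when the sets of bases differ between strains.

```python
from typing import Dict, Optional, Set


def classify_snp_site(bases_by_strain: Dict[str, Set[str]]) -> Optional[str]:
    """Classify one alignment column from the bases seen in each strain.

    Hypothetical helper: the rules below are an assumed reading of the
    Intra / Inter / Intra and Inter types, not miraEST's actual code.
    """
    # Intra: at least one strain (or cell type) shows more than one base.
    has_intra = any(len(bases) > 1 for bases in bases_by_strain.values())

    # Inter: the sets of observed bases are not identical across strains.
    distinct_sets = {frozenset(bases) for bases in bases_by_strain.values()}
    has_inter = len(distinct_sets) > 1

    if has_intra and has_inter:
        return "intra and inter"
    if has_intra:
        return "intra"
    if has_inter:
        return "inter"
    return None  # no variation at this column, so not a SNP site


if __name__ == "__main__":
    # Strain names are placeholders.
    print(classify_snp_site({"strain_a": {"A", "G"}}))
    print(classify_snp_site({"strain_a": {"A"}, "strain_b": {"G"}}))
    print(classify_snp_site({"strain_a": {"A", "G"}, "strain_b": {"A"}}))
```

Under these assumed rules a project with a single strain or cell type can only yield Intra sites, which matches the table: sponge and dog each have one strain and report values only in the Intra row.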

This Article: Genome Res. 14: 1147-1159
