A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base

Table 5.

Error Analysis

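| Tissue | Single consensus clusters | Total clusters (%) | Multi-consensus clusters | Total clusters (%) | Total only singletons | Total clusters (%) | Single consensus + 1 or more singletons | Total clusters (%) | 3′/5′ disagreement | Total clusters (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Adipose | 173 | 96 | 5 | 3 | 3 | 2 | 0 | 0 | 23 | 13 |
| Brain | 19,933 | 87 | 1,850 | 8 | 296 | 1 | 769 | 4 | 2,552 | 11 |
| Cochlea | 689 | 97 | 13 | 2 | 4 | 0.5 | 4 | 0.5 | 18 | 3 |
| Connective | 4,098 | 88 | 316 | 7 | 93 | 2 | 140 | 3 | 358 | 8 |
| Digestive | 6,089 | 90 | 370 | 6 | 82 | 1 | 193 | 3 | 493 | 7 |
| Disease | 10,845 | 87 | 989 | 8 | 198 | 1 | 481 | 4 | 2,589 | 21 |
| Eye | 2,799 | 81 | 288 | 8 | 229 | 7 | 132 | 4 | 303 | 9 |
| Genomic | 14,924 | 91 | 792 | 5 | 177 | 1 | 421 | 3 | 2,550 | 16 |
| Gland | 10,843 | 88 | 820 | 7 | 237 | 0.2 | 408 | 4.8 | 1,096 | 9 |
| Heart | 7,341 | 88 | 622 | 7 | 104 | 1 | 274 | 4 | 699 | 8 |
| Hematolymph | 14,639 | 84 | 1,774 | 10 | 271 | 2 | 694 | 4 | 2,731 | 16 |
| Lung | 7,483 | 87 | 667 | 8 | 137 | 2 | 267 | 3 | 1,828 | 21 |
| Muscle | 1,084 | 92 | 64 | 5 | 12 | 1 | 23 | 2 | 67 | 6 |
| Olfactory | 238 | 96 | 7 | 2.8 | 2 | 0.8 | 1 | 0.4 | 4 | 2 |
| Other | 2,675 | 85 | 285 | 7 | 184 | 4 | 171 | 4 | 172 | 4 |
| Reproductive | 19,178 | 79 | 3,196 | 13 | 533 | 2 | 1,258 | 6 | 3,373 | 14 |
| Totals | 124,031 | 86 | 12,058 | 8 | 2,562 | 2 | 5,236 | 4 | 18,856 | 13 |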
  • CRAW analyzes cluster alignments generated by PHRAP or MSA_CONTIG and partitions consistent ESTs into subclusters based on agreement with other sequences. The ideal result is a single consensus cluster, which accounts for 86% of the STACK output. The remaining clusters may contain multiple sequence subclusters (resulting in a multiconsensus cluster), a primary consensus with one or more singleton sequences (data not shown), or only singleton ESTs according to the CRAW parameters. STACK clusters are generated by word identity counts, and their read direction is determined by majority vote of the annotations of constituent ESTs; clusters for which this vote is not unanimous (excluding abstentions) are noted in the right-most two columns.
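
The read-direction vote described in the footnote can be illustrated with a minimal Python sketch. This is not the STACK implementation: the function name `vote_orientation`, the use of `None` to represent an abstention (an EST with no usable direction annotation), and the handling of ties and of clusters with no votes are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def vote_orientation(
    annotations: Iterable[Optional[str]],
) -> Tuple[Optional[str], bool]:
    """Return (majority_direction, unanimous) for one cluster.

    Each annotation is "3'" or "5'", or None for an abstention.
    Abstentions are excluded before counting (hypothetical encoding).
    """
    votes = Counter(a for a in annotations if a is not None)

    if not votes:
        # Assumption: with no votes there is nothing to disagree about.
        return None, True

    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        # Assumption: a tied vote yields no direction.
        winner = None
    else:
        winner = ranked[0][0]

    # The vote is unanimous when only one direction received any votes.
    unanimous = len(votes) == 1
    return winner, unanimous


if __name__ == "__main__":
    cluster = ["3'", "3'", None, "5'", "3'"]
    direction, unanimous = vote_orientation(cluster)
    print(direction, unanimous)
```

Under these assumptions, a cluster whose second return value is `False` would be one of those counted in the 3′/5′ disagreement column.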

This Article

  1. Genome Res. 9: 1143-1155
