A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base

Table 2. STACK Clustering and Alignment Results

Tissue          Singletons    (%)      MSC  Sequences in MSC    (%)  Small sequences    (%)  Total sequences
Adipose              1,693     71      181               572     24              111      5            2,376
Brain               42,245     24   22,848           130,573     73            4,458      3          177,719
Cochlea              1,973     46      710             2,213     51              118      3            4,304
Connective          12,652     31    4,646            26,210     64              876      2           40,753
Digestive           17,398     34    6,734            32,124     63            1,481      3           51,032
Disease             29,139     25   12,513            79,433     69            4,056      4          114,496
Eye                 13,867     49    3,448            12,933     45            1,388      5           28,514
Genomic             38,481     38   16,314            72,066     71            4,457      4          101,986
Gland               25,836     23   12,307            62,176     55            1,672      1          112,346
Heart               20,782     30    8,341            45,795     66              217    0.3           69,830
Hematolymph         51,654     20   17,378           113,147     44            2,582      1          255,565
Lung                20,129     29    8,554            47,151     67            2,726      4           70,259
Muscle               4,534     28    1,183             8,792     54            1,037      6           16,237
Olfactory            1,478     56      248               830     32              283     11            2,600
Other                9,392     36    4,315            15,663     60              575      2           25,925
Reproductive        43,569     18   24,165           188,088     79            6,321      3          239,161
Totals             334,822     26  143,885           837,766     64           32,240      2        1,198,607

Each (%) column gives the preceding count as a percentage of the total sequences in the tissue set; the (%) following Sequences in MSC is the clustering efficiency.
The total sequences in each tissue set are partitioned by D2_CLUSTER into unique sequences (singletons) and clusters of multiple related sequences (multisequence clusters, MSC); sequences shorter than 50 bases are excluded from clustering (small sequences).
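Once cluster assignments exist, the partition described in the note reduces to simple bookkeeping. The Python sketch below does not reimplement D2_CLUSTER; it assumes a hypothetical input format (a length for every sequence and a cluster label for every clustered sequence, with each singleton carrying a label of its own) and derives the table's columns from it. The names `summarize_partition`, `percent_of_total` and `MIN_LENGTH` are illustrative only.

```python
from collections import Counter

MIN_LENGTH = 50  # table note: sequences shorter than 50 bases are not clustered


def percent_of_total(count, total):
    """Share of a tissue set's total sequences, as a percentage."""
    return 100 * count / total


def summarize_partition(lengths, cluster_ids):
    """Split one tissue set into singletons, multisequence clusters and small sequences.

    lengths:     dict mapping sequence ID -> length in bases (every sequence in the set)
    cluster_ids: dict mapping sequence ID -> cluster label for each clustered sequence;
                 a singleton carries a label no other sequence shares.
                 This input format is hypothetical, not D2_CLUSTER's actual output.
    """
    total = len(lengths)
    small = sum(1 for n in lengths.values() if n < MIN_LENGTH)
    # Cluster sizes, counting only sequences long enough to have been clustered.
    sizes = Counter(cid for sid, cid in cluster_ids.items() if lengths[sid] >= MIN_LENGTH)
    singletons = sum(1 for s in sizes.values() if s == 1)
    msc = sum(1 for s in sizes.values() if s > 1)
    in_msc = sum(s for s in sizes.values() if s > 1)
    return {
        "total": total,
        "singletons": singletons,
        "msc": msc,
        "in_msc": in_msc,
        "small": small,
        # Sequences in MSC as a share of all sequences (the clustering efficiency).
        "clustering_efficiency": percent_of_total(in_msc, total),
    }


if __name__ == "__main__":
    # Toy tissue set: one 3-member cluster, two singletons, one 30-base fragment.
    lengths = {"a": 400, "b": 380, "c": 420, "d": 300, "e": 350, "f": 30}
    cluster_ids = {"a": "c1", "b": "c1", "c": "c1", "d": "c2", "e": "c3"}
    print(summarize_partition(lengths, cluster_ids))

    # Adipose row of Table 2: singletons, sequences in MSC, small sequences; total 2,376.
    row = (1693, 572, 111, 2376)
    print([round(percent_of_total(n, row[3])) for n in row[:3]])  # [71, 24, 5]
```

Applied to the Adipose counts, the percentage helper reproduces that row's 71, 24 and 5 percent after rounding to whole numbers.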


Genome Res. 9: 1143-1155
