d2_cluster: A Validated Method for Clustering EST and Full-Length cDNA Sequences

Initial state: Each sequence is in its own cluster (i.e., Si is in cluster i, or Ci = i).
First iteration I1: The first sequence in the database, S0, is selected as the query. For each sequence Si (1 ≤ i < N), MERGE(cluster C0 ← cluster Ci) if d2(S0,Si) < THRESHOLD.
Second iteration I2: The second sequence in the database, S1, is now selected as the query. Note that C1 = 1 unless sequence 1 was merged into cluster 0 during iteration I1. For each sequence Si (2 ≤ i < N), MERGE(cluster C1 ← cluster Ci) if d2(S1,Si) < THRESHOLD.
(k + 1)th iteration I(k+1): Suppose we have completed k iterations. We select sequence Sk as the query. For each sequence Si (k + 1 ≤ i < N), MERGE(cluster Ck ← cluster Ci) if d2(Sk,Si) < THRESHOLD.
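The iteration scheme above can be sketched in a few lines of Python. This is only an illustrative sketch, not the d2_cluster implementation: the function names `greedy_cluster` and `toy_word_distance` are hypothetical, and the distance used here is a simplified word-count comparison over whole sequences that stands in for the real d2 measure, which is not defined in this section.

```python
from collections import Counter
from typing import Callable, List, Sequence


def toy_word_distance(a: str, b: str, w: int = 2) -> int:
    """Hypothetical stand-in for d2: the sum of squared differences in
    the counts of each word of length w, taken over the whole sequences."""
    count_a = Counter(a[j:j + w] for j in range(len(a) - w + 1))
    count_b = Counter(b[j:j + w] for j in range(len(b) - w + 1))
    return sum((count_a[word] - count_b[word]) ** 2
               for word in set(count_a) | set(count_b))


def greedy_cluster(seqs: Sequence[str],
                   dist: Callable[[str, str], float],
                   threshold: float) -> List[int]:
    """Return one cluster label per sequence, following the scheme above."""
    n = len(seqs)
    cluster = list(range(n))  # initial state: C_i = i
    for k in range(n):  # S_k is the query
        for i in range(k + 1, n):  # compare only against later sequences
            if dist(seqs[k], seqs[i]) < threshold:
                old, new = cluster[i], cluster[k]
                if old != new:
                    # MERGE(C_k <- C_i): every member of C_i joins C_k
                    cluster = [new if c == old else c for c in cluster]
    return cluster


seqs = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "TTTTTTTA"]
print(greedy_cluster(seqs, toy_word_distance, threshold=5))  # [0, 0, 2, 2]
```

Because a merge moves the whole cluster Ci rather than the single sequence Si, two sequences can end up in the same cluster without ever being directly similar to each other, as long as a chain of pairwise matches connects them. The cluster labels are arbitrary identifiers; only the grouping they induce is meaningful.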

Genome Res. 9: 1135-1142
