Table 2.

Data flow of the computational search for transcription-induced chimeras


Processᵃ                                    Resulting data
1  Data download                            cDNA data from GenBank version 136
2  Alignment and clustering                 26,057 clustersᵇ of expressed sequences aligned to the genome
3  Sense/antisense separation               29,613 clusters on separate strands
4  Computational detection of gene fusion   322 pairs of fused genes
5  Filtering out alignment artifacts        281 pairs of fused genes
6  Manual filtration of artifacts           Final data set: 212 pairs of fused genes

ᵃ Procedures were performed as described in the Methods section.

ᵇ A “cluster” is a group of ESTs that overlap on the genome and that includes at least one RNA sequence.
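The cluster definition in footnote b can be illustrated with a minimal interval-merging sketch. This is not the authors' clustering procedure, which is described in the Methods section. It assumes each aligned sequence is reduced to a hypothetical `Alignment` record holding one half-open genomic interval and a flag marking full RNA sequences versus ESTs. Overlapping intervals are merged transitively, and only groups containing at least one RNA sequence are kept. Strand is ignored here because the table separates sense and antisense clusters in a later step (row 3).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    """Hypothetical record for one sequence aligned to the genome."""
    seq_id: str
    chrom: str
    start: int    # 0-based start, inclusive
    end: int      # end, exclusive (half-open interval)
    is_rna: bool  # True for a full RNA sequence, False for an EST


def cluster_alignments(alns: List[Alignment]) -> List[List[Alignment]]:
    """Group alignments whose intervals overlap (transitively) on the same
    chromosome, then keep only groups containing at least one RNA sequence."""
    ordered = sorted(alns, key=lambda a: (a.chrom, a.start, a.end))
    clusters: List[List[Alignment]] = []
    current: List[Alignment] = []
    current_chrom = None
    current_end = -1
    for a in ordered:
        if current and a.chrom == current_chrom and a.start < current_end:
            # Overlaps the running cluster: extend it.
            current.append(a)
            current_end = max(current_end, a.end)
        else:
            # No overlap: close the running cluster and start a new one.
            if current:
                clusters.append(current)
            current = [a]
            current_chrom = a.chrom
            current_end = a.end
    if current:
        clusters.append(current)
    # Footnote b: a cluster must contain at least one RNA sequence.
    return [c for c in clusters if any(a.is_rna for a in c)]


alns = [
    Alignment("r1", "chr1", 100, 500, True),
    Alignment("e1", "chr1", 400, 700, False),
    Alignment("e2", "chr1", 800, 900, False),  # isolated EST, no RNA: dropped
    Alignment("e3", "chr2", 100, 200, False),
    Alignment("r2", "chr2", 150, 300, True),
]
for c in cluster_alignments(alns):
    print([a.seq_id for a in c])
```

Sorting by start coordinate lets a single linear pass merge overlapping intervals. In the usage example, the isolated EST `e2` forms a group without any RNA sequence and is therefore discarded.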