Connecting Sequence and Biology in the Laboratory Mouse

Figure 2

Schematic showing status and number code assignments for clusters and their use in cluster curation. Hypothetical outputs of the three cluster builds (RIKEN, NCBI UniGene, and TIGR) for six FANTOM2 clones are shown. Status and number codes for each clone, as well as cluster union IDs, appear in the table to the right. FANTOM2 clones and non-RIKEN public sequences are shown as solid and open boxes, respectively. Sequences associated with MGI genes are distinguished by block arrows. In the top set of clusters, RIKEN clones 1 and 2 were grouped with the same EST sequences in the NCBI UniGene and TIGR clusters, and were assigned the same cluster union ID (100). The RIKEN status code (-NT) for these clones indicates that the NCBI UniGene and TIGR clusters are the same for these clones, but that RIKEN clusters A and B are different. The RIKEN number code (1,2,2) indicates that one, two, and two total RIKEN clones were clustered with those clones (including themselves) in the three respective cluster builds, irrespective of clone identities. The MGI status code (-NT) indicates that only the UniGene and TIGR clusters grouped sequences associated with the same number and identity of MGI genes (only MGI gene Shh is represented, via EST 1234). The MGI number code (0,1,1) indicates that a single MGI gene is represented in each of the UniGene and TIGR clusters (irrespective of gene identities) and none in the RIKEN clusters containing these clones.
The bottom set shows the clusters containing four additional RIKEN clones (3 through 6). Clones 3, 4, and 5 are grouped in UniGene cluster Mm.12, whereas clones 3, 4, and 6 are grouped in TIGR cluster TC:22; all four clones are therefore assigned to the same cluster union (200). The variability in RIKEN status codes indicates that each clone was grouped differently from the others by the three builds. The MGI number code (0,2,1) for three of the clones (3, 4, and 5) indicates that the UniGene cluster containing them (Mm.12) has grouped sequences associated with two MGI genes (Fgf4 and Fgf5), whereas only one MGI gene is represented in each of the TIGR clusters that contain them (TC:22 and TC:23). The MGI status code (---) for each clone indicates that no two clusters containing them represent exactly the same set of MGI genes. For this example, curators would determine whether the sequence associations in UniGene cluster Mm.12 are biologically appropriate; if so, the MGI gene records involved may need to be merged into a single record.
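
The code assignments described in the caption can be sketched in a few lines of Python. This is only an illustration, not the pipeline used by the authors: the function names `status_code` and `number_code` are hypothetical, the letter R for the RIKEN build is inferred from the N and T shown in the caption, and the rule that a build's letter appears whenever its set matches that of at least one other build is an assumption drawn from the (-NT) and (---) examples. The choice of Fgf4 as the single gene in the TIGR cluster is also assumed, since the caption does not say which gene it is.

```python
from typing import FrozenSet, Sequence, Tuple

# One letter per cluster build, in the order used by the caption:
# RIKEN, NCBI UniGene, TIGR. "R" is an assumed letter for RIKEN.
BUILD_LETTERS = ("R", "N", "T")


def status_code(sets_per_build: Sequence[FrozenSet[str]]) -> str:
    """Show a build's letter if its set equals another build's set, else '-'."""
    chars = []
    for i, letter in enumerate(BUILD_LETTERS):
        agrees = any(
            sets_per_build[i] == sets_per_build[j]
            for j in range(len(sets_per_build))
            if j != i
        )
        chars.append(letter if agrees else "-")
    return "".join(chars)


def number_code(sets_per_build: Sequence[FrozenSet[str]]) -> Tuple[int, ...]:
    """Count the members of each build's set, irrespective of identity."""
    return tuple(len(s) for s in sets_per_build)


# Clone 1 from the top set: alone in its RIKEN cluster, grouped with
# clone 2 in both the UniGene and TIGR clusters.
clone1_riken = [
    frozenset({"clone1"}),
    frozenset({"clone1", "clone2"}),
    frozenset({"clone1", "clone2"}),
]
# MGI genes represented in the clusters containing clone 1.
clone1_mgi = [frozenset(), frozenset({"Shh"}), frozenset({"Shh"})]

# Clone 3 from the bottom set: MGI genes per build (TIGR gene is assumed).
clone3_mgi = [frozenset(), frozenset({"Fgf4", "Fgf5"}), frozenset({"Fgf4"})]

print(status_code(clone1_riken), number_code(clone1_riken))
print(status_code(clone1_mgi), number_code(clone1_mgi))
print(status_code(clone3_mgi), number_code(clone3_mgi))
```

Under these assumptions the sketch reproduces the codes given in the caption: (-NT) with (1,2,2) and (0,1,1) for clone 1, and (---) with (0,2,1) for clone 3.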

This Article

  1. Genome Res. 13: 1505-1519