Large Gene Families in T. cruzi
| A Gene Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Reference |
| dgf-1 | 494 | 154 | 4.3 | 6 | (Wincker et al. 1992) |
| trans-sialidase | 427 | 632 | 3.7 | 10 | (Parodi et al. 1992) |
| L1 non-LTR retrotransposon | 214 | 149 | 1.9 | 5 | (Martín et al. 1995) |
| Mucin | 122 | 710 | 1.1 | 5 | (Di Noia et al. 1995) |
| Cysteine proteinase (Cruzipain) | 39 | 91 | 0.3 | 7 | (Campetella et al. 1992a) |
| predicted ORF (gi3053534), chromosome 3 | 38 | 103 | 0.3 | 4 | (Andersson et al. 1998) |
| gp63 | 34 | 70 | 0.3 | 9 | AF110951, unpubl. |
| Histone H4 | 29 | 337 | 0.2 | 10 | (Soto et al. 1997) |
| Casein kinase homolog | 23 | 81 | 0.2 | 7 | AF089709, unpubl. |
| Adenylyl cyclase | 19 | 18 | 0.2 | 0 | (Taylor et al. 1999) |
| Hsp70 | 18 | 25 | 0.2 | 48 | (Requena et al. 1988) |
| Histone H2A | 17 | 145 | 0.1 | 106 | (Puerta et al. 1994) |
| Helicase | 14 | 24 | 0.1 | 15 | |
| Hsp90 | 11 | 18 | 0.1 | 7 | (Mottram et al. 1989) |
| Total | 1499 | | 14.5 | | |
| B Repetitive DNA Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Reference |
| minichromosomal 195 bp repeat | 854 | 15287 | 7.45 | ND | (Gonzalez et al. 1984) |
| TcIRE (I) | 266 | 1664 | 2.3 | 8 | this work |
| VIPER | 174 | 257 | 1.5 | ND | (Vázquez et al. 2000) |
| C6 interspersed element | 230 | 560 | 2.0 | ND | (Araya et al. 1997) |
| SIRE | 201 | 3011 | 1.8 | ND | (Vázquez et al. 2000) |
| telomere associated sequences | 131 | 1963 | 1.1 | ND | |
| TcIRE (II) | 47 | 2310 | 0.4 | 2 | this work |
| TRBSEQA | 31 | 105 | 0.3 | ND | (Requena et al. 1992) |
| HCR6 | 10 | 57 | 0.1 | ND | (de Mendonça-Lima and Traub-Cseko 1991) |
| Spliced Leader gene | 12 | 69 | 0.1 | ND | |
| Total | 2133 | | 18.6 | | |
| C Unknown Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Consensus Size (bp) |
| Cluster 2009 | 19 | 54 | 0.1 | 0 | 1220 |
| Cluster 2047 | 85 | 136 | 0.7 | 1 | 2170 |
| Cluster 2015 | 53 | 96 | 0.5 | 0 | 1917 |
| Cluster 1994 | 25 | 82 | 0.2 | 1 | 1056 |
| Cluster 2056 | 22 | 30 | 0.2 | 0 | 2571 |
| Cluster 2019 | 21 | 102 | 0.2 | 3 | 1051 |
| Cluster 2027 | 12 | 58 | 0.1 | 0 | 718 |
| Cluster 1986 | 10 | 48 | 0.1 | 0 | 728 |
| Total | 247 | | 2.1 | | |
GSS sequences were clustered and their similarities against sequences in nonredundant databases were determined. The total number of GSSs for each family was determined as described in the text. Note, however, that with this approach a single GSS can be assigned to more than one family. To calculate the number of copies, the value of the gene size (GS) used was the length of the coding sequence (excluding UTRs) of a representative member of each family; for genes of different sizes, an average was used. When only partial sequences were available, copy numbers were not determined, as this could lead to overestimation of the figures. To determine the number of ESTs for a given gene family, the consensus sequence or the sequence of a representative member was used in a BLASTN search against the 8796 T. cruzi ESTs. Matches with E < 1e-40 were considered positive. For unpublished sequences that are available from nucleotide databases, the GenBank accession number is given. ND, not determined. (A) Gene families (protein coding). (B) Repetitive DNA families (likely to be noncoding). (C) Uncharacterized sequences described in this work. Information about these unknown families (consensus sequence and individual GSSs included in the contig) can be found at http://www.iib.unsam.edu.ar/genomelab/tcruzi/gss.html.
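The EST counting step described in the legend (BLASTN of a family consensus against the EST set, keeping matches with E < 1e-40) can be illustrated with a minimal sketch. It assumes the search results are available in the 12-column tabular layout written by current BLAST+ (`-outfmt 6`), which is not necessarily the output format used for this table; the function name `count_est_hits` and the identifiers in the sample data are hypothetical and serve only as illustration.

```python
import csv
import io

# Threshold stated in the legend: matches with E < 1e-40 count as positive.
E_VALUE_CUTOFF = 1e-40


def count_est_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Count distinct ESTs matched by each query with E-value below cutoff.

    Assumes 12 tab-separated columns per line (BLAST+ outfmt 6):
    query id, subject id, ..., E-value (column 11), bit score (column 12).
    """
    hits = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, est_id, evalue = row[0], row[1], float(row[10])
        if evalue < cutoff:
            # A set keeps each EST once even if it yields several alignments.
            hits.setdefault(query_id, set()).add(est_id)
    return {query: len(ests) for query, ests in hits.items()}


if __name__ == "__main__":
    # Hypothetical results: two alignments to the same EST, one weak match.
    sample = "\n".join([
        "consensus_X\tEST_0001\t99.0\t400\t4\t0\t1\t400\t1\t400\t1e-60\t700",
        "consensus_X\tEST_0001\t97.0\t150\t4\t0\t500\t650\t1\t150\t3e-45\t250",
        "consensus_X\tEST_0002\t90.0\t120\t12\t0\t1\t120\t1\t120\t2e-20\t100",
    ])
    print(count_est_hits(sample))
```

Counting distinct subject identifiers rather than alignment lines avoids inflating the EST count when one EST produces several high-scoring segments. A family with no match below the cutoff does not appear in the returned dictionary and corresponds to a count of 0 in the table.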











