Table 3.

Large Gene Families in T. cruzi

| A. Gene Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Reference |
|---|---|---|---|---|---|
| dgf-1 | 494 | 154 | 4.3 | 6 | (Wincker et al. 1992) |
| trans-sialidase | 427 | 632 | 3.7 | 10 | (Parodi et al. 1992) |
| L1 non-LTR retrotransposon | 214 | 149 | 1.9 | 5 | (Martín et al. 1995) |
| Mucin | 122 | 710 | 1.1 | 5 | (Di Noia et al. 1995) |
| Cysteine proteinase (Cruzipain) | 39 | 91 | 0.3 | 7 | (Campetella et al. 1992a) |
| predicted ORF (gi3053534), chromosome 3 | 38 | 103 | 0.3 | 4 | (Andersson et al. 1998) |
| gp63 | 34 | 70 | 0.3 | 9 | AF110951, unpubl. |
| Histone H4 | 29 | 337 | 0.2 | 10 | (Soto et al. 1997) |
| Casein kinase homolog | 23 | 81 | 0.2 | 7 | AF089709, unpubl. |
| Adenylyl cyclase | 19 | 18 | 0.2 | 0 | (Taylor et al. 1999) |
| Hsp70 | 18 | 25 | 0.2 | 48 | (Requena et al. 1988) |
| Histone H2A | 17 | 145 | 0.1 | 106 | (Puerta et al. 1994) |
| Helicase | 14 | 24 | 0.1 | 15 | |
| Hsp90 | 11 | 18 | 0.1 | 7 | (Mottram et al. 1989) |
| Total | 1499 | | 14.5 | | |
| B. Repetitive DNA Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Reference |
|---|---|---|---|---|---|
| minichromosomal 195 bp repeat | 854 | 15287 | 7.45 | ND | (Gonzalez et al. 1984) |
| TcIRE (I) | 266 | 1664 | 2.3 | 8 | this work |
| VIPER | 174 | 257 | 1.5 | ND | (Vázquez et al. 2000) |
| C6 interspersed element | 230 | 560 | 2.0 | ND | (Araya et al. 1997) |
| SIRE | 201 | 3011 | 1.8 | ND | (Vázquez et al. 2000) |
| telomere associated sequences | 131 | 1963 | 1.1 | ND | |
| TcIRE (II) | 47 | 2310 | 0.4 | 2 | this work |
| TRBSEQA | 31 | 105 | 0.3 | ND | (Requena et al. 1992) |
| HCR | 6 | 1057 | 0.1 | ND | (de Mendonça-Lima and Traub-Cseko 1991) |
| Spliced Leader gene | 12 | 69 | 0.1 | ND | |
| Total | 2133 | | 18.6 | | |
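| C. Unknown Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Consensus Size (bp) |
|---|---|---|---|---|---|
| Cluster 2009 | 19 | 54 | 0.1 | 0 | 1220 |
| Cluster 2047 | 85 | 136 | 0.7 | 1 | 2170 |
| Cluster 2015 | 53 | 96 | 0.5 | 0 | 1917 |
| Cluster 1994 | 25 | 82 | 0.2 | 1 | 1056 |
| Cluster 2056 | 22 | 30 | 0.2 | 0 | 2571 |
| Cluster 2019 | 21 | 102 | 0.2 | 3 | 1051 |
| Cluster 2027 | 12 | 58 | 0.1 | 0 | 718 |
| Cluster 1986 | 10 | 48 | 0.1 | 0 | 728 |
| Total | 247 | | 2.1 | | |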

[i] GSS sequences were clustered and their similarities against sequences in nonredundant databases were determined. The total number of GSSs for each family was determined as described in the text. Note, however, that this approach can assign a single GSS to more than one family. To calculate the number of copies, the gene size (GS) used was the length of the coding sequence (excluding UTRs) of a representative member of each family; for genes of different sizes, an average was used. When only partial sequences were available, copy numbers were not determined, as this could lead to overestimation. To determine the number of ESTs for a given gene family, the consensus sequence or the sequence of a representative member was used in a BLASTN search against the 8796 T. cruzi ESTs. Matches with E < 1e-40 were considered positive. For unpublished sequences that are available from nucleotide databases, the GenBank accession number is given. (A) Gene families (protein coding). (B) Repetitive DNA families (likely to be noncoding). (C) Uncharacterized sequences described in this work. Information about these unknown families (consensus sequence and individual GSSs included in the contig) can be found at http://www.iib.unsam.edu.ar/genomelab/tcruzi/gss.html.
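The EST-counting step in the footnote (BLASTN matches with E < 1e-40 counted as positive) could be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLASTN results are in the standard tab-separated tabular format (`-outfmt 6`, with the subject ID in column 2 and the E-value in column 11), and it assumes each EST is counted once however many hits it produces. The function name `count_matching_ests` and the example hits are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-40  # threshold stated in the table footnote


def count_matching_ests(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Count distinct EST IDs with at least one hit below the E-value cutoff.

    Assumes BLAST tabular output with default columns: column 2 is the
    subject (EST) ID and column 11 is the E-value.
    """
    positive = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        # Skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        subject_id, evalue = row[1], float(row[10])
        if evalue < cutoff:
            positive.add(subject_id)  # each EST is counted only once
    return len(positive)


# Hypothetical hits for illustration only (not data from the study).
EXAMPLE_HITS = (
    "consensus\tEST_001\t98.5\t400\t6\t0\t1\t400\t1\t400\t1e-180\t650\n"
    "consensus\tEST_001\t97.0\t200\t6\t0\t401\t600\t1\t200\t1e-90\t330\n"
    "consensus\tEST_002\t95.0\t150\t7\t0\t1\t150\t1\t150\t2e-45\t180\n"
    "consensus\tEST_003\t88.0\t60\t7\t0\t1\t60\t1\t60\t3e-12\t70\n"
)

if __name__ == "__main__":
    print(count_matching_ests(io.StringIO(EXAMPLE_HITS)))
```

In this example, EST_003 is excluded because its E-value is above the cutoff, and the two hits to EST_001 count as a single EST.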