A Random Sequencing Approach for the Analysis of the Trypanosoma cruzi Genome: General Structure, Large Gene and Repetitive DNA Families, and Gene Discovery

Table 3.

Large Gene Families in T. cruzi

| (A) Gene Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Reference |
|---|---|---|---|---|---|
| dgf-1 | 494 | 154 | 4.3 | 6 | (Wincker et al. 1992) |
| trans-sialidase | 427 | 632 | 3.7 | 10 | (Parodi et al. 1992) |
| L1 non-LTR retrotransposon | 214 | 149 | 1.9 | 5 | (Martín et al. 1995) |
| Mucin | 122 | 710 | 1.1 | 5 | (Di Noia et al. 1995) |
| Cysteine proteinase (Cruzipain) | 39 | 91 | 0.3 | 7 | (Campetella et al. 1992a) |
| predicted ORF (gi3053534), chromosome 3 | 38 | 103 | 0.3 | 4 | (Andersson et al. 1998) |
| gp63 | 34 | 70 | 0.3 | 9 | AF110951, unpubl. |
| Histone H4 | 29 | 337 | 0.2 | 10 | (Soto et al. 1997) |
| Casein kinase homolog | 23 | 81 | 0.2 | 7 | AF089709, unpubl. |
| Adenylyl cyclase | 19 | 18 | 0.2 | 0 | (Taylor et al. 1999) |
| Hsp70 | 18 | 25 | 0.2 | 48 | (Requena et al. 1988) |
| Histone H2A | 17 | 145 | 0.1 | 106 | (Puerta et al. 1994) |
| Helicase | 14 | 24 | 0.1 | 15 | |
| Hsp90 | 11 | 18 | 0.1 | 7 | (Mottram et al. 1989) |
| Total | 1499 | | 14.5 | | |

| (B) Repetitive DNA Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Reference |
|---|---|---|---|---|---|
| minichromosomal 195 bp repeat | 854 | 15287 | 7.45 | ND | (Gonzalez et al. 1984) |
| TcIRE (I) | 266 | 1664 | 2.3 | 8 | this work |
| VIPER | 174 | 257 | 1.5 | ND | (Vázquez et al. 2000) |
| C6 interspersed element | 230 | 560 | 2.0 | ND | (Araya et al. 1997) |
| SIRE | 201 | 3011 | 1.8 | ND | (Vázquez et al. 2000) |
| telomere associated sequences | 131 | 1963 | 1.1 | ND | |
| TcIRE (II) | 47 | 2310 | 0.4 | 2 | this work |
| TRBSEQA | 31 | 105 | 0.3 | ND | (Requena et al. 1992) |
| HCR6 | 10 | 57 | 0.1 | ND | (de Mendonça-Lima and Traub-Cseko 1991) |
| Spliced Leader gene | 12 | 69 | 0.1 | ND | |
| Total | 2133 | | 18.6 | | |

| (C) Unknown Families | No. of GSSs | Estimated No. of Copies | Relative % | No. of ESTs | Consensus Size (bp) |
|---|---|---|---|---|---|
| Cluster 2009 | 19 | 54 | 0.1 | 0 | 1220 |
| Cluster 2047 | 85 | 136 | 0.7 | 1 | 2170 |
| Cluster 2015 | 53 | 96 | 0.5 | 0 | 1917 |
| Cluster 1994 | 25 | 82 | 0.2 | 1 | 1056 |
| Cluster 2056 | 22 | 30 | 0.2 | 0 | 2571 |
| Cluster 2019 | 21 | 102 | 0.2 | 3 | 1051 |
| Cluster 2027 | 12 | 58 | 0.1 | 0 | 718 |
| Cluster 1986 | 10 | 48 | 0.1 | 0 | 728 |
| Total | 247 | | 2.1 | | |
  • GSS sequences were clustered, and their similarities to sequences in nonredundant databases were determined. The total number of GSSs for each family was determined as described in the text. Note, however, that with this approach a single GSS can be assigned to more than one family. To calculate the number of copies, the gene size (GS) used was the length of the coding sequence (excluding UTRs) of a representative member of each family; for families whose genes differ in size, an average was used. When only partial sequences were available, copy numbers were not determined, as this could lead to overestimation. To determine the number of ESTs for a given gene family, the consensus sequence or the sequence of a representative member was used in a BLASTN search against the 8796 T. cruzi ESTs. Matches with E < 1e-40 were considered positive. For unpublished sequences that are available from nucleotide databases, the GenBank accession number is given. (A) Gene families (protein coding). (B) Repetitive DNA families (likely to be noncoding). (C) Uncharacterized sequences described in this work. Information about these unknown families (consensus sequence and individual GSSs included in the contig) can be found at http://www.iib.unsam.edu.ar/genomelab/tcruzi/gss.html.
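The EST-counting rule in the legend (BLASTN matches with E < 1e-40 count as positive) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the BLASTN results are available in the common 12-column tab-separated tabular layout, with the subject ID in column 2 and the E-value in column 11, and it assumes that each EST is counted once however many hits it produces; the legend does not state either point. The function name `count_positive_ests` and the sequence IDs in `EXAMPLE_HITS` are hypothetical.

```python
import csv
import io

# Threshold stated in the table legend: matches with E < 1e-40 are positive.
E_VALUE_CUTOFF = 1e-40


def count_positive_ests(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Count distinct subject (EST) IDs with at least one hit below the cutoff.

    Assumes 12-column tab-separated BLAST tabular output:
    column 2 (index 1) is the subject ID, column 11 (index 10) the E-value.
    """
    positives = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id = row[1]
        e_value = float(row[10])
        if e_value < cutoff:  # strict inequality, as in the legend
            positives.add(subject_id)
    return len(positives)


# Hypothetical hits for one family consensus against an EST database.
EXAMPLE_HITS = (
    "consensus\tEST_0001\t98.5\t400\t6\t0\t1\t400\t1\t400\t1e-120\t450\n"
    "consensus\tEST_0001\t97.0\t200\t6\t0\t401\t600\t1\t200\t5e-60\t240\n"
    "consensus\tEST_0002\t95.0\t180\t9\t0\t1\t180\t20\t199\t2e-41\t170\n"
    "consensus\tEST_0003\t88.0\t90\t11\t0\t1\t90\t5\t94\t3e-12\t70\n"
)

if __name__ == "__main__":
    print(count_positive_ests(io.StringIO(EXAMPLE_HITS)))
```

In this example EST_0001 is counted once despite having two hits, and EST_0003 is excluded because its E-value is above the cutoff. Counting hits rather than distinct ESTs would be an equally simple variant if that is what the study intended.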

This Article

  1. Genome Res. 10: 1996-2005
