Genome Research

Table 3.

Large Gene Families in T. cruzi

A Gene Families	No. of GSSs	Estimated No. of Copies	Relative %	No. of ESTs	Reference
dgf-1	494	154	4.3	6	(Wincker et al. 1992)
trans-sialidase	427	632	3.7	10	(Parodi et al. 1992)
L1 non-LTR retrotransposon	214	149	1.9	5	(Martín et al. 1995)
Mucin	122	710	1.1	5	(Di Noia et al. 1995)
Cysteine proteinase (Cruzipain)	39	91	0.3	7	(Campetella et al. 1992a)
predicted ORF (gi3053534), chromosome 3	38	103	0.3	4	(Andersson et al. 1998)
gp63	34	70	0.3	9	AF110951, unpubl.
Histone H4	29	337	0.2	10	(Soto et al. 1997)
Casein kinase homolog	23	81	0.2	7	AF089709, unpubl.
Adenylyl cyclase	19	18	0.2	0	(Taylor et al. 1999)
Hsp70	18	25	0.2	48	(Requena et al. 1988)
Histone H2A	17	145	0.1	106	(Puerta et al. 1994)
Helicase	14	24	0.1	15
Hsp90	11	18	0.1	7	(Mottram et al. 1989)
Total	1499		14.5

B Repetitive DNA Families	No. of GSSs	Estimated No. of Copies	Relative %	No. of ESTs	Reference
minichromosomal 195 bp repeat	854	15287	7.45	ND	(Gonzalez et al. 1984)
TcIRE (I)	266	1664	2.3	8	this work
VIPER	174	257	1.5	ND	(Vázquez et al. 2000)
C6 interspersed element	230	560	2.0	ND	(Araya et al. 1997)
SIRE	201	3011	1.8	ND	(Vázquez et al. 2000)
telomere associated sequences	131	1963	1.1	ND
TcIRE (II)	47	2310	0.4	2	this work
TRBSEQA	31	105	0.3	ND	(Requena et al. 1992)
HCR6	10	57	0.1	ND	(de Mendonça-Lima and Traub-Cseko 1991)
Spliced Leader gene	12	69	0.1	ND
Total	2133		18.6

C Unknown Families	No. of GSSs	Estimated No. of Copies	Relative %	No. of ESTs	Consensus Size (bp)
Cluster 2009	19	54	0.1	0	1220
Cluster 2047	85	136	0.7	1	2170
Cluster 2015	53	96	0.5	0	1917
Cluster 1994	25	82	0.2	1	1056
Cluster 2056	22	30	0.2	0	2571
Cluster 2019	21	102	0.2	3	1051
Cluster 2027	12	58	0.1	0	718
Cluster 1986	10	48	0.1	0	728
Total	247		2.1

[i] GSS sequences were clustered, and their similarities to sequences in nonredundant databases were determined. The total number of GSSs for each family was determined as described in the text. Note, however, that with this approach a GSS can be assigned to more than one family. To calculate the number of copies, the gene size (GS) used was the length of the coding sequence (excluding UTRs) of a representative member of each family; for genes of different sizes, an average was used. When only partial sequences were available, copy numbers were not determined, because this could lead to overestimation. To determine the number of ESTs for a given gene family, the consensus sequence or the sequence of a representative member was used in a BLASTN search against the 8796 T. cruzi ESTs. Matches with E < 1e-40 were considered positive. For unpublished sequences that are available from nucleotide databases, the GenBank accession number is given. ND, not determined. (A) Gene families (protein coding). (B) Repetitive DNA families (likely to be noncoding). (C) Uncharacterized sequences described in this work. Information about these unknown families (consensus sequence and individual GSSs included in the contig) can be found at http://www.iib.unsam.edu.ar/genomelab/tcruzi/gss.html.
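The EST-counting step described in the footnote (a BLASTN search followed by an E < 1e-40 cutoff) could be reproduced roughly as in the sketch below. This is an illustration, not the authors' pipeline. It assumes the BLASTN results are in the standard 12-column tabular format, with the subject ID in column 2 and the E-value in column 11, and that each matching EST is counted once. The function name `count_est_matches` and the example hits are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-40  # threshold stated in the table footnote


def count_est_matches(blast_tabular: str, cutoff: float = E_VALUE_CUTOFF) -> int:
    """Count distinct EST subjects with at least one hit below the cutoff.

    Assumes 12-column tab-separated BLAST output:
    column 2 = subject (EST) ID, column 11 = E-value.
    """
    matched = set()
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id = row[1]
        e_value = float(row[10])
        if e_value < cutoff:
            matched.add(subject_id)  # each EST is counted only once
    return len(matched)


# Made-up hits for illustration only; these are not data from the study.
example_hits = "\n".join([
    "consensus\tEST_0001\t98.5\t400\t6\t0\t1\t400\t1\t400\t1e-120\t700",
    "consensus\tEST_0001\t97.0\t200\t6\t0\t401\t600\t1\t200\t3e-60\t350",
    "consensus\tEST_0002\t95.0\t150\t7\t0\t1\t150\t1\t150\t2e-41\t180",
    "consensus\tEST_0003\t88.0\t60\t7\t0\t1\t60\t1\t60\t5e-12\t70",
])

print(count_est_matches(example_hits))
```

Counting distinct subject IDs rather than raw hit lines keeps an EST that aligns in several segments from being counted more than once. Whether the original analysis handled multi-segment hits this way is not stated in the footnote.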