Genome Research

Table 1. Sequence Sets Used in Analyses

Data set	No. of seqs.	No. of chars.	Description
Fungal nucleotide data sets
NcrEST	3,578	1,821,906	N. crassa ESTs from the Neurospora Genome Project.[i]
Ncr contigs	2,093	1,147,268	N. crassa sequences assembled from the NcrEST data set.[ii]
ScerEST	3,424	1,136,588	S. cerevisiae ESTs from TIGR.[iii]
CAL	1,631	14,929,251	genomic sequences from C. albicans.[iv]
ENI	13,404	5,594,817	nucleotide sequences from A. nidulans.[v]
Fungal amino acid data sets
NCR	1,007	400,653	translated ORFs for non-EST N. crassa sequences.[v]
SC	6,227	2,908,935	translated ORFs from complete S. cerevisiae genome.[vi]
NAscF	2,130	735,449	translated ORFs from nonascomycete fungi.[v]
Spo	8,358	3,708,009	translated ORFs from S. pombe.[v]
Nonfungal nucleotide data sets
HMEST	1,228,825	455,623,980	human and mouse ESTs from dbEST.[vii]
Nonfungal amino acid data sets
NF	206,898	64,637,987	translated ORFs for nonfungal organisms.[viii]
EUTH	166,241	44,409,356	translated ORFs from eutherian (placental) mammals.[v]

[i] http://www.unm.edu/∼ngp/.

[ii] These assembled sequences were clustered into 1,197 discontigs, which correspond to putative unique loci. These sequences can be retrieved from http://molbio.ahpcc.unm.edu/search/discontigs.html using the discontig numbers used in this paper.

[iii] ftp://ftp.tigr.org/pub/data/estdb/yestfal.Z.

[iv] http://www-sequence.stanford.edu/group/candida/.

[v] http://www3.ncbi.nlm.nih.gov/Entrez/batch.html.

[vi] http://genome-www.stanford.edu/Saccharomyces.

[vii] ftp://ncbi.nlm.nih.gov/blast/db.

[viii] Subset of GSDB (Skupski et al. 1999) kindly provided by Marian Skupski of the National Center for Genome Resources.
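If the character count in Table 1 is read as the number of residues (nucleotides for the nucleotide sets, amino acids for the amino acid sets), then dividing it by the number of sequences gives the mean sequence length of each data set. That reading is an interpretation of the table, not something the table states. The short Python sketch below does only that arithmetic on the counts copied from the table. The names TABLE_1 and mean_length are illustrative and do not come from the original analysis.

```python
# Counts copied from Table 1: (data set, no. of sequences, no. of characters, residue type).
# Residue type ("nt" or "aa") follows the table's section headings; treating
# characters as residues is an assumption made for this illustration.
TABLE_1 = [
    ("NcrEST", 3578, 1821906, "nt"),
    ("Ncr contigs", 2093, 1147268, "nt"),
    ("ScerEST", 3424, 1136588, "nt"),
    ("CAL", 1631, 14929251, "nt"),
    ("ENI", 13404, 5594817, "nt"),
    ("NCR", 1007, 400653, "aa"),
    ("SC", 6227, 2908935, "aa"),
    ("NAscF", 2130, 735449, "aa"),
    ("Spo", 8358, 3708009, "aa"),
    ("HMEST", 1228825, 455623980, "nt"),
    ("NF", 206898, 64637987, "aa"),
    ("EUTH", 166241, 44409356, "aa"),
]


def mean_length(n_seqs: int, n_chars: int) -> float:
    """Mean sequence length: total characters divided by number of sequences."""
    return n_chars / n_seqs


if __name__ == "__main__":
    # Print one row per data set: name, mean length, residue type.
    for name, n_seqs, n_chars, unit in TABLE_1:
        print(f"{name:12s} {mean_length(n_seqs, n_chars):10.1f} {unit}")
```

For example, NcrEST works out to about 509 nt per EST, CAL to about 9,153 nt per genomic sequence, and SC to about 467 amino acids per translated ORF.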