Table 1.

Annotation Statistics and Sources for Fields in the GALA Database

Category | Entries | Source | Example fields from this category | Reference | URL
Genes | 35,535 | LocusLink at NCBI | Name, type, orientation, exons, coding | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/
Genes | 865 | RefSeq at NCBI and HGB | Name, type, orientation, exons, coding | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html http://genome.ucsc.edu/
Gene products and function | 17,388 | LocusLink at NCBI | Product, biological process, cellular component, molecular function, conserved domain | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/
Expression data | 602,388 | UniGene at NCBI | Tissue | Wheeler et al. 2002 | http://www.ncbi.nlm.nih.gov/
Genetic disorders | 2,802 | OMIM | Disorder | Hamosh et al. 2002 | http://www.ncbi.nlm.nih.gov/omim/
Alternate gene model: Acembly genes | 123,238 | Acembly and HGB | Name, type, orientation, exons, coding | J. Thierry-Mieg et al., unpublished | http://www.acedb.org/Cornell/acembly/ http://genome.ucsc.edu/
Alternate gene model: Ensembl genes | 27,561 | Ensembl and HGB | Name, type, orientation, exons, coding | Hubbard et al. 2002 | http://www.ensembl.org/ http://genome.ucsc.edu/
Alternate gene model: Genscan genes | 42,737 | Genscan and HGB | Name, type, orientation, exons, coding | Burge and Karlin 1997 | http://genes.mit.edu/GENSCAN.html http://genome.ucsc.edu/
Alternate gene model: RefSeq genes | 16,222 | RefSeq and HGB | Name, type, orientation, exons, coding | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html http://genome.ucsc.edu/
Alternate gene model: Twinscan genes | 25,744 | Twinscan and HGB | Name, type, orientation, exons, coding | Korf et al. 2001 | http://genes.cs.wustl.edu/ http://genome.ucsc.edu/
Local alignments | 1,585,186 | MGSC and HGB | Length, percent identity, gap size, identity step | Waterston et al. 2002; Schwartz et al. 2003 | http://bio.cse.psu.edu/ http://genome.ucsc.edu/
Gap free alignments | 33,970,427 | MGSC and HGB | Length, percent identity | Waterston et al. 2002; Schwartz et al. 2003 | http://bio.cse.psu.edu/ http://genome.ucsc.edu/
SNPs | 1,956,922 | dbSNP at NCBI and HGB | Type, allele, frequency | Sherry et al. 2001 | http://www.ncbi.nlm.nih.gov/SNP/ http://genome.ucsc.edu/
Repeats | 4,891,898 | HGB and RepeatMasker | Name, class, family | Kent et al. 2002; Smit and Green 1999 | http://genome.ucsc.edu/ http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker
CpG islands | 26,942 | HGB | Name | Kent et al. 2002 | http://genome.ucsc.edu/
Transcription factor binding sites | 7,655,424 | TRANSFAC, Cister, and tffind | Factor name, strand, score | Wingender et al. 2001; Frith et al. 2001 | http://www.gene-regulation.com/pub/databases.html http://sullivan.bu.edu/~mfrith/cister.shtml
Recombination rate | 8,475 | deCODE, Marshfield, Genethon and HGB | Marker, recombination rate, range | Kong et al. 2002; Browman et al. 1998; Hudson et al. 1995 | http://www.decodegenetics.com/ http://research.marshfieldclinic.org/genetics/Map_Markers/maps/IndexMapFrames.html http://www.genethon.fr/php/index_us.php http://genome.ucsc.edu/

[i] Note: Users query on fields such as those listed as examples in the fourth column. The number of entries in each category is subject to change as the source databases update their entries. For all categories except gene products and function, the number of entries is simply the number of rows in the corresponding database table. For gene products and function, the number of entries is the number of gene rows that have data in this category. NCBI, National Center for Biotechnology Information at NIH; OMIM, Online Mendelian Inheritance in Man; HGB, Human Genome Browser at UCSC; MGSC, Mouse Genome Sequencing Consortium.
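
The two counting rules described in the note can be illustrated with a minimal sketch. It is not GALA's actual schema or code: the table name `gene`, its columns, and the sample rows are hypothetical and chosen only to show the difference between counting all rows and counting only the gene rows that carry product or function data.

```python
import sqlite3

# Hypothetical schema and sample data; GALA's real tables are not shown in the source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (name TEXT, product TEXT, molecular_function TEXT);
INSERT INTO gene VALUES ('geneA', 'kinase', 'ATP binding');
INSERT INTO gene VALUES ('geneB', NULL, NULL);
INSERT INTO gene VALUES ('geneC', NULL, 'DNA binding');
""")

def count_rows(conn):
    """Rule for most categories: entries = number of rows in the table."""
    return conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]

def count_annotated(conn):
    """Rule for gene products and function: count only gene rows
    that have data in at least one product/function field."""
    return conn.execute(
        "SELECT COUNT(*) FROM gene "
        "WHERE product IS NOT NULL OR molecular_function IS NOT NULL"
    ).fetchone()[0]

total_genes = count_rows(conn)
annotated_genes = count_annotated(conn)
print(total_genes, annotated_genes)
```

Under these assumptions, the second count can be smaller than the first, which would be consistent with the table listing fewer entries for gene products and function than for LocusLink genes.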