Annotation Statistics and Sources for Fields in the GALA Database
| Category | Entries | Source | Example fields from this category | Reference | URL |
|---|---|---|---|---|---|
| Genes | 35,535 | LocusLink at NCBI | Name, type, orientation, exons, coding | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/ |
| Genes | 865 | RefSeq at NCBI and HGB | Name, type, orientation, exons, coding | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html http://genome.ucsc.edu/ |
| Gene products and function | 17,388 | LocusLink at NCBI | Product, biological process, cellular component, molecular function, conserved domain | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/ |
| Expression data | 602,388 | UniGene at NCBI | Tissue | Wheeler et al. 2002 | http://www.ncbi.nlm.nih.gov/ |
| Genetic disorders | 2,802 | OMIM | Disorder | Hamosh et al. 2002 | http://www.ncbi.nlm.nih.gov/omim/ |
| Alternate gene model: Acembly genes | 123,238 | Acembly and HGB | Name, type, orientation, exons, coding | J. Thierry-Mieg et al., unpublished | http://www.acedb.org/Cornell/acembly/ http://genome.ucsc.edu/ |
| Alternate gene model: Ensembl genes | 27,561 | Ensembl and HGB | Name, type, orientation, exons, coding | Hubbard et al. 2002 | http://www.ensembl.org/ http://genome.ucsc.edu/ |
| Alternate gene model: Genscan genes | 42,737 | Genscan and HGB | Name, type, orientation, exons, coding | Burge and Karlin 1997 | http://genes.mit.edu/GENSCAN.html http://genome.ucsc.edu/ |
| Alternate gene model: RefSeq genes | 16,222 | RefSeq and HGB | Name, type, orientation, exons, coding | Pruitt and Maglott 2001 | http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html http://genome.ucsc.edu/ |
| Alternate gene model: Twinscan genes | 25,744 | Twinscan and HGB | Name, type, orientation, exons, coding | Korf et al. 2001 | http://genes.cs.wustl.edu/ http://genome.ucsc.edu/ |
| Local alignments | 1,585,186 | MGSC and HGB | Length, percent identity, gap size, identity step | Waterston et al. 2002; Schwartz et al. 2003 | http://bio.cse.psu.edu/ http://genome.ucsc.edu/ |
| Gap-free alignments | 33,970,427 | MGSC and HGB | Length, percent identity | Waterston et al. 2002; Schwartz et al. 2003 | http://bio.cse.psu.edu/ http://genome.ucsc.edu/ |
| SNPs | 1,956,922 | dbSNP at NCBI and HGB | Type, allele, frequency | Sherry et al. 2001 | http://www.ncbi.nlm.nih.gov/SNP/ http://genome.ucsc.edu/ |
| Repeats | 4,891,898 | HGB and RepeatMasker | Name, class, family | Kent et al. 2002; Smit and Green 1999 | http://genome.ucsc.edu/ http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker |
| CpG islands | 26,942 | HGB | Name | Kent et al. 2002 | http://genome.ucsc.edu/ |
| Transcription factor binding sites | 7,655,424 | TRANSFAC, Cister, and tffind | Factor name, strand, score | Wingender et al. 2001; Frith et al. 2001 | http://www.gene-regulation.com/pub/databases.html http://sullivan.bu.edu/~mfrith/cister.shtml |
| Recombination rate | 8,475 | deCODE, Marshfield, Genethon and HGB | Marker, recombination rate, range | Kong et al. 2002; Browman et al. 1998; Hudson et al. 1995 | http://www.decodegenetics.com/ http://research.marshfieldclinic.org/genetics/Map_Markers/maps/IndexMapFrames.html http://www.genethon.fr/php/index_us.php http://genome.ucsc.edu/ |
Note: Users query on fields such as those listed as examples in column 4. The number of entries in each category is subject to change as the source databases update their entries. For all categories except gene products and function, the number of entries is simply the number of rows in the corresponding database table. For gene products and function, it is the number of gene rows that have data in this category. NCBI, National Center for Biotechnology Information at NIH; OMIM, Online Mendelian Inheritance in Man; HGB, Human Genome Browser at UCSC; MGSC, Mouse Genome Sequencing Consortium.
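
The two counting rules in the note can be illustrated with a minimal sketch. It does not reflect the actual GALA schema: the `genes` table, its column names, the `count_entries` helper, and the toy rows are all hypothetical, and an in-memory SQLite database stands in for the real one. It also assumes that "has data in this category" means at least one of the category's fields is non-null.

```python
import sqlite3


def count_entries(conn, table, category_columns=None):
    """Count entries for one category (hypothetical helper, not GALA code).

    Without category_columns, every row of the table counts as an entry.
    With category_columns, only rows holding a value in at least one of
    those columns count.
    """
    if not category_columns:
        sql = f"SELECT COUNT(*) FROM {table}"
    else:
        has_data = " OR ".join(f"{col} IS NOT NULL" for col in category_columns)
        sql = f"SELECT COUNT(*) FROM {table} WHERE {has_data}"
    return conn.execute(sql).fetchone()[0]


# Toy data: three gene rows, two of which carry product/function annotation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes (name TEXT, product TEXT, molecular_function TEXT)"
)
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [
        ("geneA", "example product", None),
        ("geneB", None, "example function"),
        ("geneC", None, None),
    ],
)

# Rule 1: plain row count (used for most categories).
gene_entries = count_entries(conn, "genes")
# Rule 2: gene rows with data in the category (gene products and function).
function_entries = count_entries(conn, "genes", ["product", "molecular_function"])

print(gene_entries, function_entries)  # prints "3 2"
```

Under this reading, the gene products and function count can never exceed the number of gene rows in the table it is drawn from, because it counts a subset of those rows rather than a separate table.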











