Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs

Table 2.

Functional Classification of Individual cDNAs

Cell cycle
Column groups: cDNA data; Best database hit; Tissue specificity

| Clone ID | Accession no. | Contig size (bp) | ORF size (aa) | Chromosomal location | Description of best hit | Database accession no. | P-value | Gene family | Tissue | Score | # ESTs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DKFZp434A0530 | AL136842 | 2768 | 254 | 2p22.1 | gene:Borg2; product: “CRIB-containing BORG2 protein”; Homo sapiens CRIB-containing BORG2 protein (BORG2) mRNA, complete cds. | EMBL AF164118 | 2.1e-99 | | | | |
| DKFZp434A1135 | AL122068 | 3010 | 670 | 5q13 | Homo sapiens Rad 17-like protein (RAD17) mRNA, complete cds. | EMBL AF076838 | 0 | | | | |
| DKFZp434A1315 | AL136755 | 1848 | 387 | 1q21.2 | product: “F1N21.3”; The sequence of BAC F1N21 from Arabidopsis thaliana chromosome 1, complete sequence. | EMBL AC002130 | 5.7e-22 | | | | |
| DKFZp434B174 | AL80146 | 1546 | 398 | 15q21.3 | Homo sapiens mRNA for cyclin B2, complete cds. | EMBL AB020981 | 0 | | ear | 6.38 | 6 |
| DKFZp434G0514 | AL136750 | 1503 | 379 | 4p16.2 | cell growth regulating nucleolar protein LYAR - mouse | PIR A40683 | 2.7e-144 | | | | |
| DKFZp434H152 | AL136840 | 4619 | 855 | 10p13 | gene:cdc23; “SPBC1347.10”; product: “cell division cycle protein 23”; S. pombe chromosome II cosmid c1347. | EMBL AL035548 | 7e-21 | | | | |
| DKFZp434J037 | AL136891 | 3443 | 628 | 1q32.1 | gene:KIAA0537; product: “KIAA0537 protein”; Homo sapiens mRNA for KIAA0537 protein, complete cds. | EMBL AB011109 | 2.6e-148 | protein kinase | | | |
| DKFZp434N0250 | AL117525 | 1584 | 462 | 1q43-q44 | product: “AKT3 protein kinase”; Homo sapiens AKT3 protein kinase mRNA, complete cds. | EMBL AF135794 | 2.1e-249 | protein kinase | | | |
| DKFZp434P107 | AL136894 | 2380 | 422 | 9q34 | XPMC2 protein - African clawed frog | PIR S53818 | 5.9e-10 | | | | |
| DKFZp434P2235 | AL136860 | 2027 | 549 | 17q12 | oncogene 1 (tre-2 locus) (clone 210) - human | PIR S22155 | 5.5e-226 | | testis | 5.81 | 12 |
| DKFZp564A0723 | AL80116 | 2524 | 712 | 6q14.3-q16.1 | gene:ORC3L; product: “origin recognition complex ORC3L subunit”; Homo sapiens origin recognition complex ORC3L subunit (ORC3L) mRNA, complete cds. | EMBL AF135044 | 0 | | | | |
| DKFZp564E2182 | AL50261 | 2367 | 204 | 6q22.1-q22.33 | Homo sapiens CGI-98 protein mRNA, complete cds. | EMBL AF151856 | 1.2e-265 | | | | |
| DKFZp564G1816 | AL136599 | 4775 | 984 | 3q12.2-q12.3 | gene:KIAA0797; product: “KIAA0797 protein”; Homo sapiens mRNA for KIAA0797 protein, partial cds. | EMBL AB018340 | 2.1e-50 | | | | |
| DKFZp564K142 | AL136636 | 2241 | 335 | 17p11.2 | Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. | EMBL AF008554 | 9.4e-184 | | | | |
| DKFZp564L0562 | AL80090 | 941 | 185 | 4q31.21 | Homo sapiens mRNA for APC10, complete cds. | EMBL AB012109 | 4.4e-178 | | | | |
| DKFZp564N0582 | AL50264 | 1646 | 144 | 3p21.1 | Homo sapiens DRR1 (DRR1) mRNA, complete cds. | EMBL AF089853 | 0 | | brain | 5.16 | 50 |
| DKFZp564N0582 | AL50264 | 1646 | 144 | 3p21.1 | Homo sapiens DRR1 (DRR1) mRNA, complete cds. | EMBL AF089853 | 0 | | retina | 5.45 | 7 |
| DKFZp566G0346 | AL136719 | 4503 | 262 | 9q22.1 | Homo sapiens spindlin mRNA, complete cds. | EMBL AF106682 | 0 | | | | |

  • The cDNAs have been classified into ten functional categories (see Statistics - Classification) based on sequence similarity data and are grouped accordingly. The cDNA clones are available from the Resource Center of the German Genome Project using the clone ID shown in the first column. The respective sequences are available in the EMBL/GenBank/DDBJ databases under the accession numbers shown in the second column. The third column gives the size of the individual cDNA inserts, and the fourth column gives the size of the encoded (predicted) proteins. The chromosomal location of the respective genes is shown in the fifth column. Columns 6–8 describe the database hit with the highest similarity: the description of the best hit, its accession number (together with the database in which the hit was found), and the P-value of the hit, respectively. Similarities were predicted based on BLASTX and BLASTN2 analyses. The “representative = best” hit was selected using the following criteria: (1) a BLASTX hit was judged better than a BLASTN hit; (2) where the best BLASTX hit (TREMBL database only) had been calculated from the same nucleotide sequence entry that was the best hit in the BLASTN analysis, the BLASTN hit is given; and (3) genomic sequence entries are given only when no other hits were available.

  • Where similarity information allowed a protein to be assigned to a major gene family, the respective family is shown in column 9. Where EST information was available, tissue-specific expression of transcripts is shown in columns 10–12, which give the tissue, an arbitrary score (see WWW2001), and the absolute number of ESTs sequenced from that particular tissue (at the time of analysis), respectively.

  • This section is excerpted from the full table, available on-line at http://www.dkfz-heidelberg.de/abt0840/GCC.
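
The three best-hit selection criteria given in the legend above can be sketched in code. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record, its field names (including `source_nucleotide`, standing for the nucleotide entry from which a TREMBL protein was translated), and the function names are hypothetical, and ranking hits from the same program by lowest P-value is an assumption the legend does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Hit:
    """One database hit for a cDNA (hypothetical record layout)."""
    program: str                  # "BLASTX" or "BLASTN"
    database: str                 # e.g. "EMBL", "PIR", "TREMBL"
    accession: str                # accession of the hit entry
    p_value: float                # smaller means more significant
    is_genomic: bool = False      # True for genomic sequence entries
    # Assumed field: for TREMBL proteins, the nucleotide entry they derive from.
    source_nucleotide: Optional[str] = None


def best_of(hits: Sequence[Hit], program: str) -> Optional[Hit]:
    """Return the lowest-P-value hit from one program, or None if there is none."""
    candidates = [h for h in hits if h.program == program]
    return min(candidates, key=lambda h: h.p_value, default=None)


def select_representative(hits: Sequence[Hit]) -> Optional[Hit]:
    """Pick the representative hit following criteria (1)-(3) of the legend."""
    # Criterion 3: fall back to genomic entries only when nothing else exists.
    non_genomic = [h for h in hits if not h.is_genomic]
    pool = non_genomic or list(hits)

    best_blastx = best_of(pool, "BLASTX")
    best_blastn = best_of(pool, "BLASTN")

    if best_blastx is None:
        return best_blastn  # may be None when there are no hits at all

    # Criterion 2: a TREMBL protein translated from the very nucleotide entry
    # that is the best BLASTN hit is reported as the BLASTN hit instead.
    if (
        best_blastn is not None
        and best_blastx.database == "TREMBL"
        and best_blastx.source_nucleotide == best_blastn.accession
    ):
        return best_blastn

    # Criterion 1: otherwise a BLASTX hit is preferred over a BLASTN hit.
    return best_blastx
```

With a PIR BLASTX hit and an unrelated EMBL BLASTN hit, this sketch returns the BLASTX hit even when the BLASTN P-value is smaller. If the only BLASTX hit is a TREMBL entry whose `source_nucleotide` equals the accession of the best BLASTN hit, it returns the BLASTN hit.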

This Article

  1. Genome Res. 11: 422-435
