Gene Expression Profiling in Human Fetal Liver and Identification of Tissue- and Developmental-Stage-Specific Genes through Compiled Expression Profiles and Efficient Cloning of Full-Length cDNAs
Abstract
Fetal liver intriguingly consists of hepatic parenchymal cells and hematopoietic stem/progenitor cells. Human fetal liver aged 22 wk of gestation (HFL22w) corresponds to the turning point between immigration and emigration of the hematopoietic system. To gain further molecular insight into its developmental and functional characteristics, HFL22w was studied by generating expressed sequence tags (ESTs) and by analyzing the compiled expression profiles of liver at different developmental stages. A total of 13,077 ESTs were sequenced from a 3′-directed cDNA library of HFL22w, and classified as follows: 5819 (44.5%) matched to known genes; 5460 (41.8%) exhibited no significant homology to known genes; and the remaining 1798 (13.7%) were genomic sequences of unknown function, mitochondrial genomic sequences, or repetitive sequences. Integration of ESTs of known human genes generated a profile including 1660 genes that could be divided into 15 gene categories according to their functions. Genes related to general housekeeping, ESTs associated with hematopoiesis, and liver-specific genes were highly expressed. Genes for signal transduction and those associated with diseases, abnormalities, or transcription regulation were also noticeably active. By comparing the expression profiles, we identified six gene groups that were associated with different developmental stages of human fetal liver, tumorigenesis, different physiological functions of Itoh cells against the other types of hepatic cells, and fetal hematopoiesis. The gene expression profile therefore reflected the unique functional characteristics of HFL22w remarkably. Meanwhile, 110 full-length cDNAs of novel genes were cloned and sequenced. These novel genes might contribute to our understanding of the unique functional characteristics of the human fetal liver at 22 wk.
[The sequence data described in this paper have been submitted to the GenBank data library under the accession nos. listed in Table 6 herein]
The liver is the largest gland in the human body. In addition to secreting bile, it functions in the metabolism of carbohydrates, fats, proteins, vitamins, and hormones. Hepatocytes undergo distinct phases of differentiation as they arise from the gut endoderm, coalesce to form the liver, and mature by birth.
Hematopoiesis occurs at three different primary sites during human embryonic and fetal development. It begins between day 15 and day 18 in the blood islands of the yolk sac. After 6 wk, hematopoietic stem cells (HSCs) migrate via the bloodstream to the fetal liver (FL) and spleen, where erythropoiesis still predominates but myeloid ontogenesis is also beginning. During the 20th wk of gestation, bone marrow hematopoiesis begins, then becomes progressively more myelopoietic, and finally accounts for all blood cell production. At the same time, hepatic and splenic hematopoietic activity decreases and disappears (Migliaccio et al. 1986; Tavassoli 1991; Huang and Auerbach 1993; Godin et al. 1995). The fetal liver at 22 wk of gestation (HFL22w) is a major site of fetal hematopoiesis in humans and is at the critical turning point between immigration and emigration of the hematopoietic system. Therefore, the unique characteristics of the fetal liver at this stage are worthy of investigation.
The diverse functions and complex regulation of HFL22w might be largely determined by well-regulated gene expression. Indeed, a number of important growth factors (Fausto 1991), transcription factors (Dabeva et al. 1995), and protein transportation regulators (Zhang et al. 2000) have been identified from HFL22w over the last two decades. Apart from classical factors, we recently cloned hepatopoietin (HPO) (Wang et al. 1999), a novel human hepatotrophic growth factor. It specifically stimulates proliferation of cultured primary hepatocytes in vitro, liver regeneration after partial hepatectomy in vivo, and autonomous growth of hepatoma cells by stimulation of the mitogen-activated protein kinase cascade and tyrosine phosphorylation of the epidermal growth factor receptor (Wang et al. 1999; Li et al. 2000). However, many regulators and molecular signaling mechanisms, as well as the genetic control of fetal liver development, remain to be explored. The mechanisms of migration, localization, and regulation of hematopoiesis at different stages of ontogeny are not well understood either.
The identification of genes of a given cell type, tissue, or corresponding to a pathological state that confer developmental or functional specificity will provide valuable molecular insight for the study of biological phenomena and cellular physiology. Like any specified tissue and cell population in the human body, the biological features of human fetal liver might be determined largely at the level of gene expression. Single-pass, partial sequencing of randomly selected cDNA clones from cDNA libraries to generate expressed sequence tags (ESTs) (Adams et al. 1991), combined with bioinformatics analysis, has proved useful for the discovery of novel genes (Adams et al. 1995), the characterization of gene function (Papadopoulos et al. 1994), the differential and quantitative analysis of expression patterns (Okubo et al. 1992), and for the evaluation of the gene expression profile in a given tissue (Adams et al. 1991, 1992; Okubo et al. 1992; Liew et al. 1994; Mao et al. 1998; Ryo et al. 1998; Sterky et al. 1998). It is obvious that the establishment of a detailed catalog of genes expressed in HFL22w, the discovery of novel genes from HFL22w, and identification of tissue- and developmental-stage-specific genes through compiled gene expression profiles will certainly facilitate our understanding of the mechanisms of coexistence of hepatic and hematopoietic systems in fetal liver and the regulation network of immigration/emigration of the hematopoietic system of fetal liver.
The present report is on the establishment of a gene expression profile of HFL22w based on the analysis of 13,077 ESTs as well as preliminary results of comparison of this expression profile with those of 10 different human cells or tissues associated with hepatic or hematopoietic systems, which are two major functional features of human fetal liver at the developmental stage of 22 wk of gestation. As a result, we found some tissue-specific and developmental-stage-specific gene groups that are likely to play important roles in some definite functional features.
RESULTS
cDNA Sequencing and General Data of ESTs from HFL22w
The HFL22w cDNA library had an average insert size of 1.0–1.5 kb. Using automated DNA sequencing, 14,400 randomly picked clones were partially sequenced from one end with the T7 or SP6 primer. Of these, 1323 were considered trash, defined as sequences from bacterial DNA, sequences from primer polymers, sequences containing >1% ambiguous bases (N), or sequences shorter than 100 bp; the other 13,077 sequences were considered good. The success rate was therefore 90.8%, and the average read length for good sequences was 555 bp, which, to our knowledge, is among the best in the literature. Analysis of the 13,077 ESTs of satisfactory quality revealed three groups of sequences. Group I (5819 ESTs, 44.5%) matched known genes in the GenBank nonredundant database and were considered labels of known functional genes; of these, 5666 ESTs (43.3%) matched human genes and the other 153 ESTs (1.2%) matched previously described genes of other species. Group II (5460 ESTs, 41.8%) exhibited no significant homology to known genes, and 18.8% (1025 ESTs) of these overlapped EST sequences in the public database (dbEST). Group III (1798 ESTs, 13.7%) were genomic sequences of unknown function, mitochondrial DNA, or repetitive sequences.
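The success rate and group percentages quoted above follow directly from the clone and EST counts. The short Python check below recomputes them from the figures in the text; the variable and function names are illustrative and not part of the original analysis.

```python
# Counts taken from the text; all names here are illustrative only.
CLONES_SEQUENCED = 14_400
GROUPS = {
    "I (known genes)": 5819,
    "II (no significant homology)": 5460,
    "III (genomic, mitochondrial, repetitive)": 1798,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Good ESTs are the sum of the three groups.
good_ests = sum(GROUPS.values())
# Fraction of sequenced clones that yielded a good EST.
success_rate = percent(good_ests, CLONES_SEQUENCED)
# Share of each group among the good ESTs.
group_percent = {name: percent(n, good_ests) for name, n in GROUPS.items()}

print(f"good ESTs: {good_ests}, success rate: {success_rate}%")
for name, pct in group_percent.items():
    print(f"Group {name}: {pct}%")
```

The same helper can be applied to the subgroup counts (for example, 5666 human matches out of 13,077 ESTs) to check the remaining percentages.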
Gene Expression Profile of Active Genes in HFL22w
A catalog of genes expressed in HFL22w was established by generating a large number of ESTs, followed by bioinformatics analysis (data available through E-mail: hefc@nic.bmi.ac.cn). The uninformative sequences of Group III were put aside, and the remaining 11,279 ESTs of Groups I and II were further analyzed and assembled into 1729 and 4768 clusters, respectively. After integration of overlapping sequences or sequences corresponding to different portions of the same gene, the 5666 ESTs that matched human genes represented 1660 human genes, which are summarized in 15 functional categories in Table 1. HFL22w ESTs were partitioned based upon biological roles and subcellular localization into the following categories: cell defense and homeostasis, cell division and regulation, cytokines and hormones, cytoskeleton, development, genes associated with diseases or abnormalities, gene/protein expression, hematopoiesis, liver and lipoproteins, metabolism, proteases and protease inhibitors, secretory proteins, signal transduction, transcription-related genes, and unclassified.
ESTs Distribution of HFL22w by Functional Categories
An expression profile of active genes in HFL22w is shown in Table 2. Several genes in the list occur at frequencies that could be expected from the unique features and functions of HFL22w. First, at this developmental stage of the human liver, cell proliferation and differentiation require a high level of protein synthesis and general metabolism, as well as a large energy supply. The genes expressed in HFL22w in the highest proportion were functionally related to the general housekeeping responsibilities of the cells, such as general metabolism, protein synthesis, and synthesis of nucleic acids and amino acids; they include transcripts for various enzymes involved in the central reactions of metabolism, elongation factors, and ribosomal proteins, similar to EST databases previously generated from other tissues (Adams et al. 1995). Second, HFL22w is a major site of fetal hematopoiesis and immune development. ESTs associated with hematopoiesis formed the largest group of transcripts, for example, hemoglobins, globins, complement components, prothymosin-α, angiotensinogen, T-cell cyclophilin, and glycophorin A. Third, as expected, HFL22w highly expressed liver-specific genes such as serum albumin, fibrinogens, apolipoproteins, α-fetoprotein, haptoglobin, and high density lipoprotein-binding protein. In addition, genes for signal transduction, genes associated with diseases or abnormalities, and transcription-related genes were also noticeably active. Some cytokines and hormones, such as insulin-like growth factor II (IGF-2), thymosin β-4 and β-10, FGFR-4, lens epithelium-derived growth factor, megakaryocyte-stimulating factor, osteoclast-stimulating factor, and transforming growth factor (TGF), were also encountered in the EST data.
Expression Profile of Frequent Genes in HFL22w
Among the 13,077 clones, 10.8% belonged to two abundant transcripts, hemoglobin γ-G and serum albumin (HSA), which had 724 and 694 copies, respectively. Other frequent transcripts included ferritin light chain, H19 gene, retinol binding protein (RBP), and α 1 globin. Besides serum albumin, other liver-specific genes known to be abundant in the liver were also detected, including fibrinogen-β, -γ, and -α chains; apolipoprotein-B100, -AII, and -AI; α-fetoprotein (AFP); high density lipoprotein-binding protein; and haptoglobin α 2 and β subunits. Fifty-eight species of ribosomal proteins (283 ESTs in total; 2.2%) were sequenced among the 13,077 randomly selected clones. Because mammalian ribosomes are reported to be composed of ∼70–80 distinct proteins (Wool 1986), most of the ribosomal proteins seemed to be represented, suggesting that gene/protein expression was very active in fetal liver at this developmental stage.
Table 3 shows some of the 153 ESTs, representing 69 different transcript species, that matched nonhuman sequences. Several ESTs were similar to genes differentially regulated during development. Some of them may turn out to be involved in signal transduction during the differentiation and proliferation of the fetal liver. Further characterization will be necessary to determine the actual biological roles of these candidates. Together with the 5460 novel ESTs (Group II, representing 4768 EST clusters), this gave 4837 EST clusters whose biological functions were not completely known and that could be good candidates for full-length cDNA cloning of novel functional genes.
ESTs Homologous to Nonhuman Sequences
Identification of Tissue- and Developmental-Stage-Specific Genes by the Compilation of the Expression Profiles of HFL22w and the Other Functionally Associated Tissues or Cells
Although we were profiling the active genes in HFL22w based upon 13,077 ESTs, as yet the number of ESTs collected for each expression profile obtained from the published data was only approximately 1000. It was not possible to compare the genes that appeared at low abundance. However, with those genes whose transcripts appeared at high abundance and represent typical physiological and developmental status, relatively accurate comparisons could be made and the conclusion might even be more objective. Therefore, genes listed in the tables were extracted from each of these expression profiles, detected two or more times, and the abundance of their transcripts among total ESTs compiled. Through the comparison, several gene groups associated with definite physiological and/or molecular features were identified.
We collected five other liver-associated expression profiles including human fetal liver at 19 wk (HFL19w) or 40 wk (HFL40w) of gestation, human adult liver (HAL), Itoh cells, and HepG2 cells (http://bodymap.ims.u-tokyo.ac.jp/human_1.html) and compared them with the expression profile of HFL22w established here. We extracted 773 genes whose abundance was two or more in at least one of the six expression profiles and compiled their activities (EST frequency). Only the genes whose transcripts appeared 15 or more times in the compiled expression profile are shown in Table 4. These genes were categorized into three classes according to the number of libraries in which they were detected: ubiquitous—appeared in five or six origins (filled area in the Library column, lib); common—appeared in two–four origins (hatched area in the Library column), and unique—appeared in only one origin (blank in the Library column). The functions of a gene could be assumed from the frequencies in random isolates from the different libraries shown in the compiled expression profiles. Among the 773 genes, nine (Gene Group I) appeared ubiquitously (Table 5). Some of them were likely to function in housekeeping, such as the three ribosomal proteins. The other six genes were actually tissue-specific, function-keeping genes of liver, including serum albumin, ferritin L chain, and apolipoprotein AII.
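The three-way classification by number of libraries can be sketched as follows. This is a minimal illustration, assuming that a gene counts as detected in a library whenever its EST count there is greater than zero; the function name, library labels, and example counts are hypothetical and do not come from Table 4.

```python
# Labels for the six liver-associated expression profiles named in the text.
LIBRARIES = ("HFL19w", "HFL22w", "HFL40w", "HAL", "Itoh", "HepG2")


def library_class(est_counts):
    """Classify a gene by the number of libraries in which it was detected.

    est_counts maps library label -> EST copy number for one gene.
    Assumption: a count above zero means "detected".
    """
    n_detected = sum(1 for lib in LIBRARIES if est_counts.get(lib, 0) > 0)
    if n_detected >= 5:
        return "ubiquitous"   # five or six origins
    if n_detected >= 2:
        return "common"       # two to four origins
    if n_detected == 1:
        return "unique"       # a single origin
    return "absent"


# Hypothetical EST counts for three made-up genes.
examples = {
    "gene_a": {"HFL19w": 12, "HFL22w": 40, "HFL40w": 9, "HAL": 30, "HepG2": 5},
    "gene_b": {"HFL19w": 3, "HFL22w": 4},
    "gene_c": {"Itoh": 2},
}
for gene, counts in examples.items():
    print(gene, library_class(counts))
```

A stricter detection threshold (for example, two or more copies) could be substituted in the comparison if that better matches how a given table was compiled.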
Compiled Gene Expression Profile Associated with Liver
Classification of Gene Groups Associated with Liver Development
On the other hand, 636 genes appeared only in one library (Table 4, blanks in Library column). Because their relatively high expression was unique to one expression profile among the listed six, they were the candidate genes whose products exerted unique functions in Itoh cells, HepG2 cells, or the liver in the different stages of development, respectively.
Eleven genes were expressed only in HFL19w and HFL22w but not in HFL40w or HAL (Table 5, Gene Group II). They were α-fetoprotein (AFP), 23-kD highly basic protein, thymosin β-4, insulinoma rig-analog mRNA encoding DNA-binding protein, and seven ribosomal proteins. Genes expressed only in HAL and HFL40w but not in HFL19w or HFL22w are also listed (Table 5, Gene Group III). Together with the genes of Gene Group II (Table 5), they are developmental-stage-specific genes and suitable candidates for molecular probes to characterize the developmental stage of fetal liver. Further analysis of them should advance research into the molecular mechanism of liver development.
We also identified two other gene groups through systematic analysis of the differences in mRNA populations between normal and tumor cells of the liver. Gene Group IV consists of the genes expressed in the three fetal livers and the adult liver but not in the hepatoblastoma HepG2 cells (Table 5). These genes might be candidate tumor suppressor genes or genes that are inhibited during tumorigenesis. Conversely, Gene Group V consists of genes expressed only in the HepG2 cells but not in the normal liver at any developmental stage (data not shown). These genes might be associated with tumorigenesis of the liver. Six genes in Gene Group II (Table 5), namely α-fetoprotein (AFP); ribosomal proteins L9, L19, S3a, and L6; and insulinoma rig-analog mRNA encoding DNA-binding protein, were expressed in HepG2 cells and in human fetal liver at the early stage of development (19 and 22 wk of gestation) but not in HFL40w or HAL. Because tumor cells often express embryonic genes in abnormal ways, these six genes might represent oncogenic status in hepatoma cells.
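The membership rules for Gene Groups II to V can be written as set comparisons on the libraries in which a gene was detected. The sketch below is one possible reading of those rules, under these assumptions: a gene must be detected in every library named as positive for a group, Itoh cells are ignored, and HepG2 status is ignored for Groups II and III (consistent with Group II genes such as AFP also being found in HepG2). The function and constant names are hypothetical.

```python
# Library sets used by the group definitions (labels as in the text).
EARLY_FETAL = {"HFL19w", "HFL22w"}
LATE_OR_ADULT = {"HFL40w", "HAL"}
NORMAL_LIVER = EARLY_FETAL | LATE_OR_ADULT


def gene_groups(detected_in):
    """Return the Gene Groups (II to V) matched by one gene.

    detected_in is the set of library labels in which the gene was detected.
    """
    groups = []
    # Group II: both early fetal livers, neither HFL40w nor HAL.
    if EARLY_FETAL <= detected_in and not (LATE_OR_ADULT & detected_in):
        groups.append("II")
    # Group III: HFL40w and HAL, neither early fetal liver.
    if LATE_OR_ADULT <= detected_in and not (EARLY_FETAL & detected_in):
        groups.append("III")
    # Group IV: all normal livers but not HepG2.
    if NORMAL_LIVER <= detected_in and "HepG2" not in detected_in:
        groups.append("IV")
    # Group V: HepG2 but no normal liver.
    if "HepG2" in detected_in and not (NORMAL_LIVER & detected_in):
        groups.append("V")
    return groups


# AFP-like pattern from the text: early fetal liver plus HepG2.
print(gene_groups({"HFL19w", "HFL22w", "HepG2"}))
```

Under these assumptions a gene with the AFP-like pattern falls in Group II only, while a gene detected in every library belongs to none of the four groups.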
Although Itoh cells are located in the liver, their gene expression profile was clearly different from those of hepatocytes at various developmental stages and of the hepatoma cell line HepG2. Of the 120 genes that had two or more EST copies in Itoh cells, 60 were not expressed in any of the five other liver-associated expression profiles. Genes commonly expressed at high levels in liver, such as serum albumin (ALB), fibrinogen, transferrin, apolipoprotein AI, and haptoglobin, were not detected in Itoh cells. This distinct expression profile is consistent with the physiological function of Itoh cells differing from that of other types of liver cells.
The compiled gene expression profile associated with hematopoiesis (data not shown) consisted of five gene expression profiles: the CD34+ hematopoietic progenitor/stem cell (Mao et al. 1998), CD4 T cell, CD8 T cell, granulocyte, and the myeloblastic leukemia cell line HL60 (http://bodymap.ims.u-tokyo.ac.jp/human_1.html). They shared 134, 38, 45, 20, and 46 expressed genes with HFL22w, respectively. The CD34+ hematopoietic progenitor/stem cell therefore shared the most active genes with HFL22w. Among the 595 genes whose frequency was two or more in the expression profile of HFL22w, 134 (22.5%) were also expressed in CD34+ hematopoietic stem/progenitor cells. Some of them were hematopoietic system-specific, for example, hemoglobin γ-G (HMG), β-globin, and T-cell cyclophilin. By contrast, the similarity between the expression profiles of HFL22w and granulocytes was much lower. This result is consistent with the fact that there are few differentiated granulocytes in HFL22w.
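The comparison with the hematopoietic profiles reduces to counting shared genes. The Python sketch below ranks the five profiles by the shared-gene counts given in the text and recomputes the 22.5% figure; the helper `shared_percent` and the toy gene sets are hypothetical and only show how such counts could be derived from two gene lists.

```python
# Shared-gene counts with HFL22w, as reported in the text.
SHARED_WITH_HFL22W = {
    "CD34+ progenitor/stem cell": 134,
    "CD4 T cell": 38,
    "CD8 T cell": 45,
    "granulocyte": 20,
    "HL60": 46,
}
# HFL22w genes with an EST frequency of two or more (from the text).
HFL22W_GENES_FREQ_GE_2 = 595


def shared_percent(reference, other):
    """Percentage of genes in `reference` that also occur in `other`."""
    if not reference:
        return 0.0
    return 100.0 * len(reference & other) / len(reference)


# Profile sharing the most genes with HFL22w.
most_similar = max(SHARED_WITH_HFL22W, key=SHARED_WITH_HFL22W.get)
# Profile sharing the fewest genes with HFL22w.
least_similar = min(SHARED_WITH_HFL22W, key=SHARED_WITH_HFL22W.get)
cd34_percent = round(
    100.0 * SHARED_WITH_HFL22W[most_similar] / HFL22W_GENES_FREQ_GE_2, 1
)
print(most_similar, cd34_percent, least_similar)

# Toy gene sets (made-up identifiers) showing the set-based calculation.
toy_reference = {"gene_a", "gene_b", "gene_c", "gene_d"}
toy_other = {"gene_a", "gene_d", "gene_x"}
print(shared_percent(toy_reference, toy_other))
```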
Full-Length cDNA Cloning from HFL22w
Based on the bioinformatics analysis, 110 EST clusters were initially chosen for full-length cDNA cloning. The clone inserts were sequenced by end-sequencing, primer extension, and sequencing after partial deletion/subcloning. After assembling ESTs into contigs, we found that 74 (67.3%) of the 110 cDNA clones already contained a complete open reading frame (ORF). In the other 36 cDNA clones, an obvious but incomplete reading frame was present. In silico cloning with dbEST extension allowed us to obtain 22 (20.0%) putative entire ORFs, which were then confirmed by sequencing of material cDNA clones obtained by appropriately designed RT-PCR. For the remaining 14 (12.7%) cDNA clones that could not be extended properly with an electronic approach, rapid amplification of cDNA ends (RACE) was applied to obtain the 5′ or 3′ ends from appropriate tissue origins. In total, 110 cDNAs with putatively entire ORFs were obtained. Table 6 shows all 110 new full-length cDNAs from HFL22w. Among these 110 full-length cDNAs, 71 contained multiple exons and 87 had a polyA tail, 73 of which were accompanied by a consensus polyadenylation signal near the 3′ end; the other 14 polyA tails might correspond to an A-rich region of the genome when they were searched against GenBank's working draft of the human genome. It is worth pointing out that, although a polyadenylation signal was found in the majority (73/110) of cDNAs as evidence of containing the complete 3′ UTR, the integrity of the 5′ UTR needs further experimental confirmation, as in reports like that of the RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium (Kawai et al. 2001). Among these novel genes, the majority, 76 (69.1%), encode 80–500 amino acid residues as deduced from their encoding frames.
According to their homology with known genes and domains, some genes might be associated with signal transduction, such as the human homolog of mouse c-Jun leucine zipper interactive protein (cDNA JZA-20), the Kluyveromyces lactis transcription initiation factor IIIB 70-kD subunit, or Bos taurus guanine nucleotide-binding protein. Some genes might be new members of certain gene families, for example, the gene for the human homolog of Schizosaccharomyces pombe Arf GTPase-activating protein, now termed human ADP-ribosylation factor GTPase-activating protein (ARFGAP3), belonging to the ARF GAP family (Zhang et al. 2000; Liu et al. 2001). In addition, some genes are highly conserved in evolution because their encoded proteins exhibit similar primary structure to those derived from such organisms as Arabidopsis thaliana, Schizosaccharomyces pombe, Kluyveromyces lactis, Plasmodium chabaudi, Tetrahymena thermophila, Caenorhabditis elegans, Drosophila melanogaster (Table 6), and other mammals (Qu et al. 2001). These novel genes might be involved in critical biological processes according to their homology to known genes with established significant functions like signal transduction, metabolism, protein expression, and hematopoiesis.
List of the Full-Length cDNA from HFL22w and Their Homologous Genes
In further investigations, the chromosomal localization of 77 novel genes was determined, 70 of which were located by using database information of UniGene, dbSTS, dbHTGS, and Human Chromosome Databases; the location of the other 7 was determined by radiation hybrid (RH) mapping. The remaining 33 novel genes could not be mapped by either of the above methods.
DISCUSSION
The major objective of the human genome project is the identification of the complete set of human genes. Single-pass, partial sequencing of cDNA clones in different organs, tissues, or cells of the human body is complementary to the genomic DNA sequencing. The analysis of ESTs generated from cDNA libraries has been shown to provide an extensive and quantitative measure of the transcriptional activity of expressed genes (Adams et al. 1991; Okubo et al. 1992). Here we have undertaken the EST sequencing of the cDNA library of HFL22w as the first step of a long-term effort to explore the genes expressed in this specific developmental stage of human fetal liver. A preliminary profile of gene expression in this cell population was set up based on the analysis of 13,077 ESTs.
Current estimates place the total number of genes in the human genome at about 30,000 (Lander et al. 2001; Venter et al. 2001). The portion of the genome expressed in any given cell type or tissue is not precisely known. The mRNAs from most genes are at low levels and from a smaller number of genes at intermediate levels of expression. Only a few genes are expressed at high levels (Sargent 1987). The highly abundant species are often tissue-specific, and the majority of the rare messages are shared among all tissues examined, implying a housekeeping function (Bishop et al. 1974). As expected, gene categories IX (liver and lipoproteins) and VIII (hematopoiesis) consisted of tissue-specific and stage-specific genes of HFL22w. These two gene categories have 22 highly expressed genes, about one-third of the total abundant species. Meanwhile, two gene categories—X (metabolism) and VII (gene/protein expression)—which included most of the housekeeping genes, had 30.9% (445/1442) of the genes expressed at low levels, whose frequency is equal to or less than 3.
Our initial goal was to gain a broad understanding of both the diversity and the abundance of gene expression in HFL22w. HFL22w has its tissue-specific and stage-specific functions. In the liver of a human fetus, besides the general metabolism of carbohydrates, fats and proteins, hematopoiesis, which originated in the yolk sac, occurs in the liver from the 6th wk to the 7th month of gestation. After the immigration of the hematopoietic system into the fetal liver at 2 months of gestation, human fetal liver gradually becomes a major site of embryonic hematopoiesis, and, intriguingly, coexistence of hepatic and hematopoietic systems appears. Moreover, at 22 wk of gestation, human fetal liver displays the balance of immigration and emigration of the hematopoietic system. Therefore, HFL22w is an excellent model for unraveling the mechanisms of interaction between hepatic and hematopoietic systems and of immigration and emigration of the hematopoietic system during mammalian development, and is a suitable resource for identification of novel significant genes.
Although gene activities were not simply reflected by the abundance of various mRNAs, gene expression profiling leads to the best approximation about them. Because there was a satisfactory representation of ESTs generated from HFL22w, the gene expression profile could be analyzed in terms of both patterns and levels. The profile dramatically reflected the hepatic and hematopoietic activities of HFL22w as described above. The quantitative ratios should help us understand its major functional feature. For instance, the mRNA of hemoglobin γ-G was the most abundant mRNA in HFL22w, which had 724 EST copies. Considering that it plays a pivotal role in hematopoiesis, its high abundance in expression profiling of HFL22w strongly indicated that HFL22w was a major site of embryonic hematopoiesis and that the expression profiling of HFL22w reported here could objectively represent the molecular features of human fetal liver. Hemoglobin is composed of four kinds of polypeptide chains, each of which is the product of a specific gene. Choi et al. (1995) reported the appearance of adult-type hemoglobin (hemoglobin β) and concluded that the transition of hemoglobin type from fetal to adult form has already begun in the 22-wk-old fetal liver before the bone marrow takes over the hematopoietic function. However, we found the appearance of embryonic-type hemoglobin (hemoglobin ζ) but no hemoglobin β in HFL22w. This showed that the transition of hemoglobin type from fetal to adult form had not yet begun and the transition of hemoglobin type from embryonic to fetal form had not completely finished at this stage. In addition, serum albumin had 694 EST copies in our profiling. It has been known as a main component for maintaining the colloid-osmotic pressure of plasma, as well as for binding bilirubin or lipids for eventual excretion. It could therefore be concluded that albumin synthesis, the typical liver-specific function, has begun in HFL22w. 
These results showed that the typical fetal liver functions of hepatic biochemical metabolism and hematopoiesis were maintained through high rates of transcription of specific genes. Meanwhile, because the number of sequenced clones was large enough, it was possible to identify genes with low-level expression or with unknown functions. Indeed, hepatopoietin (HPO) (Wang et al. 1999; Li et al. 2000) expression was detected in HFL22w, indicating that it may also function in fetal liver development. Through the comparison of the liver-associated expression profiles, we found 11 genes expressed only in the fetal liver during the early stage of liver development, which might be tissue-specific and stage-specific. Of them, α-fetoprotein (AFP) was highly expressed as expected. AFP is a serum glycoprotein normally present in high concentration in fetal and maternal serum but in low concentration in normal adult liver (Kew 1990). As the most typical liver oncodevelopmental protein, reappearance of AFP in high concentrations in adulthood is a strong pointer to the diagnosis of hepatocellular carcinoma, and in childhood to either hepatoblastoma or hepatocellular carcinoma. The 23-kD highly basic protein is a protein whose precise physiologic function is unknown. As a kind of thymic hormone, thymosin β-4 is necessary for differentiation of stem cell precursors into mature cells (Kamani and Douglas 1991). The expression of thymosin β-4 in early fetal liver is consistent with human fetal liver being a major site of embryonic immune development at 22 wk of gestation. Insulinoma rig-analog mRNA encodes a DNA-binding protein, and the deduced 145-amino acid sequence remains invariant in hamster, human, and rat insulinomas, suggesting that rig has evolved under extraordinarily strong selective constraints (Inoue et al. 1987). rig was found to be expressed in rat regenerating liver and in rat primary cultured hepatocytes.
The level of rig mRNA was increased at the proliferative phase of liver regeneration. In synchronously cultured hepatocytes, the rig mRNA level was elevated at the G1 phase of the cell cycle and the rig protein accumulated in the nuclei during the S phase (Inoue et al. 1988). These results indicate that rig, and the insulinoma rig-analog mRNA expressed in the early stage of development of human fetal liver, could be involved in a more general way in growth or cell proliferation.
The time course of the successive developmental processes is one of the most fundamental aspects of ontogenesis. Liver development during its various stages was apparently under the control of sequential gene expression as the dominant, though perhaps not exclusive, mechanism. Single-pass sequencing of randomly selected cDNAs is a rapid and efficient method for discovering new transcripts and for profiling the expression of active genes. Combined with comparison of the profiles to determine patterns of gene expression during the different stages of liver development, it helped us understand more about the functional features of HFL22w and identify gene groups consisting of candidate genes playing important roles during human liver development.
Indeed, through the comparison of the expression profiles, we found that as the liver develops (from HFL19w to HAL), both the expression level of translationally controlled tumor protein (TCTP) and its rank by expression frequency among all the genes expressed in the tissue dropped markedly. By contrast, the expression level of TCTP in HepG2 cells was very high and close to that of the fetal liver at early developmental stages (Table 7). Therefore, TCTP may be a dedifferentiation marker of liver or hepatocytes.
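Comparing one gene across libraries of different sizes requires normalizing its EST count and locating it in the frequency ranking of each library. The sketch below shows one way to compute both quantities; the function name and the EST counts in the toy profile are hypothetical and are not the values of Table 7.

```python
def frequency_and_rank(profile, gene):
    """Return (relative frequency, rank) of `gene` within one expression profile.

    profile maps gene name -> EST copy number in a single library.
    Rank 1 is the most abundant transcript; tied genes share the same rank.
    """
    total = sum(profile.values())
    count = profile.get(gene, 0)
    # Rank is one more than the number of genes with a strictly higher count.
    rank = 1 + sum(1 for c in profile.values() if c > count)
    frequency = count / total if total else 0.0
    return frequency, rank


# Hypothetical profile with made-up EST counts.
toy_profile = {"ALB": 50, "TCTP": 10, "APOA1": 10, "RBP": 5}
freq, rank = frequency_and_rank(toy_profile, "TCTP")
print(f"TCTP: {100 * freq:.1f}% of ESTs, rank {rank}")
```

Applying the function to each library in turn would give the pair of values (frequency and rank) whose decline from early fetal liver to adult liver is described above.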
Expression Pattern of Translationally Controlled Tumor Protein
Generally speaking, most of the highly expressed genes have already been identified. So far, a large number of human genes have been labeled by dbESTs, and the proportion could be even higher in the databases of some genomic industries. However, the poor representation of some important genes in dbEST indicates that completion of the list of human genes, especially those with low-level expression or temporally and/or spatially restricted expression, needs continuous effort. Therefore, the Group II ESTs (5460), accounting for 41.8% of all ESTs obtained, are worth paying particular attention to in the future discovery of novel genes. Based on the novel ESTs and the homologous ESTs with nonhuman matches identified in HFL22w and taking advantage of the UniGene information in public databases and the available rapid amplification of cDNA ends PCR technology, we cloned 110 full-length cDNAs of novel genes. The tools of bioinformatics not only help to clone novel genes through dbEST assembly, but also provide important clues to the function of novel genes through comparison of homology of known genes with established functions and those genes from model organisms. Among the 110 novel genes, we have found that at least 4 may participate in signal transduction and that 8 genes were similar to the D. melanogaster genes predicted based on the genome sequence of D. melanogaster (Adams et al. 2000). However, to systematically characterize these genes involved in the molecular mechanism of fetal liver development, embryonic hematopoiesis, and tumorigenesis, several approaches, such as microarray and yeast two-hybrid system technologies, should be used in grouping analysis of gene expression kinetics and protein interaction in human fetal liver.
METHODS
DNA Sequencing
Bacterial growth and plasmid extraction for the HFL22w cDNA library (CLONTECH) were performed with a QIAprep 96 Turbo Miniprep Kit (QIAGEN). Sequencing reactions were run on a GeneAmp PCR System 9700 thermal cycler (Perkin-Elmer) using a BigDye Terminator Cycle Sequencing Kit (Perkin-Elmer) with T7 or SP6 primers. After unincorporated dye terminators were removed from the sequencing reactions with DyeEx Spin Kits (QIAGEN), the reaction products were electrophoresed on an ABI 377-XL DNA sequencer (Perkin-Elmer–Applied Biosystems), and raw sequence data were recorded automatically.
Data Management and Bioinformatics Analysis
Sequences were edited manually by using PHRED and Sequencher (version 3.0) to remove vector sequence and to identify trash sequences, defined as sequences from bacterial DNA, sequences from primer polymers, sequences containing >1% ambiguous bases (N), or sequences shorter than 100 bp. All sequence data were preserved on record tape. An in-house database was established for the EST sequences generated from the HFL22w cDNA library. The individual ESTs were searched against the GenBank nonredundant database (Release 105.0) for homology comparison by using BLASTN on the BLAST network server at the National Center for Biotechnology Information (NCBI). ESTs with a BLAST (Basic Local Alignment Search Tool) alignment score >200 were considered to identify known genes or to have partial homology to known genes; the others were considered novel. Clustering of the ESTs generated in this work was performed by using PHRAP with default parameters.
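The two numeric screening criteria and the score cutoff described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work (editing was done manually with PHRED and Sequencher); the function names `is_trash` and `classify_est` are hypothetical, and the sketch covers only the length and ambiguity criteria, not the detection of bacterial DNA or primer polymers.

```python
def is_trash(seq, max_n_fraction=0.01, min_length=100):
    """Return True if a read fails the length or ambiguity criteria.

    Assumption: only the two numeric criteria are checked here
    (shorter than 100 bp, or more than 1% ambiguous bases 'N').
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return True
    return seq.count("N") / len(seq) > max_n_fraction


def classify_est(best_blast_score, threshold=200):
    """Label an EST from its best BLASTN alignment score.

    Scores above the threshold are treated as matching (or partially
    homologous to) a known gene; anything else, including no hit
    (None), is treated as novel.
    """
    if best_blast_score is not None and best_blast_score > threshold:
        return "known"
    return "novel"
```

In such a scheme, a read would first pass `is_trash` and only then be classified by its best BLASTN score.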
Full-Length cDNA Cloning
New sequences that were considered part of novel genes, as confirmed by similarity searching against GenBank, were selected for full-length cDNA cloning. The program ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to analyze the open reading frames. For clones containing partial reading frames, in silico cloning and RACE were performed. In silico cloning was carried out using dbEST information, starting from the sequences obtained from the HFL22w cDNA library; the assembled contigs were then confirmed by sequencing material cDNA clones obtained by appropriately designed RT-PCR. Sequence ambiguities in these contigs were resolved by further sequencing. A SMART RACE cDNA Amplification Kit (CLONTECH) was used to facilitate full-length cDNA cloning.
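The idea of an open reading frame scan can be sketched as follows. This is a simplified illustration, not the algorithm of the ORF Finder program: it assumes an ORF begins at the first ATG in a frame and ends at the next in-frame stop codon, uses the standard stop codons only, and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq):
    """Yield (start, end) for complete ATG...stop ORFs in three frames.

    Coordinates are 0-based on the given strand; end is exclusive and
    includes the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                yield start, i + 3
                start = None


def longest_orf(seq):
    """Return (strand, start, end, orf) for the longest complete ORF.

    Both strands are scanned (six frames in total). Returns None when
    no complete ORF is found.
    """
    best = None
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s):
            if best is None or end - start > best[2] - best[1]:
                best = (strand, start, end, s[start:end])
    return best
```

Under these assumptions, a clone for which `longest_orf` returns `None`, or an ORF that runs to the end of the insert without a stop codon, would be the kind of partial reading frame that calls for extension by in silico cloning or RACE.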
Genomic Mapping of Full-Length cDNA Clones
Novel genes were assigned to chromosomes by two strategies: searching sequence databases such as UniGene, dbSTS, Human Chromosome Databases, and dbHTGS at the National Center for Biotechnology Information; or radiation hybrid (RH) mapping. The GeneBridge 4 RH panel (Research Genetics) was used in RH mapping.
Acknowledgments
This work was partially supported by the Chinese National Key Project of Basic Research, the Chinese National High-tech Program, the Chinese National Distinguished Young Scholar Awards, the Chinese National Natural Science Foundation Key Project, and the Beijing City Municipal Key Project.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.
Footnotes
1 Corresponding author.
E-MAIL hefc{at}nic.bmi.ac.cn; FAX 86-10-68214653.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.175501.
Received December 15, 2000.
Accepted May 14, 2001.
Cold Spring Harbor Laboratory Press











