Published online before print
January 31, 2006, 10.1101/gr.4303406 Genome Res. 16:405-413, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Methods
A systematic model to predict transcriptional regulatory mechanisms based on overrepresentation of transcription factor binding profiles
1 Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
2 Department of Biomedical Engineering, Washington University, St. Louis, Missouri 63130, USA
3 Department of Pathology and Immunology, Division of Laboratory Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA
An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two important tasks are often encountered when studying transcription regulation, i.e., (1) the identification of common transcriptional regulators of a set of coexpressed genes; (2) the identification of genes that are regulated by one or several transcription factors. In this study, a systematic and statistical approach was taken to accomplish these tasks by establishing an integrated model considering all of the promoters and characterized transcription factors (TFs) in the genome. A promoter analysis pipeline (PAP) was developed to implement this approach. PAP was tested using coregulated gene clusters collected from the literature. In most test cases, PAP identified the transcription regulators of the input genes accurately. When compared with chromatin immunoprecipitation experiment data, PAP's predictions are consistent with the experimental observations. When PAP was used to analyze one published expression-profiling data set and two novel coregulated gene sets, PAP was able to generate biologically meaningful hypotheses. Therefore, by taking a systematic approach of considering all promoters and characterized TFs in our model, we were able to make more reliable predictions about the regulation of gene expression in mammalian organisms.
Gene expression is largely regulated by transcription factors (TFs) that recognize specific sequences, called cis-regulatory elements or TF-binding sites, in promoters. One of the ultimate goals of biological research is to construct the entire regulatory network of an organism (Covert et al. 2004
Computational approaches for identifying the transcriptional regulators of a particular gene are greatly enhanced by large-scale expression-profiling experiments and sequence analysis of multiple genomes. Genome-wide mRNA-profiling experiments allow the identification of genes that have similar expression patterns. As coexpressed genes are likely to be regulated by the same TFs, it is thought that the analysis of noncoding sequences of coexpressed genes will be useful in identifying common cis-regulatory elements recognized by known or novel TFs. These methods have been successfully applied to simple organisms such as yeast and worm (Hughes et al. 2000
Two important questions often encountered in biological studies regarding transcriptional regulation include the following: (1) Find the common transcriptional regulators of a set of genes that are involved in the same biological pathway, in the same cellular process, in response to the same stimulus, or in the same disease. (2) Find genes that are regulated by one or several TFs that have important roles in a particular biological function or a pathophysiological process. To answer the first question, previous studies have utilized statistical methods to test the enrichment of a TF's binding site in a set of coregulated genes against a "reference" set such as randomly selected genes in the genome (Aerts et al. 2003
While experimental methods such as chromatin immunoprecipitation, followed by promoter microarray (Lee et al. 2002
From the viewpoint of systems biology, the transcriptional regulatory network of an organism consists of all of the genes, including all of the TFs, and all network interactions between the genes and their transcriptional regulators. With the ever-increasing number of completely sequenced genomes and better annotation of transcription factors in the genome, it is now possible to take a systematic and statistical approach to establish an integrated model considering all of the genes and all characterized TFs in the genome. Such a model would allow one to make robust statistical inferences about transcriptional regulation. Specifically, this model would allow one to answer the two important questions mentioned above and would reliably assign the statistical significance of the findings. In this study, we present such a model and demonstrate its utility to analyze the potential regulatory sequences of a set of coexpressed genes in mammalian genomes and to make predictions regarding their regulatory mechanisms. We implemented this proposed model in a Web-based workbench termed the Promoter Analysis Pipeline (PAP). PAP is suitable for predicting transcriptional regulators of a set of genes and for identifying the target genes of a set of transcription factors. Various tests, including the analysis of coregulated gene sets collected from the literature, comparison with the chromatin immunoprecipitation experiment data, and the analysis of a published time-course expression-profiling data set indicated the robustness and accuracy of PAP. Therefore, PAP is useful in making reliable predictions about the regulation of gene expression. PAP is available at http://bioinformatics.wustl.edu/PAP.
PAP overview
The design of PAP includes two components (Fig. 1). The data-processing pipeline was assembled using a series of algorithms and data manipulation tools. This set of applications was used to carry out genome-wide promoter analysis, namely, orthologous sequence alignment, TF binding-site identification, and promoter score calculation. The calculated results were stored in a relational database termed the Promoter Analysis Pipeline Database (PAPdb). The graphical user interface of PAP includes a set of interactive Web pages. These pages allow the user to input a set of potentially coregulated genes, to identify a set of transcription factors that are most likely to regulate these genes, to browse binding sites of these TFs, and to predict other genes that might be regulated by the same set of TFs. This bipartite design of PAP uncouples the majority of the computation from the user interface. Therefore, PAP is able to return results of genome-wide promoter analyses in real time. Details of methods and algorithms used in PAP are described in the following sections.
Curation of potential regulatory sequences
Using this definition, promoter sequences were retrieved from the Genome Assembly Project of the National Center for Biotechnology Information (NCBI). Since some alternatively spliced transcripts might have the same promoter sequence according to our definition (e.g., alternatively spliced exons did not change the position of the transcription or translation start site), 22,276 and 21,089 distinct promoter candidates were collected for human and mouse, respectively (Table 1). In the current model, two transcripts of the same gene locus are treated separately if they have different promoter candidates. Therefore, they will have different statistical scores. Although TF-binding sites within interspersed repetitive sequences might be functional (Zhou et al. 2002
Identification of conserved sequences
The basic assumption of phylogenetic footprinting is that most functional regulatory elements or TF-binding sites are conserved through evolution. As such, although functional elements may indeed exist in nonconserved sequence, they are most likely to be found in regions of sequence conservation in promoters of multiple species. To identify such conserved regions, orthologous genes for each gene locus were identified using NCBI's HomoloGene database (see Methods). Although genes in some of these ortholog groups may not be true orthologs, aligning the promoters of these genes may be informative to identify functional elements. For each ortholog group, we then aligned promoters of human and mouse gene loci using the program TBA (Blanchette et al. 2004
In the promoter regions being studied, the most conserved segment is around 2 kb upstream and downstream of the annotated transcription start sites (Fig. 2B), with 20% of the sequence alignable, on average. The average G/C content of all the human promoters at each position across the sequence range stored in PAP was calculated. The region between -570 bp and +730 bp was G/C rich, with the G/C content increasing nearer to the transcription start site from both directions (data not shown). These global analyses of alignable sequences and G/C content are comparable to other genome-wide promoter studies (Wasserman et al. 2000
Identification of conserved TF-binding sites
The probabilistic framework of PAP
ln 2, those in the top 10% have R-score ≥ ln 10, those in the top 1% have R-score ≥ ln 100, and so on. Furthermore, summing R-scores for several promoters is equivalent to multiplying the probabilities of their ranks, which provides a convenient means of determining the significance of the binding scores for sets of promoters or sets of TFs.
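The relationship between rank and R-score described here can be illustrated with a minimal sketch. It assumes that the R-score is the negative natural logarithm of a promoter's rank fraction, which reproduces the ln 10 and ln 100 thresholds quoted above; this is an assumption made for illustration, not a transcription of equation 1, and the function names are hypothetical.

```python
import math


def r_score(rank: int, total: int) -> float:
    """Assumed R-score: -ln(rank / total), where rank 1 is the best promoter."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return -math.log(rank / total)


def combined_rank_probability(r_scores) -> float:
    """Summing R-scores corresponds to multiplying the underlying rank fractions."""
    return math.exp(-sum(r_scores))
```

Under this assumption, a promoter ranked 1,000th of 10,000 for a given TF has an R-score of ln 10 (about 2.303), and combining it with a promoter ranked 100th for the same TF gives a joint rank probability of 0.1 × 0.01 = 0.001.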
PAP's performance on experimentally verified TF-binding sites
Testing the statistical significance of PAP's findings
To evaluate the reliability of PAP's predictions, the statistical significance of PAP's findings was determined using randomly generated data sets. Genes were randomly selected from all 14,140 human genes that had a mouse ortholog and whose promoters were stored in PAP. The probability of observing a similar score or higher by chance was determined empirically from the distribution generated using these randomly selected gene sets:
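The empirical procedure can be sketched as a simple resampling loop. The sketch below assumes that the score of a gene set is the mean of per-gene scores for one TF and that random sets match the input set in size; the number of trials, the seed, and all names are illustrative choices rather than values taken from PAP.

```python
import random


def empirical_p_value(observed, gene_scores, set_size, n_trials=10_000, seed=0):
    """Fraction of random gene sets whose mean score is at least `observed`.

    gene_scores maps gene identifier -> score (e.g., an R-score for one TF).
    """
    rng = random.Random(seed)
    genes = list(gene_scores)
    hits = 0
    for _ in range(n_trials):
        # Draw a random gene set of the same size, without replacement.
        sample = rng.sample(genes, set_size)
        mean_score = sum(gene_scores[g] for g in sample) / set_size
        if mean_score >= observed:
            hits += 1
    return hits / n_trials
```

With this estimator the smallest nonzero P-value that can be reported is 1/n_trials, so the number of random sets limits how finely small P-values can be resolved.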
Comparison of PAP's prediction with chromatin immunoprecipitation experiment data
Prediction of genes regulated by a set of transcription factors
Application of PAP to a published expression-profiling data set
To demonstrate the usefulness of PAP to analyze multiple gene clusters identified by mRNA expression-profiling experiments, and to identify the underlying transcriptional regulatory events, we applied PAP to a published expression-profiling data set (Tomczak et al. 2004
Application of PAP to a novel cell proliferation-related gene cluster
When PAP was used to analyze these genes, known transcriptional regulators of cell cycle regulatory genes including NF-Y (P =
As a complementary study, we used PAP to analyze another cell-proliferation signature previously identified in a different study (Chang et al. 2004
Application of PAP to cholesterol biosynthesis pathway genes
When 11 cholesterol synthesis genes (Supplemental Table 7) with annotated human and mouse orthologs were analyzed using PAP, several known transcription regulators of cholesterol synthetic enzymes, including NF-Y, CREB, and YY1 (Supplemental Table 3) were identified with low P-values (Table 4). Although only three cholesterol synthetic enzymes have been shown to be directly regulated by CREB, PAP's result indicates that CREB may directly regulate other cholesterol synthesis genes as well.
Interestingly, PAP did not predict Egr2 as one of the top-ranking transcription factors (the Egr matrix ranked 38th). This implies that Egr2 may not be a direct transcriptional regulator of most of these enzymes. To investigate whether any of the high-scoring transcription factors predicted by PAP might mediate the regulation of cholesterol synthetic genes by Egr2, the R-scores and P-values of the genes encoding these factors were calculated using Egr2 as the transcription factor. Four of these transcription factor genes (NFYA, CREB1, YY1, and AP1) have overrepresented Egr-binding sites with low P-values (P < 0.05). Furthermore, when these four TF genes were collected as a gene cluster and analyzed by PAP, the Egr2 matrix had a very low P-value of 0.0043, which indicates that NFYA, CREB1, YY1, and AP1 are likely to be regulated by Egr2. These results suggest a model of transcriptional regulation of cholesterol biosynthesis genes in Schwann cells, in which Egr2 does not directly regulate all of the cholesterol synthesis genes but instead coordinates the cholesterol synthesis required for myelination through the transcription factors NF-Y, CREB, YY1, and/or AP1 (Fig. 6).
In this study, a systematic and statistical approach was taken to establish a genome-wide promoter analysis model. The entire collection of promoters in the genome serves as a natural background for statistical analysis. Our model was tested using previously identified coregulated gene clusters, as well as many other data sets. In all of these tests, PAP performed robustly and was able to make reliable predictions about transcriptional regulatory mechanisms.
When tested using previously characterized coregulated gene clusters, PAP predicted the experimentally verified transcriptional regulators accurately. In addition, PAP also identified other TFs that may interact with them. For example, in the analysis of muscle-specific Myf target genes, E2A was also predicted as a high-scoring factor besides MyoD, and the hetero-oligomerization of E2A with MyoD is required for MyoD's function in muscle (Lassar et al. 1991 B immune genes, where the top-ranking transcription factor, c-Rel, is known to interact with NF-κB (Miyamoto et al. 1994
While the current version of PAP has proven to be a useful tool for discovering and exploring regulatory networks, new data and enhanced analysis methods will provide further improvements. The two types of data that PAP utilizes, comparative genome sequences and transcription-factor binding models, are rapidly accumulating and will lead to improved analyses. In this study, only the mouse and human genomes were utilized, and it was shown that conservation was valuable for identifying true regulatory sites. The genomes of additional mammals and other vertebrates are now completed or in progress, and we expect that they will add useful information and allow for a more thorough investigation of distant regulatory regions. While TRANSFAC is the most comprehensive of the transcription factor databases, it is far from complete, containing binding sites for only a fraction of the known and putative transcription factors. For many of the factors that are included, too few sites are known to build reliable models of their specificity and make accurate predictions of their binding sites throughout the genome. But new technologies, such as microarrays and ChIP-chip experiments (Lee et al. 2002 The analysis methods can also be improved to better take into account important correlations in the data. For example, currently, TF sets can be used to identify potentially coregulated genes. However, if two TFs have very similar binding profiles, they will have similar scores on any given promoter, which may confound the analysis. This issue may be resolved by considering constraints between the TFs, such as limited ranges of spacing or orientation, as well as other correlations that may indicate cooperative interactions. And when considering sets of genes, or sets of TFs, R-scores are tabulated and averaged over the entire set, which may miss important subsets with significant matches.
Efficiently determining such significant subsets, and accurately assessing their P-values, is computationally challenging, and we are currently exploring techniques to accomplish the task. This will provide PAP with a much richer ability to discover important regulatory features in the genome sequences.
Promoter data preparation
Human and mouse chromosomal sequences and gene-annotation files were downloaded from the NCBI's Genome Assembly Project through their FTP site (ftp://ftp.ncbi.nih.gov/genomes/). Genome build 34 was used for human and genome build 32 was used for mouse. For each mRNA, the promoter sequence was obtained from the genomic sequence using the mRNA and coding start positions. Repetitive elements in promoter sequences were masked by the program RepeatMasker (http://www.repeatmasker.org/) using slow and sensitive search mode.
Ortholog groups' identification
TF-binding sites' identification
A total of 466 vertebrate matrices from TRANSFAC 7.2 and 79 vertebrate matrices from JASPAR were searched in the promoter sequences. The average G/C content of all human and mouse promoters, 46.5%, was used as the background base frequency of G/C. Promoters of orthologous genes were aligned using the program TBA (Blanchette et al. 2004
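How a background G/C frequency enters a matrix search can be illustrated with a generic log-odds scan. This is a sketch of a standard weight-matrix technique, not PAP's actual scoring function, which is not given in this passage; the even split of the 46.5% G/C content between G and C, the pseudocount, and all function names are assumptions made for illustration.

```python
import math

# Background base frequencies implied by a 46.5% G/C content,
# assuming G and C (and likewise A and T) are equally frequent.
BACKGROUND = {"A": 0.2675, "C": 0.2325, "G": 0.2325, "T": 0.2675}


def log_odds_matrix(counts, pseudocount=1.0):
    """Convert per-position base counts into log-odds weights against BACKGROUND."""
    matrix = []
    for column in counts:
        total = sum(column[b] + pseudocount for b in "ACGT")
        matrix.append({
            b: math.log(((column[b] + pseudocount) / total) / BACKGROUND[b])
            for b in "ACGT"
        })
    return matrix


def best_site(sequence, matrix):
    """Return (score, offset) of the best-scoring window on the given strand,
    or None if no window consists solely of A, C, G, and T."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing masked (N) or ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best
```

Skipping windows that contain N means that repeat-masked sequence contributes no candidate sites. A full search would also scan the reverse complement, which is omitted here for brevity.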
Probability scores and R-scores
Based on the probability score, the R-score of a promoter for a TF is computed by equation 1. For a set of n promoters, the average R-score, <R-score>, is calculated by
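The displayed equation does not survive in this copy of the text. Read as a plain arithmetic mean over the n promoters, which is an assumption consistent with the statement elsewhere in the article that R-scores are averaged over the entire set, it would take the form below, where the subscript i (introduced here for illustration) indexes the promoters in the set:

```latex
\langle R\text{-score} \rangle = \frac{1}{n} \sum_{i=1}^{n} R\text{-score}_i
```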
We thank Kai Tan and Jia-Jian Liu for useful discussions. We thank Deepak Kapur, Srikanth Adiga, Divyabhanu Singh, Aarti Sharma, and Sai Krishna Chitta for help in setting up the database and the Web server. L.C. and G.D.S. are supported by the National Institutes of Health (NIH) grants HG00249 and GM63340. J.A.M. and J.M. are supported by the Prostate Cancer Foundation.
[Supplemental material is available online at www.genome.org.] Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4303406.
4 Corresponding author.
Aerts, S., Thijs, G., Coessens, B., Staes, M., Moreau, Y., and De Moor, B. 2003. Toucan: Deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res. 31: 1753-1764.
Ao, W., Gaudet, J., Kent, W.J., Muttumu, S., and Mango, S.E. 2004. Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 305: 1743-1746.
Baeuerle, P.A. and Baichwal, V.R. 1997. NF-
Berg, O.G. and von Hippel, P.H. 1987. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193: 723-750.
Blanchette, M. and Tompa, M. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12: 739-748.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708-715.
Bluthgen, N., Kielbasa, S.M., and Herzel, H. 2005. Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res. 33: 272-279.
Chang, H.Y., Sneddon, J.B., Alizadeh, A.A., Sood, R., West, R.B., Montgomery, K., Chi, J.T., van de Rijn, M., Botstein, D., and Brown, P.O. 2004. Gene expression signature of fibroblast serum response predicts human cancer progression: Similarities between tumors and wounds. PLoS Biol. 2: E7.
Cole, S.W., Yan, W., Galic, Z., Arevalo, J., and Zack, J.A. 2005. Expression-based monitoring of transcription factor activity: The TELiS database. Bioinformatics 21: 803-810.
Covert, M.W., Knight, E.M., Reed, J.L., Herrgard, M.J., and Palsson, B.Ø. 2004. Integrating high-throughput and computational data elucidates bacterial networks. Nature 429: 92-96.
Duan, Z. and Horwitz, M. 2003. Targets of the transcriptional repressor oncoprotein Gfi-1. Proc. Natl. Acad. Sci. 100: 5932-5937.
Elkon, R., Linhart, C., Sharan, R., Shamir, R., and Shiloh, Y. 2003. Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res. 13: 773-780.
GuhaThakurta, D., Palomar, L., Stormo, G.D., Tedesco, P., Johnson, T.E., Walker, D.W., Lithgow, G., Kim, S., and Link, C.D. 2002. Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res. 12: 701-712.
GuhaThakurta, D., Schriefer, L.A., Waterston, R.H., and Stormo, G.D. 2004. Novel transcription regulatory elements in Caenorhabditis elegans muscle genes. Genome Res. 14: 2457-2468.
Helledie, T., Grontved, L., Jensen, S.S., Kiilerich, P., Rietveld, L., Albrektsen, T., Boysen, M.S., Nohr, J., Larsen, L.K., Fleckner, J., et al. 2002. The gene encoding the Acyl-CoA-binding protein is activated by peroxisome proliferator-activated receptor
Ho Sui, S.J., Mortimer, J.R., Arenillas, D.J., Brumm, J., Walsh, C.J., Kennedy, B.P., and Wasserman, W.W. 2005. oPOSSUM: Identification of overrepresented transcription factor binding sites in co-expressed genes. Nucleic Acids Res. 33: 3154-3164.
Hock, H., Hamblen, M.J., Rooke, H.M., Schindler, J.W., Saleque, S., Fujiwara, Y., and Orkin, S.H. 2004. Gfi-1 restricts proliferation and preserves functional integrity of haematopoietic stem cells. Nature 431: 1002-1007.
Hu, Y., Wang, T., Stormo, G.D., and Gordon, J.I. 2004. RNA interference of achaete-scute homolog 1 in mouse prostate neuroendocrine cells reveals its gene targets and DNA binding sites. Proc. Natl. Acad. Sci. 101: 5559-5564.
Hughes, J.D., Estep, P.W., Tavazoie, S., and Church, G.M. 2000. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296: 1205-1214.
Jin, V.X., Leu, Y.W., Liyanarachchi, S., Sun, H., Fan, M., Nephew, K.P., Huang, T.H., and Davuluri, R.V. 2004. Identifying estrogen receptor
Karanam, S. and Moreno, C.S. 2004. CONFAC: Automated application of comparative genomic promoter analysis to DNA microarray data sets. Nucleic Acids Res. 32: W475-W484.
Kel, A.E., Kel-Margoulis, O.V., Farnham, P.J., Bartley, S.M., Wingender, E., and Zhang, M.Q. 2001. Computer-assisted identification of cell cycle-related genes: New targets for E2F transcription factors. J. Mol. Biol. 309: 99-120.
Kellis, M., Patterson, N., Endrizzi, M., Birren, B., and Lander, E.S. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423: 241-254.
Krivan, W. and Wasserman, W.W. 2001. A predictive model for regulatory sequences directing liver-specific transcription. Genome Res. 11: 1559-1566.
Lassar, A.B., Davis, R.L., Wright, W.E., Kadesch, T., Murre, C., Voronova, A., Baltimore, D., and Weintraub, H. 1991. Functional activity of myogenic HLH proteins requires hetero-oligomerization with E12/E47-like proteins in vivo. Cell 66: 305-315.
Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon, I., et al. 2002. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298: 799-804.
Liu, R., McEachin, R.C., and States, D.J. 2003. Computationally identifying novel NF-
Louie, E., Ott, J., and Majewski, J. 2003. Nucleotide frequency variation across human genes. Genome Res. 13: 2594-2601.
Magee, J.A., Abdulkadir, S.A., and Milbrandt, J. 2003. Haploinsufficiency at the Nkx3.1 locus. A paradigm for stochastic, dosage-sensitive gene regulation during tumor initiation. Cancer Cell 3: 273-283.
Mathew, S., Mascareno, E., and Siddiqui, M.A. 2004. A ternary complex of transcription factors, Nished and NFATc4, and co-activator p300 bound to an intronic sequence, intronic regulatory element, is pivotal for the up-regulation of myosin light chain-2v gene in cardiac hypertrophy. J. Biol. Chem. 279: 41018-41027.
Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., et al. 2003. TRANSFAC: Transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31: 374-378.
Miyamoto, S., Schmitt, M.J., and Verma, I.M. 1994. Qualitative changes in the subunit composition of
Nagarajan, R., Le, N., Mahoney, H., Araki, T., and Milbrandt, J. 2002. Deciphering peripheral nerve myelination by using Schwann cell expression profiling. Proc. Natl. Acad. Sci. 99: 8998-9003.
Odom, D.T., Zizlsperger, N., Gordon, D.B., Bell, G.W., Rinaldi, N.J., Murray, H.L., Volkert, T.L., Schreiber, J., Rolfe, P.A., Gifford, D.K., et al. 2004. Control of pancreas and liver gene expression by HNF transcription factors. Science 303: 1378-1381.
Qiu, P., Qin, L., Sorrentino, R.P., Greene, J.R., Wang, L., and Partridge, N.C. 2003. Comparative promoter analysis and its application in analysis of PTH-regulated gene expression. J. Mol. Biol. 326: 1327-1336.
Ren, B., Cam, H., Takahashi, Y., Volkert, T., Terragni, J., Young, R.A., and Dynlacht, B.D. 2002. E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints. Genes & Dev. 16: 245-256.
Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W.W., and Lenhard, B. 2004. JASPAR: An open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32: D91-D94.
Sharan, R., Ovcharenko, I., Ben-Hur, A., and Karp, R.M. 2003. CREME: A framework for identifying cis-regulatory modules in human-mouse conserved segments. Bioinformatics 19: i283-i291.
Staden, R. 1989. Methods for discovering novel motifs in nucleic acid sequences. Comput. Appl. Biosci. 5: 293-298.
Stormo, G.D. 1998. Information content and free energy in DNA-protein interactions. J. Theor. Biol. 195: 135-137.
Stormo, G.D. and Fields, D.S. 1998. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem. Sci. 23: 109-113.
Stormo, G.D., Schneider, T.D., Gold, L., and Ehrenfeucht, A. 1982. Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res. 10: 2997-3011.
Tagle, D.A., Koop, B.F., Goodman, M., Slightom, J.L., Hess, D.L., and Jones, R.T. 1988. Embryonic
Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouze, P., and Moreau, Y. 2002. A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J. Comput. Biol. 9: 447-464.
Tomczak, K.K., Marinescu, V.D., Ramoni, M.F., Sanoudou, D., Montanaro, F., Han, M., Kunkel, L.M., Kohane, I.S., and Beggs, A.H. 2004. Expression profiling and identification of novel genes involved in myogenic differentiation. FASEB J. 18: 403-405.
Trinklein, N.D., Murray, J.I., Hartman, S.J., Botstein, D., and Myers, R.M. 2004. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell 15: 1254-1261.
Visala Rao, D., Boyle, G.M., Parsons, P.G., Watson, K., and Jones, G.L. 2003. Influence of ageing, heat shock treatment and in vivo total antioxidant status on gene-expression profile and protein synthesis in human peripheral lymphocytes. Mech. Ageing Dev. 124: 55-69.
Wang, T. and Stormo, G.D. 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 19: 2369-2380.
Wasserman, W.W., Palumbo, M., Thompson, W., Fickett, J.W., and Lawrence, C.E. 2000. Human-mouse genome comparisons to locate regulatory sites. Nat. Genet. 26: 225-228.
Wong, L.H., Sim, H., Chatterjee-Kishore, M., Hatzinisiriou, I., Devenish, R.J., Stark, G., and Ralph, S.J. 2002. Isolation and characterization of a human STAT1 gene regulatory element. Inducibility by interferon (IFN) types I and II and role of IFN regulatory factor-1. J. Biol. Chem. 277: 19408-19417.
Zhou, Y.H., Zheng, J.B., Gu, X., Saunders, G.F., and Yung, W.K. 2002. Novel PAX6 binding sites in the human genome and the role of repetitive elements in the evolution of gene regulation. Genome Res. 12: 1716-1722.
Received June 16, 2005; accepted in revised form December 2, 2005.