The LN54 Radiation Hybrid Map of Zebrafish Expressed Sequences

  1. Neil Hukriede1,7,
  2. Dan Fisher2,7,
  3. Jonathan Epstein1,7,
  4. Lucille Joly3,
  5. Patricia Tellis4,
  6. Yi Zhou5,
  7. Brad Barbazuk2,
  8. Kristine Cox2,
  9. Laura Fenton-Noriega5,
  10. Candace Hersey5,
  11. Jennifer Miles3,
  12. Xiaoming Sheng5,
  13. Anhua Song5,
  14. Rick Waterman2,
  15. Stephen L. Johnson2,
  16. Igor B. Dawid1,
  17. Mario Chevrette4,
  18. Leonard I. Zon5,
  19. John McPherson2, and
  20. Marc Ekker3,6,8
  1. 1Laboratory of Molecular Genetics and Unit of Biological Computation, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, USA; 2Department of Genetics, Washington University Medical School, St. Louis, Missouri 63110, USA; 3Ottawa Hospital Research Institute, the Ottawa Hospital, Ottawa K1Y 4E9, Canada; 4Montreal General Hospital Research Institute and Department of Surgery, McGill University, Montreal H3G 1A4, Canada; 5Howard Hughes Medical Institute and Department of Hematology, Children's Hospital, Boston, Massachusetts 02115, USA; and 6Departments of Medicine and of Cellular and Molecular Medicine, University of Ottawa, Ottawa K1Y 4E9, Canada

Abstract

To increase the density of the gene map of the zebrafish, Danio rerio, we have placed 3119 expressed sequence tags (ESTs) and cDNA sequences on the LN54 radiation hybrid (RH) panel. The ESTs and genes mapped here join 748 SSLP markers and 459 previously mapped genes and ESTs, bringing the total number of markers on the LN54 RH panel to 4226. Addition of these new markers brings the total LN54 map size to 14,372 cR, with 118 kb/cR. The distribution of ESTs according to linkage groups shows relatively little variation (minimum, 73; maximum, 201). This observation, combined with a relatively uniform size for zebrafish chromosomes, as previously indicated by karyotyping, indicates that there are no especially gene-rich or gene-poor chromosomes in this species. We developed an algorithm to provide a semiautomatic method for the selection of additional framework markers for the LN54 map. This algorithm increased the total number of framework markers to 1150 and permitted the mapping of a high percentage of sequences that could not be placed on a previous version of the LN54 map. The increased concentration of expressed sequences on the LN54 map of the zebrafish genome will facilitate the molecular characterization of mutations in this species.

The zebrafish (Danio rerio) has emerged as an excellent model organism to study vertebrate biology and human diseases, largely because of the availability of a large number of mutations affecting a wide range of developmental pathways and physiological systems (Driever et al. 1996; Haffter et al. 1996; Dooley and Zon 2000). Many of the mutant phenotypes in zebrafish resemble human clinical disorders. The candidate gene approach and chromosomal walks have allowed the molecular characterization of a few dozen mutations in the zebrafish thanks, in part, to the availability of meiotic and radiation hybrid (RH) maps for this species (Knapik et al. 1998; Postlethwait et al. 1998; Gates et al. 1999; Geisler et al. 1999; Hukriede et al. 1999; Woods et al. 2000). For example, comparative genomics and RH mapping allowed the identification of β-spectrin as the gene affected in riesling mutants that suffer from hemolytic anemia (Liao et al. 2000).

Efforts to systematically identify genes in the zebrafish before the complete sequencing of its genome are under way. An expressed sequence tag (EST) project (http://www.genetics.wustl.edu/fish_lab/frank/cgi-bin/fish/) is being performed using fingerprint-normalized cDNA libraries (Clark et al. 2001). The fingerprinted libraries were made from mRNA either from embryos of various stages of development or from adult tissues such as brain, liver, kidney, retina, olfactory epithelium, fin, and fin regenerates.

In the past few years, RH mapping has been established as the most efficient method for generating large-scale genomic maps of both mammalian and nonmammalian species. The two zebrafish RH panels, LN54 and T51 (Geisler et al. 1999; Hukriede et al. 1999), constitute valuable resources to rapidly increase the density of the zebrafish gene map through the placing of zebrafish ESTs. In an effort to provide a powerful and reliable tool for the molecular identification of zebrafish mutations by use of a candidate gene approach and to facilitate establishment of gene orthology relationships through conserved synteny analysis, we report an EST map of the zebrafish genome with ∼3600 ESTs, all but 459 previously uncharacterized. This brings the total number of markers on the LN54 RH panel to 4226. We have also increased the number of framework markers on the LN54 map from 703 to 1150. The increase in the density of framework markers was shown to further increase the efficiency of placing markers on LN54.

RESULTS

Placement of ESTs on the LN54 RH Map

We have obtained RH mapping data for 3119 previously unmapped genes and EST sequences, bringing the total of genes and ESTs mapped on the LN54 RH panel to 3578. With a haploid genome size of 1700 Mbp, this represents 2.1 ESTs per Mb. For the mapping of ESTs, oligonucleotide primer sets for RH mapping were chosen using 3′ clusters. Information on the origin of the cDNA clones mapped in this study can be obtained by accessing the Washington University EST database (http://www.genetics.wustl.edu/fish_lab/frank/cgi-bin/fish/) or the Web site of Igor Dawid's laboratory's in situ screen (http://zf.nichd.nih.gov/pubzf; Kudoh et al. 2001). A map of two of the LGs (linkage groups) of the LN54 RH map is shown in Figure 1. The remaining LGs can be seen at http://dir.nichd.nih.gov/nichdlmg/lmgdevb.htm. The distribution of mapped genes and ESTs per linkage group is shown in Table 1. The number of ESTs per linkage group varied between 73 for LG9 and 201 for LG2.

Figure 1.

Two linkage groups of the LN54 radiation hybrid (RH) map. Framework markers are shown in black; placement markers, in blue.

Table 1.

Distribution of Mapped ESTs/Genes per Linkage Group

After integration of the above EST retention data into the overall LN54 set of RH vectors, the overall retention of LN54 is 21.7%, with a discordance of 1.4%. Thus, retention remained essentially unchanged from that of the original LN54 map (22% for 1055 markers; Hukriede et al. 1999). Retention values per linkage group are given in Table 2.

Table 2.

Quantitative Characteristics of the LN54 Radiation Hybrid Maps of the 25 Zebrafish Linkage Groups

We compared the efficiency and reproducibility of mapping EST markers when these were assayed in only one polymerase chain reaction (PCR) assay or in duplicate PCR assays. We sampled ∼300 ESTs at random and found that for 92% of them, analysis of a single RH vector placed the EST to the same linkage group as the consensus RH vector obtained from duplicate PCR assays. Furthermore, 87% of the above markers placed to the same position. Most of those ESTs for which a single RH vector did not give the same LG placement as the consensus RH vector were cases in which one of the RH vectors produced multiple linkages. Only one EST in our sample produced placement to two distinct LGs when the individual RH vectors were compared. Duplicate mapping markedly increases the cost and diminishes the throughput of RH mapping. We found that duplicate mapping can help resolve uncertainties for those markers for which the initial RH vector yielded ambiguous results. However, mapping of single RH vectors can, in most instances, be sufficient. Similar observations were made during the building of a high-density EST map of the rat genome by RH mapping (Scheetz et al. 2001).

Selection of Additional Framework Markers for the LN54 RH Map

One of the challenges of growing RH maps is the need to carefully convert some placement markers into framework markers, thereby improving map resolution while maintaining its quality. Framework markers are important as they are used to establish placement of newly tested genes and sequences. Thus, the higher the density of framework markers, the more likely it will be possible to place new genes and markers on the RH map. Furthermore, a dense framework will sometimes enable the placement of genes or markers that are slightly mis-scored, for example, those caused by false negatives. Because of the volume of marker data, we have developed software that uses RHMAPPER in a subsidiary fashion and provides a semiautomatic method of selecting candidate markers for this conversion (Hudson et al. 1995). The algorithm considers each pair of adjacent framework markers as A and B. A placement marker C is a candidate for conversion to a framework marker between A and B if it is closer to both A and B than A and B are to each other. More specifically, lod (A, C) > lod (A, B) + Δ and lod (B, C) > lod (A, B) + Δ, where in practice we choose Δ = 1, and typical good lod scores are ≥8. For each round of automated framework analysis, the software recommends against converting more than one candidate marker between an existing pair of framework markers. The software also recommends that confidential markers be excluded from the framework. Figure 2 shows a sample of the machine-generated recommendations, which are implemented automatically after user consultation.

Figure 2.

Sample output of the program for selection of framework markers. Only those candidate markers that satisfy the selection criteria (see text) are shown in the output. When more than one candidate in a given interval between two known framework markers is acceptable (e.g., LG1), the one with the highest sum of lod scores for adjacent framework markers is chosen (REC). Markers that have been submitted to the LN54 Web site in confidence (e.g., LG8) are temporarily rejected, even if they satisfy selection criteria. The program was re-run with the new frameworks in place until all possible candidates had been tested.
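
The selection rule described in the text and in the Figure 2 legend can be illustrated with a minimal sketch. This is not the authors' program: the function names, the pairwise lod table, and the marker names below are hypothetical, and the sketch assumes that two-point lod scores between markers have already been computed (e.g., by RHMAPPER). It covers one round of analysis only: for each adjacent framework pair A and B, it keeps candidates C with lod (A, C) > lod (A, B) + Δ and lod (B, C) > lod (A, B) + Δ, skips confidential markers, and recommends at most one candidate per interval, namely the one with the highest sum of lod scores.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical input: two-point lod scores keyed by an unordered marker pair.
LodTable = Dict[FrozenSet[str], float]


def lod(table: LodTable, x: str, y: str) -> float:
    """Look up lod(x, y); unknown pairs are treated as unlinked."""
    return table.get(frozenset((x, y)), float("-inf"))


def recommend_frameworks(
    framework: List[str],        # framework markers in map order
    placements: List[str],       # placement markers on the same linkage group
    table: LodTable,
    confidential: Set[str],      # markers submitted in confidence
    delta: float = 1.0,          # the margin called Delta in the text
) -> Dict[Tuple[str, str], Optional[str]]:
    """Return at most one recommended marker per adjacent framework pair."""
    recommendations: Dict[Tuple[str, str], Optional[str]] = {}
    for a, b in zip(framework, framework[1:]):
        threshold = lod(table, a, b) + delta
        best, best_sum = None, float("-inf")
        for c in placements:
            if c in confidential:
                continue  # rejected even if it satisfies the criteria
            ac, bc = lod(table, a, c), lod(table, b, c)
            # C must be closer to both A and B than A and B are to each other.
            if ac > threshold and bc > threshold and ac + bc > best_sum:
                best, best_sum = c, ac + bc
        recommendations[(a, b)] = best
    return recommendations


# Made-up scores for illustration only.
scores: LodTable = {
    frozenset(("F1", "F2")): 8.0,
    frozenset(("F2", "F3")): 9.0,
    frozenset(("F1", "P1")): 10.0, frozenset(("F2", "P1")): 9.5,
    frozenset(("F1", "P2")): 12.0, frozenset(("F2", "P2")): 9.2,
    frozenset(("F2", "P3")): 11.0, frozenset(("F3", "P3")): 9.5,
    frozenset(("F2", "P4")): 12.0, frozenset(("F3", "P4")): 12.0,
}
result = recommend_frameworks(
    ["F1", "F2", "F3"], ["P1", "P2", "P3", "P4"], scores, confidential={"P4"}
)
print(result)
```

In this made-up example, P1 and P2 both qualify between F1 and F2, and P2 is recommended because its lod sum is higher; between F2 and F3, P3 fails the margin and P4 is confidential, so no marker is recommended. Re-running the function with the accepted markers added to the framework list would mimic the iterative use described in the legend.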

The number of framework markers at the time of publication of the original LN54 RH map was 684, most of them simple sequence-length polymorphic markers (Hukriede et al. 1999). We generated an additional 466 framework markers, using the software described above, bringing the total number of framework markers to 1150. The distribution of framework markers per linkage group is shown in Table 2 and varies from 34 for LG20 to 70 for LG6.

To determine how the increase in the number of framework markers improved our ability to map genes and ESTs with LN54, we tested a set of markers that had been previously submitted to the LN54 mapping Web site but had failed to map with the previous framework. We limited our analysis to markers with retention between 10% and 60%. Of 113 such markers, we found that 56 (49.6%) could now be placed on the map. Thus, the increase in framework markers improved the ability to map with LN54, at least for those sequences that do not show exceptionally high or low retention.

The total length of the LN54 map, following addition of the new framework markers, is 14,372 cR, for a 118 kb/cR correspondence. Thus, the addition of ∼3000 markers increased the total map length by 25%. There were wide variations in the percentage increase in the length of each linkage group. The length of some, like LG7 and LG20, did not change much, whereas the length of others, like LG5 and LG6, increased by ∼50% (Table 2). This variation in the length of the LGs was not related to the abundance or paucity of markers in the previous version of the map and correlated only weakly with the percentage increase in new framework markers (r = 0.62, excluding data for LG20).
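
As a quick check on how the summary figures reported so far in Results relate to one another, the arithmetic can be reproduced in a few lines. The variable names are ours; the input values are taken directly from the text (1700 Mbp haploid genome, 14,372 cR map length, 3578 mapped genes and ESTs, 684 + 466 framework markers, and 56 of 113 rescued markers).

```python
# Values quoted in the text; variable names are illustrative only.
GENOME_SIZE_MBP = 1700          # haploid genome size
MAP_LENGTH_CR = 14_372          # total LN54 map length
MAPPED_GENES_AND_ESTS = 3578    # genes and ESTs on the LN54 panel

# Physical distance per centiRay, in kilobases.
kb_per_cr = GENOME_SIZE_MBP * 1000 / MAP_LENGTH_CR

# Density of mapped genes and ESTs per megabase.
ests_per_mb = MAPPED_GENES_AND_ESTS / GENOME_SIZE_MBP

# Framework markers: original set plus those added by the selection software.
framework_total = 684 + 466

# Share of previously unmappable markers that can now be placed.
rescued_percent = 100 * 56 / 113

print(f"{kb_per_cr:.0f} kb/cR")
print(f"{ests_per_mb:.1f} ESTs per Mb")
print(f"{framework_total} framework markers")
print(f"{rescued_percent:.1f}% rescued")
```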

To compare the RH map presented here to existing meiotic maps for the zebrafish, we calculated the cR/cM values using data from the meiotic map reported by Woods and collaborators (2000). We obtained a value of 4.43 cR/cM averaged across all linkage groups. Values ranged from 2.43 for LG7 to 7.33 for LG10, with most values (18 of 25) between 3.4 and 5.4 (Table 2).

DISCUSSION

Because a large number of ESTs from diverse sources were placed on the 25 linkage groups of the LN54 map (Table 1), we get a first indication of the gene density of each of the 25 chromosomes in zebrafish. The range of values in Table 1 (73 to 201) is small compared with that observed in the human genome, for which there are up to eightfold differences (excluding chromosome Y) between chromosomes (Venter et al. 2001). The zebrafish karyotype indicates a relatively uniform size for most of the zebrafish chromosomes (Piknacker and Ferwerda 1995; Daga et al. 1996; Gornung et al. 1997; Amores and Postlethwait 1998). Although the 25 linkage groups of the RH and genetic maps of the zebrafish genome have yet to be assigned to specific chromosomes, the gene distribution presented here seems to indicate that the zebrafish does not have chromosomes that are especially poor in genes.

Mapping of ESTs and genes has also allowed us to increase the density of framework markers on the LN54 RH map. This had a positive impact on our ability to map zebrafish genes, ESTs, and other markers with the LN54 panel. We have tried to determine whether or not we were approaching saturation in the number of framework markers. Plotting the number of framework markers selected with the algorithm presented here as a function of the total markers placed on LN54 indicates that we have not yet reached saturation in framework markers, although the curve is starting to plateau (data not shown). Therefore, further mapping of ESTs and other zebrafish sequences with the LN54 panel should lead to additional increases in the density of the framework map, thus increasing the ability to place markers on the map and increasing the confidence of marker order.

To be placed on the LN54 RH map, ESTs, cloned genes, and other markers had to map with a lod score >5 with respect to one of the framework markers, and a lod difference >3 had to be obtained between the best and the second best placement if the latter is found on a different linkage group. Using these criteria, 46 of a set of 50 cDNAs, for which the mapping was attempted recently on LN54, mapped successfully (N. Hukriede and M. Tsang, unpubl.). This 92% success rate is slightly higher than our initial value of 88%, which we achieved when the initial framework map was built (Hukriede et al. 1999). Thus, the increased density of the framework map may have resulted in a modest increase in the ability to map with the LN54 panel. When performing high-throughput mapping of ESTs, 84% of primer pairs showing retention on the LN54 panel were mapped successfully. This somewhat lower rate can be attributed to the fact that EST mapping was attempted a maximum of two times in this study, whereas the successful mapping of a specific gene or marker may require a larger number of PCR assays and adjustment of the PCR conditions.

The RH mapping of 3119 new ESTs on the LN54 panel complements the recently reported gene map based on one of the meiotic mapping panels and comprising 1503 genes and ESTs (Woods et al. 2000). Correspondence between the linkage groups of the RH and meiotic maps of the zebrafish genome has been established (see http://zfin.org/cgi-bin/ZFIN_jump?record=JUMPTOREFCROSS). The previous version of the LN54 map showed a high percentage of concordance with meiotic maps (Hukriede et al. 1999). A detailed comparison of the new EST mapping data with other gene maps of the zebrafish could not be performed because few of the ESTs mapped in the current study were mapped on the meiotic panels (Gates et al. 1999; Woods et al. 2000).

Comparative genomics using zebrafish and human gene maps has shown the existence of several blocks of synteny that existed in the common ancestor of these two species, ∼450 million years ago (Postlethwait et al. 1998, 2000; Gates et al. 1999; Barbazuk et al. 2000). Comparisons of gene sequence, function, and regulation have revealed a high degree of conservation across vertebrate species. The RH map presented here can be useful in the establishment of gene orthology relationships based on these conserved syntenies. For example, conserved synteny between a region of zebrafish LG12 and human chromosome 10p11.2 supported the presumed orthology between the recently identified zebrafish nma, which encodes a protein involved in attenuation of BMP signaling during development, and human NMA (Tsang et al. 2000).

The expansion of gene families in the genome of the zebrafish and other euteleosts compared with that of mammals (Robinson-Rechavi et al. 2001a) has led investigators to suggest the occurrence of a whole genome duplication event shortly after the divergence of teleost from lobe-finned fish (Amores et al. 1998), although the model of evolution by genome duplication versus local duplications is still debated (Hughes et al. 2001; Robinson-Rechavi et al. 2001b). As the number of genes mapped in zebrafish increases and the analysis of conserved syntenies with other vertebrate genomes is expanded, one should be able to bring support to one of the two models, although the presence of multiple local duplication events in addition to a whole genome duplication will be difficult to exclude.

Plans have been made to sequence the zebrafish genome, and the sequencing should be completed within the coming years. Meanwhile, dense maps of the zebrafish, such as the one presented here, will be instrumental in the cloning of mutant loci by chromosomal walking or by the candidate gene approach. The availability of the RH mapping tools is particularly valuable for the analysis of data sets obtained from gene expression screens such as that of Kudoh and collaborators (2001). In this in situ hybridization-based screen, cDNAs are selected for analysis according to their embryonic expression pattern. The ability to routinely obtain map positions for many of these cDNAs adds greatly to their potential utility as markers and as candidate genes for mutations and thus constitutes an important application of the LN54 RH mapping panel.

METHODS

Selection of ESTs, Primer Design, and RH Mapping

ESTs from several zebrafish libraries (see Table 1 legend) were selected for RH mapping. EST 3′ clusters were mainly used for primer selection. In some cases, we attempted to map singleton 5′ ESTs with no corresponding 3′ EST when these were less likely to be redundant with 3′ clusters. Primer pairs were selected with the OSP program (Hillier and Green 1991). Parameters were set so primer length was between 19 and 22 nucleotides, primer-annealing temperatures were between 55°C and 65°C, and PCR products had a predicted size between 120 and 400 bp. In the event OSP failed to indicate a primer set for a particular 3′ cluster, we ran the sequence of a second EST from the same cluster through the program. When OSP designed too many pairs for a given cluster, we increased the requested annealing temperature to between 60°C and 65°C or asked for a narrower range of PCR product sizes. Using these approaches, primers could be designed for nearly every 3′ cluster. Primer pairs were used in PCR reactions as previously described (Hukriede et al. 1999). A primer set had to generate a clear amplicon from zebrafish genomic DNA but no band with genomic DNA from the mouse, the recipient species used to make the LN54 RH panel (Hukriede et al. 1999).
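
The primer design constraints above can be expressed as a simple filter. The sketch below is not OSP and does not compute annealing temperatures; it assumes that each candidate pair already carries the annealing temperatures and predicted product size reported by a design program. The class, function, and field names are hypothetical, and the primer sequences are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrimerPair:
    forward: str                            # 5'->3' sequence
    reverse: str                            # 5'->3' sequence
    annealing_temps_c: Tuple[float, float]  # as reported by the design tool
    product_size_bp: int                    # predicted PCR product size


def meets_constraints(
    pair: PrimerPair,
    length_range: Tuple[int, int] = (19, 22),        # nucleotides
    temp_range: Tuple[float, float] = (55.0, 65.0),  # degrees Celsius
    size_range: Tuple[int, int] = (120, 400),        # base pairs
) -> bool:
    """Check a pair against the length, temperature, and size windows."""
    lengths_ok = all(
        length_range[0] <= len(seq) <= length_range[1]
        for seq in (pair.forward, pair.reverse)
    )
    temps_ok = all(
        temp_range[0] <= t <= temp_range[1] for t in pair.annealing_temps_c
    )
    size_ok = size_range[0] <= pair.product_size_bp <= size_range[1]
    return lengths_ok and temps_ok and size_ok


# Made-up candidate pair (20-mer and 21-mer).
candidate = PrimerPair(
    forward="ACGTACGTACGTACGTACGT",
    reverse="TGCATGCATGCATGCATGCAT",
    annealing_temps_c=(58.0, 61.0),
    product_size_bp=250,
)

default_ok = meets_constraints(candidate)
# Tightened window, as used when too many pairs were returned for a cluster.
strict_ok = meets_constraints(candidate, temp_range=(60.0, 65.0))
print(default_ok, strict_ok)
```

Passing a narrower `temp_range` or `size_range` corresponds to the tightening step described for clusters that yielded too many pairs.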

RH mapping was performed by PCR as previously described (Hukriede et al. 1999), and data were analyzed with RHMAPPER using the same parameters and modifications as described previously (Hukriede et al. 1999).

Placement Map Construction and Selection of New Framework Markers

For placement map construction, RH vectors for ESTs are first compared with the whole genome map. If a placement lod score ≥5 is found and a lod difference of ≥3 is obtained between the best placement and the second best placement on a different LG, the EST is placed on the LG with the best placement. Then, if the distance between the EST and the nearest framework marker is within 50 cR, the EST is added to the placement map. We find that distances >50 cR allow the marker to cause an inappropriate expansion of the map and affect the correct order of markers on an LG. For each placement marker, the position on the map is based on the most likely interval compared with less likely intervals. The most likely interval is determined by whether the proximal or distal framework marker has a higher lod score with respect to the EST. To ensure that location of placement markers is as accurate as possible, we randomly performed multipoint map evaluations to confirm the positions for the placement markers. Furthermore, we compared positions of ESTs on the LN54 and the T51 RH maps for markers assessed on both RH panels.
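
The placement criteria in the preceding paragraph can be summarized as a short decision function. This is a sketch under stated assumptions, not the authors' implementation: it assumes that the lod score and the estimated distance (in cR) between the EST and each linked framework marker are already available, and the record type, function name, and example values are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Linkage(NamedTuple):
    linkage_group: str      # e.g. "LG2"
    framework_marker: str   # framework marker the EST is linked to
    lod: float              # two-point lod score with the EST
    distance_cr: float      # estimated distance to that marker, in cR


def assign_linkage_group(
    linkages: List[Linkage],
    min_lod: float = 5.0,           # best placement must reach this lod
    min_lod_gap: float = 3.0,       # margin over the best rival on another LG
    max_distance_cr: float = 50.0,  # nearest framework marker must be this close
) -> Optional[str]:
    """Return the LG on which the EST is placed, or None if it is rejected."""
    if not linkages:
        return None
    best = max(linkages, key=lambda link: link.lod)
    if best.lod < min_lod:
        return None
    # Compare against the best-scoring placement on any other linkage group.
    rivals = [l.lod for l in linkages if l.linkage_group != best.linkage_group]
    if rivals and best.lod - max(rivals) < min_lod_gap:
        return None
    # Distance to the nearest framework marker on the chosen linkage group.
    nearest = min(
        l.distance_cr for l in linkages if l.linkage_group == best.linkage_group
    )
    if nearest > max_distance_cr:
        return None
    return best.linkage_group


# Made-up example: clear placement on LG2.
placed = assign_linkage_group([
    Linkage("LG2", "fwA", 9.1, 12.0),
    Linkage("LG2", "fwB", 7.4, 20.0),
    Linkage("LG14", "fwC", 4.0, 35.0),
])
print(placed)
```

The thresholds are exposed as parameters because the text states them with slightly different inequalities in different sections (>5 and >3 in the Discussion, ≥5 and ≥3 here); the sketch follows the wording of this section.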

A computer program was written to allow for selection of additional framework markers for the LN54 RH map. This program was briefly described in Results, and a more detailed description of how the data are manipulated can be found on two flowcharts posted at http://dir.nichd.nih.gov/lmg/lmgdevb.htm. These flowcharts not only provide details of the novel approach but also show how placement maps are created, which is the step before the novel approach. As the program is designed, a two-point analysis is used to assign a marker to a proper LG, and a subsequent multipoint analysis is used to (1) assign the position on the LG, (2) evaluate the order of the marker with adjacent markers, and (3) assign the best marker in the interval between two existing framework markers to become a new framework marker.

Acknowledgments

We thank Thomas J. Hudson, the staff of the Montreal Genome Center, K. Vaillancourt, and D. Brez for technical assistance. This work was supported by grant GOP-12781 from the Canadian Institutes of Health Research (M.E. and M.C.) and by National Institutes of Health grants RO1 DK55379 (S.L.J.) and RO1 DK55381 (L.Z.). M.E. is an investigator of the Canadian Institutes of Health Research, and M.C. is a “chercheur-boursier” of the Fonds de Recherches en Santé du Québec.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

  • 7 These authors contributed equally to this work.

  • 8 Corresponding author.

  • E-MAIL mekker{at}ohri.ca; FAX (613) 761-5036.

  • Article published on-line before print: Genome Res., 10.1101/gr.210601.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.210601.

    • Received August 15, 2001.
    • Accepted September 20, 2001.
