Published online before print
November 7, 2007, 10.1101/gr.7156307 Genome Res. 17:1783-1786, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Resource
A second-generation combined linkage–physical map of the human genome
1 Department of Genetics, Rutgers University, Piscataway, New Jersey 08854, USA; 2 Affymetrix, Inc., Santa Clara, California 95051, USA; 3 Applied Biosystems, Inc., Foster City, California 94404, USA; 4 Illumina, Inc., San Diego, California 92121, USA; 5 Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48104, USA; 6 Department of Statistics and Biostatistics, Rutgers University, Piscataway, New Jersey 08854, USA
We have completed a second-generation linkage map that incorporates sequence-based positional information. This new map, the Rutgers Map v.2, includes 28,121 polymorphic markers with physical positions corroborated by recombination-based data. Sex-averaged and sex-specific linkage map distances, along with confidence intervals, have been estimated for all map intervals. In addition, a regression-based smoothed map is provided that facilitates interpolation of positions of unmapped markers on this map. With nearly twice as many markers as our first-generation map, the Rutgers Map continues to be a unique and comprehensive resource for obtaining genetic map information for large sets of polymorphic markers.
Accurate and comprehensive linkage maps continue to be critical for linkage analyses (Daw et al. 2000
The Rutgers Map v.2 also provides three novel features that are not generally offered by other publicly available maps. First, we have estimated approximate 95% confidence intervals for the size of all 24,145 map intervals, on both the sex-averaged and sex-specific maps. This feature may be useful for assessing the sensitivity of an analysis to map uncertainty and for combining the information in the Rutgers Map v.2 with map estimates derived from independent studies. Second, we have applied local regression to create a smoothed version of the Rutgers Map that separates all markers by non-zero map distances. Overall, this alternative map should provide better estimates of map distance, since nearly half of the map intervals in the Rutgers Map v.2, while physically distinct, show no evidence of recombination. Third, the smoothed map facilitates interpolation of map positions for markers that are not on our map. For example, a cM-scale map position can be easily estimated for any of the millions of SNP markers that have not been genotyped in the CEPH reference pedigrees and hence are not present on any of the CEPH-based linkage maps.
Markers and genotype data
Prior to distribution, the new SNP data were cleaned of genotyping errors using a variety of approaches specific to each company (Kennedy et al. 2003
Map construction
Rutgers Map v.2 spans a total of 2925.8 Mb (2,925,822,157 bases), covering 96.8% of the Build 36.1 assembled genome. The physical coverage varies by chromosome. While the average percentage of physical length spanned by these maps is 94.7%, 15 of the chromosomes have >99.5% coverage. The acrocentric chromosomes (13, 14, 15, 21, 22) have considerably lower coverage, ranging from 68.6% (chromosome 22) to 83.8% (chromosome 13), due to the presence of large regions of heterochromatin that result in sequencing gaps. These maps are 7.5 Mb longer than our Rutgers Map v.1 (when compared to the B35 versions of our maps, which were updated on our website post-publication), indicating that the additional SNPs added to the mapping set provide greater coverage of most chromosomes. For example, the map coverage of chromosome 20 increased by 2.17 Mb due to the addition of 11 SNPs at the q-telomere end of the map. Similarly, map coverage of chromosome 13 increased by 1.19 Mb due to the addition of 12 SNPs at the beginning of the map.
Confidence intervals for intermarker map distance
Map smoothing and interpolation of marker position
This second-generation Rutgers combined linkage–physical map (Rutgers Map v.2) has almost twice as many markers as the previous version and provides a unique and valuable map for several types of genetic analysis. The data have been carefully cleaned, and the position of each marker on the map is supported by both physical and recombination-based data. The smoothed maps provide a non-zero map distance between all markers and facilitate interpolation of additional markers not already on the map.
We used CRIMAP (Lander and Green 1987
The confidence intervals provided with this map could be used in two ways: (1) to quantify the effect of map uncertainty on a genetic analysis; and (2) to combine the information in the Rutgers Map v.2 with independent map estimates obtained from individual studies. First, in critical regions it may be helpful to repeat any genetic analysis using a small number of different maps, where the maps are selected so that their variance is representative of the sampling error of the map estimate. This matters because, although many investigators ignore map uncertainty, several studies have shown that incorrect map distances can negatively affect multipoint linkage analysis (Halpern and Whittemore 1999
This map contains virtually all of the polymorphic markers that have been genotyped in the CEPH standard reference pedigrees, and to our knowledge it is the densest linkage map published to date. Other polymorphisms that investigators may be using can be localized onto our map using interpolation. Alternatively, as described above, meta-analysis could be used to combine our map with localized maps produced using genotype data from disease studies.
Files providing limited details about each marker (e.g., marker heterozygosity, number of informative meioses, Build 36 physical position) along with map positions (sex-averaged, female, male, smoothed) and confidence intervals are available on the Rutgers Map website at http://compgen.rutgers.edu/maps.
Markers and genotype data
Our working data set for this map contained 28,425 markers. Of these, 14,759 (51.9%) were on our Rutgers Map v.1 (Kong et al. 2004
In total, 899 SNPs were genotyped by two or more companies. For each of the redundant SNPs, we retained only the genotypes that were assayed in the largest sample, leaving 13,666 nonredundant SNPs. The genotypes at the 13,666 nonredundant SNPs were analyzed for genotyping errors, identified using the PedCheck (O'Connell and Weeks 1998
Map construction
Confidence intervals for intermarker map distances
To understand the adjusted Wald method and how it applies to map estimation, consider a single map interval and a set of n independent, fully informative meioses. If nonrecombinant and recombinant intervals are labeled as zero and one, respectively, then the standard estimate of the recombination rate p is the sample mean
In principle, our confidence intervals could be used in conjunction with Rutgers Map v.2 to posit realistic multivariate distributions for linkage maps, which would make it easy for investigators to quantify the effect of sampling error on their analyses. This is important since almost all multipoint genetic analyses contain an added, but often ignored, layer of variability that is attributable to uncertainty in the map.
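As a rough illustration of the adjusted Wald interval referred to in this section, the Python sketch below computes an approximate 95% interval for the recombination fraction of a single map interval. It assumes the common "add two successes and two failures" form described by Agresti and Coull (1998) and n independent, fully informative meioses. The function name, its arguments, and the example counts are hypothetical, and the sketch does not reproduce the exact procedure used to build the Rutgers Map v.2 (for instance, no conversion to centimorgans is attempted).

```python
import math


def adjusted_wald_interval(recombinants, meioses, z=1.96):
    """Approximate 95% interval for a recombination fraction.

    Assumed form (Agresti and Coull 1998): add two recombinant and two
    nonrecombinant pseudo-observations, then apply the usual Wald formula.
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    n_adj = meioses + 4                    # adjusted number of meioses
    p_adj = (recombinants + 2) / n_adj     # adjusted point estimate
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] so the bounds remain valid proportions.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical interval with no observed recombinants in 100 meioses.
lower, upper = adjusted_wald_interval(0, 100)
print(f"lower={lower:.4f}, upper={upper:.4f}")
```

One reason this form is attractive for dense maps is that an interval with zero observed recombinants still receives a non-zero upper bound, whereas the unadjusted Wald interval collapses to a single point at zero.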
Map smoothing and interpolation of marker position
Map distances were smoothed with local regression using a quadratic fit, as implemented in the Locfit package in R (Loader 1999
Map positions on the centimorgan scale can be interpolated for markers not present on our map. Given a marker's physical position, linear interpolation from the dense grid is used to identify a corresponding cM map position.
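The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the smoothed map is available as two parallel lists of physical positions (bp) and cM positions for one chromosome. The function name and the grid values are hypothetical and are not taken from the Rutgers Map files.

```python
from bisect import bisect_right


def interpolate_cm(grid_bp, grid_cm, query_bp):
    """Linearly interpolate a cM position for a physical position (bp).

    grid_bp and grid_cm are parallel sequences, with grid_bp strictly
    increasing. Positions outside the grid are rejected, not extrapolated.
    """
    if len(grid_bp) != len(grid_cm) or len(grid_bp) < 2:
        raise ValueError("grid needs at least two matched points")
    if not grid_bp[0] <= query_bp <= grid_bp[-1]:
        raise ValueError("query position lies outside the mapped region")
    # Index of the grid point at or to the left of the query; capped so
    # that a query at the last grid point uses the final segment.
    i = min(bisect_right(grid_bp, query_bp) - 1, len(grid_bp) - 2)
    x0, x1 = grid_bp[i], grid_bp[i + 1]
    y0, y1 = grid_cm[i], grid_cm[i + 1]
    return y0 + (y1 - y0) * (query_bp - x0) / (x1 - x0)


# Hypothetical grid for illustration only.
GRID_BP = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
GRID_CM = [0.0, 1.5, 2.0, 4.0]

print(interpolate_cm(GRID_BP, GRID_CM, 2_500_000))  # midway between 1.5 and 2.0 cM
```

Refusing to extrapolate is a deliberate choice in this sketch: a marker beyond the first or last mapped position has no flanking recombination data, so any cM value assigned to it would be a guess.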
We thank Dr. Linda Brzustowicz for helpful discussions. This work was partially supported by National Institutes of Health grants GM080221, HG003229, and MH068457 (T.C.M.), HG00040, HG002651 (W.C.L.S.), and AA015346 (S.G.B.) and by March of Dimes grant 12-FY02-108 (T.C.M.).
7 Present addresses: Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10065, USA;
8 Glaxo Smithkline, Research Triangle Park, NC 27709, USA;
9 Scripps Genomic Medicine, La Jolla, CA 92037, USA.
E-mail matise@biology.rutgers.edu; fax (732) 445-4972. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.7156307
Abecasis, G.R., Cherny, S.S., Cookson, W.O., and Cardon, L.R. 2002. Merlin: Rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30: 97–101.
Agresti, A. and Coull, B.A. 1998. Approximate is better than "exact" for interval estimation of binomial proportions. Am. Statist. 52: 119–126.
Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19: 716–723.
Barber, M.J., Todd, J.A., and Cordell, H.J. 2006. A multimarker regression-based test of linkage for affected sib-pairs at two linked loci. Genet. Epidemiol. 30: 191–208.
Dausset, J., Cann, H., Cohen, D., Lathrop, M., Lalouel, J.M., and White, R. 1990. Centre d'Etude du Polymorphisme Humain (CEPH): Collaborative genetic mapping of the human genome. Genomics 6: 575–577.
Daw, E.W., Thompson, E.A., and Wijsman, E.M. 2000. Bias in multipoint linkage analysis arising from map misspecification. Genet. Epidemiol. 19: 366–380.
Dette, H., Neumeyer, N., and Pilz, K.F. 2006. A simple nonparametric estimator of a strictly monotone regression function. Bernoulli 12: 469–490.
Dietter, J., Mattheisen, M., Furst, R., Ruschendorf, F., Wienker, T.F., and Strauch, K. 2007. Linkage analysis using sex-specific recombination fractions with GENEHUNTER-MODSCORE. Bioinformatics 23: 64–70.
Efron, B. and Tibshirani, R.J. 1993. An introduction to the bootstrap. Chapman & Hall, New York.
Fingerlin, T.E., Abecasis, G.R., and Boehnke, M. 2006. Using sex-averaged genetic maps in multipoint linkage analysis when identity-by-descent status is incompletely known. Genet. Epidemiol. 30: 384–396.
Halpern, J. and Whittemore, A.S. 1999. Multipoint linkage analysis. A cautionary note. Hum. Hered. 49: 194–196.
Kennedy, G.C., Matsuzaki, H., Dong, S., Liu, W.M., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., Zhang, J., et al. 2003. Large-scale genotyping of complex DNA. Nat. Biotechnol. 21: 1233–1237.
Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31: 241–247.
Kong, X., Murphy, K., Raj, T., He, C., White, P.S., and Matise, T.C. 2004. A combined linkage–physical map of the human genome. Am. J. Hum. Genet. 75: 1143–1148.
Lander, E.S. and Green, P. 1987. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. 84: 2363–2367.
Loader, C. 1999. Local regression and likelihood. Springer-Verlag, New York.
Maniatis, N., Collins, A., Xu, C.F., McCarthy, L.C., Hewett, D.R., Tapper, W., Ennis, S., Ke, X., and Morton, N.E. 2002. The first linkage disequilibrium (LD) maps: Delineation of hot and cold blocks by diplotype analysis. Proc. Natl. Acad. Sci. 99: 2228–2233.
Murray, S.S., Oliphant, A., Shen, R., McBride, C., Steeke, R.J., Shannon, S.G., Rubano, T., Kermani, B.G., Fan, J.B., Chee, M.S., et al. 2004. A highly informative SNP linkage panel for human genetic studies. Nat. Methods 1: 113–117.
O'Connell, J.R. and Weeks, D.E. 1998. PedCheck: A program for identification of genotype incompatibilities in linkage analysis. Am. J. Hum. Genet. 63: 259–266.
Stewart, W.C. 2007. Improving estimates of genetic maps: A meta-analysis-based approach. Genet. Epidemiol. 31: 408–416.
Stewart, W.C. and Thompson, E.A. 2006. Improving estimates of genetic maps: A maximum likelihood approach. Biometrics 62: 728–734.
Tapper, W., Collins, A., Gibson, J., Maniatis, N., Ennis, S., and Morton, N.E. 2005. A map of the human genome in linkage disequilibrium units. Proc. Natl. Acad. Sci. 102: 11835–11839.
Received September 28, 2007; accepted in revised format October 11, 2007.