Characterization of Clustered MHC-Linked Olfactory Receptor Genes in Human and Mouse

Ruth M. Younger¹, Claire Amadou²,⁵, Graeme Bethel¹, Anke Ehlers³, Kirsten Fischer Lindahl², Simon Forbes⁴, Roger Horton¹, Sarah Milne¹, Andrew J. Mungall¹, John Trowsdale⁴, Armin Volz³, Andreas Ziegler³, and Stephan Beck¹,⁶

¹The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK; ²Howard Hughes Medical Institute, Center for Immunology, University of Texas Southwestern Medical Center, Dallas, Texas 75390–9050, USA; ³Institut für Immungenetik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, 14050 Berlin, Germany; ⁴Cambridge University, Department of Pathology, Immunology Division, Cambridge CB2 1QP, UK

Abstract

Olfactory receptor (OR) loci frequently cluster and are present on most human chromosomes. They are members of the seven transmembrane receptor (7-TM) superfamily and, as such, are part of one of the largest mammalian multigene families, with an estimated copy number of up to 1000 ORs per haploid genome. As their name implies, ORs are known to be involved in the perception of odors and possibly also in other, nonolfaction-related, functions. Here, we report the characterization of ORs that are part of the MHC-linked OR clusters in human and mouse (partial sequence only). These clusters are of particular interest because of their possible involvement in olfaction-driven mate selection. In total, we describe 50 novel OR loci (36 human, 14 murine), making the human MHC-linked cluster the largest sequenced OR cluster in any organism so far. Comparative and phylogenetic analyses confirm the cluster to be MHC-linked but divergent in both species and allow the identification of at least one ortholog that will be useful for future regulatory and functional studies. Quantitative feature analysis shows clear evidence of duplications of blocks of OR genes and reveals the entire cluster to have a genomic environment that is very different from its neighboring regions. Based on in silico transcript analysis, we also present evidence of extensive long-distance splicing in the 5′-untranslated regions and, for the first time, of alternative splicing within the single coding exon of ORs. Taken together with our previous finding that ORs are also polymorphic, the presented data indicate that the expression, function, and evolution of these interesting genes might be more complex than previously thought.

[The sequence data described in this paper have been submitted to the EMBL nucleotide data library under accession nos. Z84475, Z98744, Z98745, AL021807, AL021808, AL022723, AL022727, AL031893, AL035402, AL035542, AL050328, AL050339, AL078630, AL096770, AL121944, AL133160, and AL133267.]

Olfactory receptor genes (ORs) were first identified in rat olfactory epithelium as small, intronless genes with easily identifiable consensus motifs in conserved domains of the predicted seven transmembrane (7-TM) structure (Buck and Axel 1991). This work stimulated much interest in understanding the molecular basis of olfaction, leading to a large number of ORs being identified. ORs are best known for their involvement in the perception of odors, which is accomplished through OR expression in two anatomically and functionally different organs within the nose: the main olfactory epithelium (MOE) and the vomeronasal organ (VNO). In general, ORs expressed in the MOE are believed to recognize environmental odors (conscious odor perception), whereas ORs expressed in the VNO are believed to recognize odors such as pheromones (subconscious odor perception). However, two recent studies suggest that most of the VNO-type 1 ORs (V1Rs) are nonfunctional pseudogenes in humans (Giorgi et al. 2000; Rodriguez et al. 2000). Nonolfaction-associated OR function such as cell-cell recognition in embryogenesis has also been suggested (Dreyer 1998). For recent reviews on the molecular and cellular biology of ORs, see the special Science issue of October 27, 1999, on olfaction (Science vol. 286; Mombaerts 1999a,b). Public databases currently hold over 600 OR and OR-like genes and pseudogenes, from invertebrates such as Caenorhabditis elegans and Drosophila melanogaster to complex vertebrates, including more than 200 from Homo sapiens. However, the analysis of these ORs has been somewhat hampered because only partial sequences are available for most of them (at least for the mammalian ORs). This shortcoming is largely the result of the quick but imperfect approach of identifying new ORs by degenerate PCR, and it will soon be corrected as more genomic sequences become available.

A recent genome-wide survey revealed that MOE-type ORs are present on most human chromosomes (Rouquier et al. 1998). The fact that ORs occur in clusters rather than being randomly distributed was recognized early on (Ben-Arie et al. 1994), and a combination of repeated single gene and block duplications was proposed as the underlying mechanism (Lancet and Ben-Arie 1993; Glusman et al. 1996, 2000; Sullivan et al. 1996; Trask et al. 1998a,b). The existence of major histocompatibility complex (MHC)-linked ORs on human chromosome 6 was first discovered in 1995 (Fan et al. 1995). Using a cDNA selection approach, several cDNAs were identified (including FAT11) and mapped telomeric of the MHC. Together with the recently published sequence of the classical MHC (The MHC Sequencing Consortium 1999), the OR cluster reported here forms over 4.5 Mb of contiguous genomic sequence. The region including the OR cluster has previously been shown to be in strong linkage disequilibrium with the MHC (although possibly not in all haplotypes) and has been proposed to be part of the extended MHC (Malfroy et al. 1997; Stephens et al. 1999). This raises the possibility that these ORs are not only physically linked to the MHC but also may have some functional association (e.g., mate selection) with genes of the complex (Ehlers et al. 2000; Ziegler et al. 2000a). Therefore, we have sequenced the human MHC-linked OR cluster and discuss here our findings in comparison with our preliminary data from the orthologous murine OR cluster.

RESULTS AND DISCUSSION

Gene Organization and Genomic Environment

Taking the previously established region of conserved synteny between human and mouse (Yoshino et al. 1997) into account, we have divided the human MHC-linked OR cluster into two subclusters: the MHC-linked major OR cluster (562 kb between positions 105 and 667 kb in Fig. 1) and the MHC-linked minor OR cluster (between HFE and RFP). As illustrated in Figure 1, the ∼3-Mb-long region between HFE and RFP is not yet completely finished (∼90% finished, four out of 30 clones still unfinished), but all of the OR loci have been finished and are included in the analysis presented here. Throughout this study, we follow the OR naming convention previously proposed by us (Ehlers et al. 2000; Ziegler et al. 2000a).

Figure 1.

Genomic organization of the MHC-linked olfactory receptor (OR) genes on chromosome 6. In accordance with the agreed sequence orientation for the human genome, the orientation shown here is from telomere (left) to centromere (right). Except for the top segment, each sequence segment consists of a scale bar, a bar for CpG islands, a bar for short interspersed repeats, a bar for long interspersed repeats, a bar for Alu repeats, a bar for exons, and a bar for the sequenced clone tile path. Classification of all repeats is according to the RepeatMasker program (see Methods). Transcriptional orientations are shown by arrows under the gene names, and EST-confirmed splicing in the 5′-UTRs of hs6M1-16 and hs6M1-21 is indicated by interconnecting the corresponding exons. Gene positions in the 3-Mb top segment are approximate. The duplication of a 35-kb segment containing two olfactory receptor genes is boxed grey.

Figure 1 summarizes the genomic organization of the region between RFP and HLA-F that, as defined above, contains the major OR cluster and turns out to be just over 800 kb in size. It reveals the cluster to consist of 25 MOE-type OR loci, of which 12 (hs6M1 -1, -3, -6, -12, -15, -16, -17, -18, -20, -21, -27, -28) have complete open reading frames and, therefore, are predicted to be functional. The remaining 13 loci (hs6M1 -2P, -4P, -5P, -7P, -8P, -13P, -14P, -19P, -22P, -23P, -24P, -25P, -26P) are predicted to be pseudogenes (P) on the basis of disabling stop codons, rearrangements, and/or frameshift mutations. The ratio of genes versus pseudogenes is likely to be different in different individuals owing to OR polymorphism. For instance, we showed previously that the stop codon rendering hs6M1-4P a pseudogene here is not present in six out of 10 cell lines tested for OR gene polymorphism (Ehlers et al. 2000; Ziegler et al. 2000a). A similar scenario of gene versus pseudogene status was also established for hs6M1-17, hs6M1-19P, and hs6M1-29P (Ehlers et al. 2000; data not shown). Taken together, these data indicate that the gene versus pseudogene ratio for the MHC-linked major OR cluster is closer to 1 than to the previously reported average of about 0.3 (Rouquier et al. 1998). In addition, the region also contains a number of other genes, including MOG (Pham-Dinh et al. 1993), GABBR1 (Kaupmann et al. 1997), FAT10 (Liu et al. 1999), ZNF57, the human counterpart of Zfp57 (Okazaki et al. 1994), a novel Mas-like G-protein-coupled receptor (MAS1L), a novel zinc finger protein (ZNF311), and 11 pseudogenes. Interestingly, GABBR1 and MAS1L are also members of the 7-TM superfamily. Dot-matrix analysis of the entire 807-kb region reveals one major duplication event of about 35 kb involving ORs hs6M1-4P/hs6M1-3 and hs6M1-5P/hs6M1-6 (Fig. 1, grey blocks). Comparisons of further coding regions show that more duplications and/or gene conversions are likely to have occurred; examples are hs6M1-12, -13P and -16, which are on average 90% identical (DNA level) to each other (Ziegler et al. 2000b). A detailed phylogenetic analysis of MHC-linked OR genes is described below.

Figure 2 shows a G + C content plot for the MHC-linked major OR cluster and its immediate flanking regions. This analysis defines the entire cluster as a low G + C (average 37.83%) isochore (L-family), including a local G + C increase around position 600 kb owing to the insertion of five non-OR loci (see also Fig. 1). CpG analysis reveals that there are no CpG islands within the cluster but several within the flanking regions. These results are consistent with the observations made on the chromosome 17 cluster. Although the cluster on chromosome 17 also resides within an L-family isochore, it contains four CpG islands but they are not coupled to any of the OR genes (Glusman et al. 2000). The 11 OR loci of the MHC-linked minor cluster (distal of RFP) are only shown at their approximate positions. Among them are two VNO-type 1 pseudogenes, VNRI1P and VNRI2P (also known as hs6V1-1P and hs6V1-2P). Of the remaining nine MOE-type loci, three (hs6M1-10, -32, -35) are predicted to be functional and six (hs6M1-9P, -29P, -30P, -31P, -33P, -34P) are predicted to be pseudogenes (although apparently not in all individuals; see above), giving a gene-to-pseudogene ratio of 0.5 compared with about 1 for the major cluster. In all, we have identified 36 novel human OR loci, resulting in a density of 1 OR per 23 kb for the MHC-linked major cluster. In comparison, the OR cluster on chromosome 17p13.3 is of similar size (412 kb) and of similar density (1 OR per 24 kb) as the MHC-linked major cluster, but it has a higher gene-to-pseudogene ratio (1.83) and no intervening non-OR (pseudo)genes (Glusman et al. 2000). Dense clustering of functionally related genes has also been observed in other gene families and is thought to be advantageous for coordinate regulation (Gumucio et al. 1988; Zimmer et al. 1992; Wright et al. 1995). Quantitative feature analysis (data not shown) shows the major OR cluster to reside within an L-isochore with a distinct preponderance of L1 repeats, confirming the possibility of an L1-mediated duplication mechanism. For instance, the boundary sequences of the block duplication in Figure 1 (grey boxes) are all L1 repeats. A similar L1-mediated mechanism has been shown to be responsible for the duplication of the γ-globin locus (Fitch et al. 1991).
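
The gene-to-pseudogene ratios and the locus density quoted above follow from simple counts. The short Python sketch below reproduces that arithmetic from the numbers given in the text (12 genes and 13 pseudogenes over 562 kb for the major cluster; three genes and six pseudogenes among the MOE-type loci of the minor cluster). The function names are illustrative and not part of the original analysis, and the computed spacing of about 22.5 kb per locus is only approximately the figure of 23 kb quoted in the text.

```python
# Locus counts and span as stated in the text; nothing here is new data.
MAJOR = {"genes": 12, "pseudogenes": 13, "span_kb": 562}
MINOR_MOE = {"genes": 3, "pseudogenes": 6}


def gene_to_pseudogene_ratio(genes, pseudogenes):
    """Ratio of intact (predicted functional) loci to pseudogenes."""
    return genes / pseudogenes


def kb_per_locus(span_kb, n_loci):
    """Average spacing of loci across a cluster, in kb per locus."""
    return span_kb / n_loci


major_ratio = gene_to_pseudogene_ratio(MAJOR["genes"], MAJOR["pseudogenes"])
minor_ratio = gene_to_pseudogene_ratio(MINOR_MOE["genes"], MINOR_MOE["pseudogenes"])
major_spacing = kb_per_locus(MAJOR["span_kb"], MAJOR["genes"] + MAJOR["pseudogenes"])

print(f"major cluster ratio: {major_ratio:.2f}")
print(f"minor cluster ratio: {minor_ratio:.2f}")
print(f"major cluster spacing: {major_spacing:.1f} kb per OR locus")
```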

Figure 2.

G + C content plot of the MHC-linked major olfactory receptor (OR) cluster and immediate flanking regions. The mean G + C% (smoothed per 50-kb interval) is plotted per 1 kb at the midpoint of the interval starting at 25 kb. (Black boxes) OR loci, (white boxes) non-OR loci, (black triangles) positions of CpG islands. The average G + C content of the cluster is 37.83% (see also Table 2), defining it as a low G + C (L-family) isochore (Bernardi 1993).
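
The smoothing described in this legend can be sketched as a sliding window: a 50-kb window moved in 1-kb steps, with each value reported at the window midpoint, so that the first point falls at 25 kb. The Python sketch below is an illustration under those assumptions only; the function names are hypothetical, and the handling of ambiguous bases (ignored here) is a choice made for the example rather than something stated in the text.

```python
def gc_percent(seq):
    """G + C percentage of a sequence, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return 100.0 * gc / acgt if acgt else float("nan")


def sliding_gc(seq, window=50_000, step=1_000):
    """Return (midpoint, G+C %) pairs for windows of `window` bp moved by `step` bp."""
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        points.append((start + window // 2, gc_percent(chunk)))
    return points
```

With the default parameters the first midpoint is 25,000 bp, matching the plot description. Smaller values can be passed for short test sequences, for example `sliding_gc(seq, window=100, step=50)`.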

In Silico Transcript Analysis

There is increasing direct and indirect evidence that OR expression is not limited to the MOE. OR-like sequences have been found in a number of other tissues, such as testis (Parmentier et al. 1992), colon, kidney, liver (Dreyer 1998), and heart (Drutel et al. 1995), suggesting a role for ORs outside the olfactory system. Based on in silico transcript analysis of the OR cluster described here, we can confirm and add to this evidence. Screening of publicly available expressed sequence tag (EST) databases produced hits as summarized in Table 1. The overall low hit rate is not surprising, as there are no public EST data available from MOE tissue. Only five out of the 36 MHC-linked ORs show any matches to ESTs with >90% similarity. However, these matches confirm that some ORs are transcribed in non-MOE tissue such as lung, kidney, colon, prostate, testis, and germ cell tumors and, therefore, may be involved in nonolfaction-associated function.

Table 1.

List of Expressed Sequence Tags (ESTs) Matching MHC-Linked OR Genes

Alignment of these ESTs to the genomic sequence reveals unusual splicing in the 5′-UTRs of several ORs. For instance, the alignment for hs6M1-21 reveals three 5′-UTR exons and indicates that the primary transcript starts at least 74 kb upstream (position 512 kb in Fig. 1) of the hs6M1-21 ATG start codon (position 438 kb in Fig. 1). The predicted transcript spans four other OR loci, two of which are in the same (hs6M1-18, −27) and two in the opposite (hs6M1-19P, −20) transcriptional orientation. It is quite conceivable that such long transcripts may play a role in the coordinate expression of clustered ORs, for example, via alternative splicing and/or antisense regulation.

In the case of hs6M1-16 (position 542 kb in Fig. 1), the alignment with two ESTs (both from testis) also reveals three exons in the 5′-UTR but only up to 3 kb upstream of the predicted ATG start codon. Interestingly, both ESTs splice around the expected start codon to the third methionine (amino acid position 79) within the single coding exon of hs6M1-16, producing a predicted protein lacking the first 78 amino acids and, therefore, the first two transmembrane domains (Fig. 3A). A similar scenario is also observed for hs6M1-32 (Fig. 3B). In this case, the first half of the EST matches to a presumed noncoding sequence in PAC 193B13 (Z98744), and the second half matches to PAC 408B20 (AL133267) and splices into amino acid position 254 of hs6M1-32. This results in a potential 5′-UTR of at least 64 kb and, using the first in-frame methionine, a predicted gene product of only 41 amino acids. To our knowledge, these two examples are the first evidence of alternative splicing within the single coding exon of any OR. Alternative splicing or alternative use of ATG start codons may also explain some of the differences observed between mouse and human ORs. hs6M1-14P, for example, is considered a pseudogene because it misses the first 78 amino acids compared to its murine ortholog, mm17M1-6 (see below). Yet, it is the only OR matching a comparatively large number of ESTs, all between 99% and 100% similarity and from nonolfaction-associated tissues (Table 1). Although the position of sequence divergence coincides perfectly with the presence of an acceptor splice site, several ESTs span the position, indicating that this splice site is not used, at least not in the tissues from which the ESTs were derived (data not shown). Our interpretation of the data is that, similar to hs6M1-16, hs6M1-14P could make use of an alternative ATG start codon, most likely the one corresponding to the methionine mentioned above for hs6M1-16, resulting again in the omission of the first two transmembrane domains. In fact, the described alternative splicing or use of alternative ATG start codons may be quite common, because the methionine corresponding to amino acid position 79 in Figure 3A is conserved in 62% of the MHC-linked MOE-type ORs presented here. Of these, ten (hs6M1-2P, -7P, -8P, -9P, -15, -16, -21, -22P, -24P, -35) have apparently functional acceptor splice sites that would allow expression from this methionine as for hs6M1-16. The splicing would effectively avoid the frameshift mutations in hs6M1-7P and hs6M1-22P, making these two pseudogenes potentially expressible as proteins. In all examples discussed here, the AGGT splice consensus motif has been preserved and the corresponding splice phases are matching.
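
The effect of initiating translation at an internal methionine can be illustrated with a few lines of Python. This is a minimal sketch, not the analysis performed here: the function names are hypothetical and the peptide in the usage note is a made-up toy string, not an OR sequence. Given a conceptual translation, it lists the 1-based positions of all methionines and returns the product that would result from starting at a chosen one. For a start at position 79, the returned product lacks the first 78 residues, as described above for hs6M1-16.

```python
def methionine_positions(protein):
    """1-based positions of every methionine (M) in a conceptual translation."""
    return [i + 1 for i, residue in enumerate(protein) if residue == "M"]


def product_from_methionine(protein, position):
    """Protein that would result if translation began at the 1-based `position`.

    Raises ValueError if the residue at that position is not a methionine.
    """
    if not 1 <= position <= len(protein) or protein[position - 1] != "M":
        raise ValueError(f"no methionine at position {position}")
    return protein[position - 1:]
```

For the toy peptide `"MAAMKKKMLLL"`, `methionine_positions` returns `[1, 4, 8]`, and starting at the third methionine with `product_from_methionine("MAAMKKKMLLL", 8)` returns `"MLLL"`, which lacks the first seven residues.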

Figure 3.

Alignment of ESTs to the genomic sequences of (A) hs6M1-16, (B) hs6M1-32. AG/GT splice sites are highlighted in bold. Long intron sequences are not shown, but their sizes are indicated. The numbers on the right of the alignments refer to the conceptual amino acid positions of the unspliced protein. Positions of sequence disagreement are underlined and predicted transmembrane domains are boxed. Dashes were introduced where required to maximize the alignment. For more details, see Table 1.

Our in silico transcript analysis suggests that some ORs (including ORs currently classified as pseudogenes) may be expressed in a truncated, yet functional, form. Alternative splicing, although not over distances as long as reported here, and the expression of OR pseudogenes have been reported previously (Asai et al. 1996; Crowe et al. 1996; Walensky et al. 1998). Furthermore, the deletion of the first two transmembrane domains (as in the case of hs6M1-16) has been shown not to affect the functional expression of other members of the 7-TM G-protein-coupled receptor gene family (Ling et al. 1999). In this context, it should be noted that alternative splicing is very common. A recent EST-based study showed alternative splicing to take place in 35% of genes in the TIGR human gene index (Mironov et al. 1999). Most of the splicing events occurred within the 5′-UTRs, which was interpreted as evidence for alternative regulation mechanisms. Concerning the MHC-linked ORs, experimental evidence is now needed to determine (1) whether these splice events serve to regulate OR expression, (2) whether they correlate with nonolfaction-associated function, and (3) whether they contribute toward the generation of alternative OR gene products with novel ligand-binding properties. The in silico analysis presented here is a first step in this direction.

Human–Mouse Comparison

Conserved function correlates well with conserved synteny, which makes comparative genomic analyses so informative (Koop and Hood 1994; Baxendale et al. 1995; Ansari-Lari et al. 1998). Comparisons between human and mouse are particularly informative because the two species have diverged enough to distinguish potential coding sequences from noncoding sequences, but not so much that many regulatory sequences are no longer identifiable (Hardison et al. 1997). For these and many other reasons, we are interested in analyzing the MHC-linked OR cluster in mouse alongside the human OR cluster. The MHC linkage of the mouse OR cluster on mouse chromosome 17 (also known as Tu42 and Leh89 gene clusters) was established previously (Amadou et al. 1995; Szpirer et al. 1997), and the cloning of the entire cluster is almost complete (Amadou et al. 1999). Here, we report our results from sequencing the first two clones (BACs 573K1 and 332P19) of this contig.

Figure 4A shows a comparison of human PAC 271M21 with mouse BAC 573K1 in a Percent Identity Plot (PIP) (Hardison et al. 1997). Segments of 50%–100% identity between the two sequences are plotted using the coordinates of the subject sequence, in this case the human sequence. Features in the subject sequence such as exons, repeats, and CpG islands are also plotted for orientation. The plot shows clearly that the two sequences are highly related, although the four non-OR pseudogenes (TREP, MAS1LP2, RPL13AP, and SMT3H2P) are not present in the mouse sequence. For instance, all 22 exons of the gamma-amino-butyric acid receptor B1 (GABBR1) are ∼80% identical (DNA level), whereas the introns show recognizable but partial similarity only, and part of intron 10 (position 111–113.5 kb) is not conserved at all owing to human-specific repeat expansion. The three OR loci (hs6M1-12, -13P and -14P) clearly have related genes (>75% DNA identity) in the mouse clone, and the presence of multiple stacked homology bars (compared with single bars for GABBR1 and FAT10) indicates additional mouse-specific OR duplications. This becomes more obvious when re-plotting the PIP using the mouse sequence as the subject sequence (data not shown). Figure 4B gives a schematic summary of both analyses. Three loci (GABBR1, FAT10, and hs6M1-14P), including one OR, are identified as true orthologs based on positional and sequence conservation. Although the remaining ORs (hs6M1-12 and -13P in human and mm17M1-1, -2, -3, -4, -5P in mouse) are clearly closely related by sequence, their exact relationship is less obvious because of species-specific duplications (see also phylogenetic analysis below). The remaining species-specific pseudogenes (MAS1LP2, RPL13AP and SMT3H2P in human and the Vhl-LP gene fragment in mouse) must all have arisen by insertion or deletion after the two species diverged. Another interesting feature of the PIP analysis is the identification of conserved sequence blocks (Fig. 4A, boxed grey) upstream of all three OR loci, indicating the presence of potential regulatory elements. The conservation of such blocks is consistent with our findings, discussed above, that the 5′-UTRs of ORs can extend over considerable distances upstream of the ATG start codons and may include several splicing events. Experimental work to identify the true 5′-ends of all ORs and to test such potential regulatory elements in functional promoter assays is now in progress.
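
The quantity plotted in a PIP can be illustrated with a small Python sketch: for each gap-free aligned segment, the percentage of identical positions is computed and the segment is kept, at its subject (here human) coordinates, if it reaches the 50% floor. This is a simplified illustration under stated assumptions, not the PIPmaker implementation. The segments are assumed to come from a prior local alignment step that is not shown, and the function names and the tuple layout are hypothetical.

```python
def percent_identity(subject, query):
    """Percent identical positions for two equal-length, gap-free aligned segments."""
    if len(subject) != len(query) or not subject:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(s == q for s, q in zip(subject.upper(), query.upper()))
    return 100.0 * matches / len(subject)


def pip_segments(segments, floor=50.0):
    """Keep aligned segments whose identity reaches `floor`.

    `segments` is a list of (subject_start, subject_seq, query_seq) tuples.
    Returns (subject_start, subject_end, percent_identity) tuples with
    1-based inclusive subject coordinates.
    """
    kept = []
    for start, subject, query in segments:
        identity = percent_identity(subject, query)
        if identity >= floor:
            kept.append((start, start + len(subject) - 1, identity))
    return kept
```

Each returned tuple corresponds to one horizontal bar in the plot: its horizontal extent is given by the subject coordinates and its height by the percent identity.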

Figure 4.

(A) Percent identity plot (PIP) of the human-mouse comparison for the centromeric boundary of the MHC-linked olfactory receptor (OR) gene cluster. The two sequences used are accession no. AL031983 for the human sequence and accession no. AL078630 for the mouse sequence. The human sequence was used as the subject sequence and is annotated along the top line. Regions between 50% and 100% conservation to mouse are plotted under the corresponding human positions. The grey shaded boxes mark conserved regions possibly involved in the regulation of the corresponding OR loci. (B) Schematic summary of the human-mouse comparative analysis. OR loci are shown as black boxes and non-OR loci as white boxes. Orthologous gene loci are connected by dotted lines. 'cen' and 'tel' define directions towards centromere and telomere, respectively.

Our comparative analysis suggests at least one orthologous MHC-linked OR and establishes a high level of conserved synteny between the two OR clusters of human and mouse. In the two mouse clones (BACs 573K1 and 332P19) sequenced thus far, a total of 14 OR loci have been identified, of which at least 10 (mm17M1-1, -2, -3, -4, -6, -10, -11, -12, -13, -14) are predicted to be expressed. In addition to mm17M1-5P, which has multiple frameshift mutations, ORs mm17M1-7P, -8P, -9P are defined here as pseudogenes owing to an A > G transition at position 1, changing the initiation of translation from a methionine to a valine. The same mutation has been shown before to prevent normal initiation in other human genes (Fojo et al. 1989; Breimer et al. 1994), but it is still possible that these ORs are initiated by the second in-frame methionine at position 33 (see above). In any case, extrapolation from the above numbers indicates that the total number of expressed OR loci in the mouse cluster is higher than in humans, as has been suggested before for the entire murine contingent of OR genes (Mombaerts 1999b).

Phylogeny

To establish the relationship of the MHC-linked ORs to each other and to ORs from other clusters and species, we performed a phylogenetic analysis. Publicly available ORs were compiled into a BLAST searchable protein database. This database cross-references all original accession numbers, previous gene names, etc., and is available from us (see Methods).

Figure 5 shows a phylogenetic tree of the MHC-linked ORs reported here and representatives from other human and mouse OR clusters. Apart from some notable exceptions, most ORs group on branches corresponding to their respective chromosomal clusters, indicating that local duplication is the main mechanism of OR gene pool expansion. However, local duplications cannot account for all ORs, and there are several examples of ORs that are more closely related to ORs found in other clusters than in their own. hs6M1-17, -18, -19P, -20, -21, -27 and -28, for instance, appear to be the most diverged of the human MHC-linked ORs, although hs6M1-19P, -20, -27 and -28 still cluster with MHC-linked ORs from mouse (mm17M1-7, -8, -9, -12, -13 and -14). Regarding the comparison to mouse, the tree confirms orthology between hs6M1-14P and mm17M1-6 (100% bootstrap confidence) and paralogy between hs6M1-12, -13 and mm17M1-1, -2, -3, -4, -5P (87% bootstrap confidence). The only mouse ORs that do not cluster with any other mouse or human MHC-linked ORs are mm17M1-10 and mm17M1-11. They were either inserted into the MHC cluster after divergence of the two species or the human counterparts were deleted.

Figure 5.

Phylogenetic tree of olfactory receptor (OR) genes from the MHC-linked clusters in human and mouse and representatives from other human clusters. The most conserved block of 98 amino acids (including TM2 and TM3) was aligned in 99 ORs, analyzed by the maximum parsimony method and confirmed by 1000 bootstrap replicates (values shown only for most recent divergences). The alignment and all the OR sequences used here are available from our ftp site (see Methods for details). For the alignment of OR pseudogenes (suffixed P), a total of nine dashes were introduced where required to correct for frameshift mutations. The human β-3 adrenergic receptor (B3AR), accession no. P13945, was used as an outgroup.

Conclusions

Apart from our demonstration that the human MHC-linked OR cluster is among the largest in the human genome and shows limited but significant homology to its counterpart in the genome of the mouse, the most intriguing aspect of this study is the EST-based finding of long distance and alternative splicing within the 5′-UTR and coding regions of some OR genes. If experimentally verified, it seems likely that this feature will be connected to regulatory control properties and diverse functions. It remains to be seen whether common control mechanisms govern the expression of OR genes in different species, and different tissues. Our study provides the foundation of such analyses for the MHC-linked OR genes.

METHODS

Mapping, Sequencing and Analysis

A sequence-ready contig of the 800-kb region between HLA-F and RFP was generated by integration of several published contigs (kindly provided by A. Volz, J. Gruen, and D. Ruddy) with clones from the chromosome 6 mapping effort at the Sanger Centre (Lauer et al. 1997; Mungall et al. 1997; Volz et al. 1997; Ahn and Gruen 1999). The contig is part of a 7.5-Mb contig (including the extended MHC) that will be described elsewhere. The corresponding mouse contig was also described previously and was extended by fingerprint analysis of additional clones (Yoshino et al. 1998; Amadou et al. 1999).

A minimum tile path of overlapping clones was selected from both contigs, and each clone was randomly subcloned into M13mp18 and pUC18 (Bankier et al. 1987). Clone-specific details, such as library source and overlap sizes, are given in the corresponding EMBL submission headers. The DNA sequence was determined using the enzymatic dideoxy chain termination sequencing chemistry (Sanger et al. 1977) and automated ABI 373/377/3700 DNA sequencers (Applied Biosystems). The generated reads were quality clipped, screened for cloning and sequencing vectors, and assembled as previously described (The Sanger Centre 1998).

The sequences reported here have been submitted under the following clone names and accession numbers to the EMBL nucleotide databank.

Human: 25J6: Z84476; 88J8: AL035402; 80I19: AL022727; 974I11: AL050339; 150A6: AL096770; 994E9: AL035542; 145L22: AL050328; 271M21: AL031983; 377H14: AL022723; 86C11: AL021807; 24o18: AL021808; 193B12: Z98744; 408B20: AL133267; 313I6: AL121944; 29K1: Z98745.

Mouse: 573K1: AL078630; 332P19: AL133160.

Please note that, for all analyses described here, the sequence of the following accession numbers was inverted to reflect their true genomic orientation (p-telomere to centromere): Z84476, AL050339, AL096770, AL035542, AL031983, AL022723, AL078630.

The sequences were analyzed using the Sanger Centre's analysis strategy (http://www.sanger.ac.uk/HGP/Humana/). The genomic environment analysis was performed using the RepeatMasker program (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker/) to identify repeats in each sequence and parsing the output with a Perl script to produce an Excel-readable table of the repeat composition. ESTs were identified by BLASTN (Altschul et al. 1990) searching the human EST database at http://www.ncbi.nlm.nih.gov/ and were aligned manually. The PIP of the human and mouse sequences was generated with the advanced PIPmaker program at http://globin.cse.psu.edu/cgi-bin/pipmaker?advanced (Hardison et al. 1997).
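
The repeat-composition step can be sketched as follows. The original Perl script is not reproduced here; this Python version is an independent illustration that assumes the standard whitespace-delimited RepeatMasker `.out` layout (query begin and end in columns 6 and 7, repeat class/family in column 11) and sums the annotated bases per repeat class. The function names and the tab-separated output are choices made for the example. Overlapping annotations are counted twice in this simple form.

```python
from collections import defaultdict


def repeat_composition(out_text, seq_length):
    """Sum annotated bases per repeat class from RepeatMasker .out text.

    Returns {class: (bases, percent_of_sequence)}. Header and blank lines are
    skipped because their first field is not an integer score.
    """
    bases = defaultdict(int)
    for line in out_text.splitlines():
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10].split("/")[0]  # e.g. "LINE" from "LINE/L1"
        bases[repeat_class] += end - begin + 1
    return {cls: (bp, 100.0 * bp / seq_length) for cls, bp in bases.items()}


def to_tsv(composition):
    """Render the composition as tab-separated text that a spreadsheet can open."""
    rows = ["class\tbases\tpercent"]
    for cls in sorted(composition):
        bp, pct = composition[cls]
        rows.append(f"{cls}\t{bp}\t{pct:.2f}")
    return "\n".join(rows)
```

A typical call would read the `.out` file into a string, pass it with the length of the masked sequence to `repeat_composition`, and write the result of `to_tsv` to a file.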

The phylogenetic analysis of the human and murine MHC-linked ORs was performed by two different methods (neighbor-joining and maximum parsimony) using the PHYLO_WIN package (Galtier et al. 1996). Alignments were made with the CLUSTALW program (Thompson et al. 1997) and some minor manual adjustments. The final alignment is available at ftp.sanger.ac.uk/pub/rmy/Younger_et_al.pdf. Based on distance estimates derived from the Dayhoff Percent Accepted Mutations (PAM250) substitution matrix (Dayhoff et al. 1978), the maximum parsimony (Fitch 1971) and neighbor-joining (Saitou and Nei 1987) methods were used for tree construction. Both methods produced essentially identical trees confirmed by 1000 bootstrap replicates. Trees were drawn using the TreeView program (Page 1995).
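
The bootstrap step rests on resampling alignment columns with replacement; a tree is then built from each pseudo-alignment and the fraction of replicates that recover a given grouping is reported as its support. The Python sketch below illustrates only the resampling part, under the assumption of an alignment held as a dictionary of equal-length strings. It is not the PHYLO_WIN implementation, and the function names and the fixed seed are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    `alignment` maps sequence names to aligned strings of equal length. The same
    column indices are drawn for every sequence, so each column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n=1000, seed=0):
    """Generate `n` pseudo-alignments; a tree would be built from each one."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

Tree construction and the counting of supported branches are left to the phylogeny package and are not shown.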

OR Database

Public DNA and protein databases were searched for OR genes that were compiled into a nonredundant BLAST searchable (FASTA format) protein database of 331 ORs, following the naming convention previously proposed by us (Ehlers et al. 2000; Ziegler et al. 2000a). The database cross-references any previous gene names, original accession numbers and, where available, protein identification (PID) numbers and is available from our ftp site (ftp.sanger.ac.uk/pub/rmy/ROLFdb).

Acknowledgments

We thank all past and present members of the Chromosome 6 Project group (http://www.sanger.ac.uk/HGP/Chr6/), in particular C. Edwards, K. Evans, S. Humphray, M. Mashreghi-Mohammadi, L. Matthews, S. Phillips, V. Rand, S. Sims, S. Smith, A. Tracey, B. Tubby, H. Whitaker, A. Wild, L. Wilming, S. Williams, and J. Rogers. S.B., G.B., R.H., S.M., and A.J.M. were funded by the Wellcome Trust. A.E., S.F., J.T., A.V., and A.Z. were supported by a grant from the Volkswagen-Stiftung. J.T. was funded by a Wellcome Trust program grant. C.A. and K.F.L. were supported by the Howard Hughes Medical Institute, and C.A. also by the IPSEN Foundation. R.M.Y. was supported by a studentship from the UK Medical Research Council (MRC). A.Z. and S.B. also acknowledge the receipt of a Wellcome Trust travel grant.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

  • 5 Present address: CNRS UPR2163, CHU Purpan, 31300 Toulouse, France.

  • 6 Corresponding author.

  • E-MAIL beck@sanger.ac.uk; FAX 44 (0) 1223-494919.

  • Article published on-line before print: Genome Res., 10.1101/gr.160301.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.160301.

    • Received August 10, 2000.
    • Accepted January 9, 2001.

REFERENCES
