The house fly Y Chromosome is young and minimally differentiated from its ancient X Chromosome partner
Abstract
Canonical ancient sex chromosome pairs consist of a gene rich X (or Z) Chromosome and a male-limited (or female-limited) Y (or W) Chromosome that is gene poor. In contrast to highly differentiated sex chromosomes, nascent sex chromosome pairs are homomorphic or very similar in sequence content. Nascent sex chromosomes can arise if an existing sex chromosome fuses to an autosome or an autosome acquires a new sex-determining locus/allele. Sex chromosomes often differ between closely related species and can even be polymorphic within species, suggesting that nascent sex chromosomes arise frequently over the course of evolution. Previously documented sex chromosome transitions involve changes to both members of the sex chromosome pair (X and Y, or Z and W). The house fly has sex chromosomes that resemble the ancestral fly karyotype that originated ∼100 million yr ago; therefore, the house fly is expected to have X and Y Chromosomes with different gene content. We tested this hypothesis using whole-genome sequencing and transcriptomic data, and we discovered little evidence for genetic differentiation between the X and Y in house fly. We propose that the house fly has retained the ancient X Chromosome, but the ancestral Y was replaced by an X Chromosome carrying a new male determining gene. Our proposed hypothesis provides a mechanism for how one member of a sex chromosome pair can experience evolutionary turnover while the other member remains unaffected.
In organisms in which sex is determined by heritable genetic factors, sex determining loci can reside on sex chromosomes. Sex chromosome systems are divided into two broad categories: (1) males are the heterogametic sex (XY); or (2) females are the heterogametic sex (ZW). In long-established sex chromosomes (such as those of birds, eutherian mammals, and Drosophila), the X and Y (or Z and W) Chromosomes are typically highly differentiated (Charlesworth 1996; Charlesworth et al. 2005). The X (or Z) Chromosome usually resembles an autosome in size and gene density, although there are some predicted and observed differences in gene content and evolutionary rates between the X (or Z) and autosomes (Rice 1984; Charlesworth et al. 1987; Vicoso and Charlesworth 2006; Sturgill et al. 2007; Ellegren 2011; Meisel et al. 2012; Meisel and Connallon 2013). In contrast, Y (or W) Chromosomes tend to contain a small number of genes with male-specific (or female-specific) functions and are often enriched with repetitive DNA as a result of male-specific (or female-specific) selection pressures, a low recombination rate, and a reduced effective population size (Rice 1996; Bachtrog 2013). This X-Y (or Z-W) differentiation results in a heterogametic sex that is effectively haploid for most or all X (or Z) Chromosome genes.
Highly divergent X-Y (or Z-W) pairs trace their ancestry to an undifferentiated autosomal pair (Bull 1983; Charlesworth 1991). Many species harbor undifferentiated sex chromosomes, either because the chromosomes are of recent origin or because noncanonical evolutionary trajectories have prevented X-Y (or Z-W) divergence (Stöck et al. 2011; Bachtrog 2013; Vicoso et al. 2013; Yazdi and Ellegren 2014). Recently derived sex chromosomes often result from Robertsonian fusions between an existing sex chromosome and an autosome, or they can arise through a mutation that creates a new sex-determining locus on an autosome (Bachtrog et al. 2014; Beukeboom and Perrin 2014). In both cases, one of the formerly autosomal homologs evolves into an X (or Z) Chromosome, and the other homolog evolves into a Y (or W) Chromosome. In some cases, one or both of the ancestral sex chromosomes can revert back to an autosome when a different autosome becomes a new sex chromosome (Carvalho and Clark 2005; Larracuente et al. 2010; Vicoso and Bachtrog 2013). In all of the scenarios described above, the X and Y (or Z and W) Chromosomes evolve in concert, with an evolutionary transition in one sex chromosome producing a corresponding change in its partner.
Sex chromosome evolution has been extensively studied in higher dipteran flies (Brachycera), where sex chromosome transitions involving X-autosome fusions are common (Patterson and Stone 1952; Schaeffer et al. 2008; Baker and Wilkinson 2010; Vicoso and Bachtrog 2015). The ancestral brachyceran karyotype consists of five large autosomal pairs (known as Muller elements A–E) and a heterochromatic, gene-poor sex chromosome pair (element F is the X Chromosome); this genomic arrangement has been conserved for ∼100 million yr in some lineages (Muller 1940; Foster et al. 1981; Weller and Foster 1993; Vicoso and Bachtrog 2013). In species with the ancestral karyotype, females are XX and males are XY, with a male-determining locus (M factor) on the Y Chromosome (Bopp et al. 2014). Many sex chromosome transitions have occurred across Brachycera, including fusions of ancestral autosomes with the X Chromosome, autosomes transitioning into sex chromosomes, and complete reversions of the ancestral X to an autosome (Carvalho and Clark 2005; Baker and Wilkinson 2010; Larracuente et al. 2010; Vicoso and Bachtrog 2013, 2015).
The house fly (Musca domestica) is a classic model system for studying sex determination and sex chromosomes, because it harbors multiple natural and laboratory variants in sex determining genes and sex chromosomes (Dübendorfer et al. 2002). The house fly karyotype resembles that of the ancestral brachyceran, with five large euchromatic elements and a heterochromatic sex chromosome pair (Boyes et al. 1964). As in other species with that ancestral karyotype, the house fly X and Y Chromosomes can be distinguished based on their length in cytological preparations (Boyes and Van Brink 1965; Denholm et al. 1983; Cakir and Kence 1996; Hediger et al. 1998b). In some close relatives of the house fly that have the ancestral karyotype (e.g., Lucilia blow flies) the ancient X and Y Chromosomes are highly differentiated in gene content (Linger et al. 2015; Vicoso and Bachtrog 2015).
Despite the cytological similarities between the house fly and ancestral karyotypes, there are multiple reasons to suspect that house fly has a Y Chromosome that is not differentiated from the X. First, the house fly M factor (Mdmd) is a recently arisen duplication of the gene encoding the spliceosome-associated protein CWC22 (Sharma et al. 2017), not an ancient gene as would be expected if it were the ancestral male-determining locus of brachycerans. Mdmd has been mapped to the autosomes as well as the Y Chromosome (Sharma et al. 2017), suggesting that the house fly Y and the autosomes harboring Mdmd are all recently derived neo-Y Chromosomes. Second, no sex-linked genetic markers have been identified on the house fly X or Y Chromosomes other than Mdmd (Hamm et al. 2015), suggesting that there are no X-specific genes or genetic variants that are not found on the Y Chromosome. Third, males with an autosomal Mdmd that do not carry a Y Chromosome (XX males) are fertile (Bull 1983; Hamm et al. 2015), suggesting that no essential male fertility genes are unique to the Y Chromosome apart from Mdmd. Fourth, house flies that carry only a single copy of either the X or Y Chromosome (i.e., XO or YO flies) are viable and fertile (Bull 1983; Hediger et al. 1998a), indicating that no essential genes are uniquely found on the X and missing from the Y Chromosome and vice versa. We used whole-genome and transcriptome sequencing to test the hypothesis that the house fly Y Chromosome is young and minimally differentiated from its X Chromosome partner.
Results
Very few X-specific sequences in the house fly genome
Our first goal was to identify house fly X Chromosome sequences not found on the Y (X-specific sequences), which would be
consistent with the hypothesis that house flies have a differentiated sex chromosome pair. The genome sequencing project used
DNA from female flies (XX genotype) to produce the assembly and annotation (Scott et al. 2014), which means there are no Y-specific sequences in the reference. Males of the house fly genomic reference strain (aabys)
have been previously characterized as possessing the XY karyotype (Wagoner 1967; Tomita and Wada 1989; Scott et al. 2014). To identify X-specific genes, we used Illumina technology to sequence genomic DNA (gDNA) separately from male (XY)
and female (XX) aabys flies (three replicates of each sex), and we aligned the reads to the annotated genome (for read counts,
see Supplemental Data S1). If house fly males have a Y Chromosome that is fully differentiated from the X, we expect females to have twice the sequencing coverage of males (male-to-female ratio $\log_2(M/F) = -1$) within genes on Muller element F (the ancestral X Chromosome) (Vicoso and Bachtrog 2013). We instead find that the average sequencing coverage in males and females is almost identical ($\log_2(M/F) \approx 0$) for genes on all six chromosomes, and no genes have $\log_2(M/F) \le -1$ (Fig. 1).
Expected sequencing coverage in males relative to females ($\log_2(M/F)$) in an XY system with a degenerated Y Chromosome (left), and observed coverage in three house fly strains (aabys, A3, and LPR) for each house fly chromosome (Muller elements in parentheses). Chromosome assignments are based on orthology relationships with Drosophila melanogaster. Box plots show the median and quartiles, with outliers indicated as points.
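The coverage comparison described above can be illustrated with a minimal sketch. It assumes per-gene read counts are already available for one male and one female library; the function name, the pseudocount, and the median-centering step (used here to correct for unequal sequencing depth, on the assumption that most genes are autosomal) are illustrative choices, not the authors' pipeline.

```python
import math
from statistics import median

def log2_male_female(male_counts, female_counts, pseudocount=0.5):
    """Per-gene log2(M/F), centered on the median so that typical genes sit at 0.

    The pseudocount and the median centering are assumptions of this sketch.
    """
    raw = {
        gene: math.log2((male_counts[gene] + pseudocount) /
                        (female_counts[gene] + pseudocount))
        for gene in male_counts
    }
    center = median(raw.values())  # depth correction: most genes assumed autosomal
    return {gene: value - center for gene, value in raw.items()}

# Toy counts (hypothetical): the male library is sequenced twice as deeply,
# and geneX is present in one copy in males versus two copies in females.
male = {"geneA": 2000, "geneB": 2000, "geneC": 2000, "geneX": 1000}
female = {"geneA": 1000, "geneB": 1000, "geneC": 1000, "geneX": 1000}
ratios = log2_male_female(male, female)
```

In this toy example, the equally represented genes end up at 0 and the hypothetical X-specific gene ends up close to −1, which is the signal the analysis looks for on element F.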
We tested if the lack of X-specific genes is common to two other strains of house fly previously reported to have XY males:
A3 and LPR (Scott and Georghiou 1985; Scott et al. 1996; Liu and Yue 2001). We sequenced gDNA from males and females of the A3 and LPR strains, and we aligned those reads to the aabys female reference
genome (for read counts, see Supplemental Data S2, S3). Consistent with the results from the aabys strain, the average relative male-to-female sequencing coverage within genes
in both A3 and LPR is similar across all six chromosomes (Fig. 1). Because we fail to find any genes with a twofold enrichment in females ($\log_2(M/F) \le -1$), our results suggest that there are no genes on the house fly X Chromosome that are not present on the Y Chromosome. However,
LOC101893103 (the ortholog of muscarinic acetylcholine receptor Dm1) has the most “female-biased” coverage across all three
strains (
), suggesting that it is the best candidate X-specific gene in the house fly genome.
To ensure that our results are not an artifact of poor annotations of house fly X Chromosome genes, we calculated $\log_2(M/F)$ coverage across nonoverlapping 1-kb intervals in the reference genome. The distribution of $\log_2(M/F)$ across autosomes is expected to be centered at zero. If males have a single copy of the X Chromosome, we should observe a second peak at $\log_2(M/F) = -1$, indicating a twofold enrichment of X Chromosome sequences in females. We do indeed observe that the distributions of $\log_2(M/F)$ are centered near zero for all three house fly strains in our analysis, but there is no obvious secondary peak at $\log_2(M/F) = -1$ in any of the distributions (Fig. 2). To test for a secondary peak at $\log_2(M/F) = -1$, we fit a mixture of two normal distributions to our data using an expectation-maximization algorithm with starting values of $\log_2(M/F) = 0$ and $\log_2(M/F) = -1$ for the means of the two distributions (Benaglia et al. 2009). Most of the 1-kb intervals (93%–99%) are assigned to distributions that are centered near zero, and the remainder of the intervals are assigned to secondary distributions with means less than zero (Fig. 2). However, those secondary distributions all have a mean greater than −1, suggesting that there are few sequences present in XX females at twice their abundance in XY males.
Histograms are plotted of $\log_2(M/F)$ for 1-kb intervals across three strains. The medians of the distributions are shown. The gray curve shows the normal distribution
that fits the majority of intervals (mean ≈ 0), and the red curve shows the normal distribution that fits the remaining scaffolds
(mean < 0). The λ1 values are the proportion of observed data estimated to be part of the gray normal distribution centered near zero, and λ2 is the proportion estimated to be part of the red distribution with a mean < 0. The red vertical lines show the means of
the λ2 distributions.
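The mixture-model test can be sketched as a plain expectation-maximization loop for two normal components. This is a minimal pure-Python illustration and not the cited software; the function names, the starting standard deviations, the iteration count, and the simulated toy data are all assumptions of the sketch.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_normals(values, means=(0.0, -1.0), sigmas=(0.5, 0.5), n_iter=200):
    """Fit a two-component normal mixture by EM; returns (lambdas, means, sds)."""
    lam, mu, sd = [0.5, 0.5], list(means), list(sigmas)
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for x in values:
            w = [lam[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(w) or 1e-300  # guard against numerical underflow
            resp.append([wk / total for wk in w])
        # M-step: update mixing proportions, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; keep its previous parameters
            lam[k] = nk / len(values)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    return lam, mu, sd

# Simulated toy data (hypothetical): 95% of intervals near 0, 5% near -0.7.
random.seed(1)
toy = ([random.gauss(0.0, 0.15) for _ in range(950)] +
       [random.gauss(-0.7, 0.15) for _ in range(50)])
lam, mu, sd = fit_two_normals(toy)
```

Here `lam[0]` and `lam[1]` play the role of λ1 and λ2 in the figure legend, and `mu[1]` is the mean of the secondary component, which can then be compared with the value of −1 expected for X-specific sequences.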
Very few Y-specific sequences but some autosome-to-Y duplicates in the house fly genome
We next sought to identify Y Chromosome sequences that are absent from the X Chromosome (i.e., the reciprocal of the analyses
described above). Alignment to a female (XX) reference genome cannot identify Y-specific sequences because they would be absent
from the reference genome. However, we can identify recent Y Chromosome duplicates of autosomal genes because those genes
should have 3:2 male:female coverage (1.5×) when aligned to an XX female reference genome (two autosomal copies plus a Y Chromosome
copy in males as compared to only two autosomal copies in females). There are five genes with ≥1.5× male-biased coverage ($\log_2(M/F) \ge \log_2 1.5 \approx 0.585$) across all three strains (Table 1). Two of the genes are members of large gene families, suggesting possible Y-specific expansions. One of the other three genes is Md-ncm (LOC101896466), which is the ancestral autosomal paralog of the male-determining gene Mdmd (Sharma et al. 2017). Our results therefore demonstrate that screening for genes with $\log_2(M/F) \ge 0.585$ coverage can identify recent autosome-to-Y duplications.
Genes with ≥1.5× read mapping coverage in males relative to females in all three strains
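The 3:2 expectation and the requirement that a gene pass in all three strains can be written out as a short sketch. The gene names and coverage values below are hypothetical, and the helper functions are illustrative rather than part of the published analysis.

```python
import math

def expected_log2_ratio(male_copies, female_copies):
    """Expected log2(M/F) coverage given copy numbers in males and females."""
    return math.log2(male_copies / female_copies)

# Two autosomal copies plus one Y-linked copy in males, two copies in females.
THRESHOLD = expected_log2_ratio(3, 2)  # about 0.585

def male_biased_in_all(log2_by_strain, threshold=THRESHOLD):
    """Genes whose log2(M/F) meets the threshold in every strain."""
    strains = list(log2_by_strain.values())
    shared = set(strains[0]).intersection(*strains[1:])
    return sorted(g for g in shared if all(s[g] >= threshold for s in strains))

# Hypothetical per-gene log2(M/F) values for the three strains.
toy = {
    "aabys": {"g1": 0.60, "g2": 0.70, "g3": 0.00},
    "A3":    {"g1": 0.62, "g2": 0.30, "g3": 0.01},
    "LPR":   {"g1": 0.59, "g2": 0.80, "g3": 0.00},
}
candidates = male_biased_in_all(toy)
```

The same helper gives −1 for a gene with one copy in males and two in females, which is the expectation used earlier for X-specific genes.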
To identify Y-specific sequences without autosomal paralogs, we first used the male sequencing reads from the aabys strain to assemble a genome that contains a Y Chromosome using SOAPdenovo2 (Luo et al. 2012). It was necessary to assemble a male genome because the genome project sequenced gDNA from XX female flies (Scott et al. 2014). Then we used a k-mer comparison approach to identify male-specific sequences by searching for male genomic scaffolds that are not matched by female sequencing reads (Carvalho and Clark 2013). Most of the scaffolds in the male genome assembly were (nearly) completely matched by female sequencing reads, and none of the male scaffolds were completely unmatched by female sequencing reads (Fig. 3). We obtain similar results when we use a male genome assembled with ABySS (Simpson et al. 2009) or if we assemble the male genome with SOAPdenovo2 using only male reads that do not align to the female assembly (Supplemental Fig. S1). In contrast, when this approach was used to identify Y Chromosome scaffolds in species with differentiated sex chromosomes (Drosophila and humans), a substantial number of Y Chromosome scaffolds were completely unmatched by female sequencing reads (Carvalho and Clark 2013). These results suggest that there are no large segments of the house fly Y Chromosome that are distinct from the X Chromosome or autosomes.
Histogram of the percentage of each scaffold in the male genome assembly that is unmatched by female reads.
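The idea behind the k-mer comparison can be sketched as follows: collect all k-mers seen in female reads, then ask what percentage of k-mer positions in each male scaffold is absent from that set. This is a simplified illustration, not the cited method's implementation; the default k, the use of canonical (strand-independent) k-mers, and the omission of any repeat or error filtering are assumptions of the sketch.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Strand-independent form of a k-mer: the smaller of it and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def canonical_kmers(seq, k):
    """Set of canonical k-mers in a sequence."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}

def percent_unmatched(scaffold, female_reads, k=15):
    """Percentage of k-mer positions in a scaffold not found in any female read."""
    female_kmers = set()
    for read in female_reads:
        female_kmers |= canonical_kmers(read, k)
    positions = len(scaffold) - k + 1
    if positions <= 0:
        return 0.0  # scaffold shorter than k: nothing to score
    unmatched = sum(
        1 for i in range(positions)
        if canonical(scaffold[i:i + k]) not in female_kmers
    )
    return 100.0 * unmatched / positions

# Toy example (hypothetical sequences, short k for readability).
reads = ["ACGTACGTAA"]
shared = percent_unmatched("TTACGTACGT", reads, k=4)    # reverse complement of the read
male_only = percent_unmatched("CCCCCCCC", reads, k=4)   # absent from the female read
```

A scaffold from a strongly differentiated Y Chromosome would score near 100%, whereas a scaffold shared with the X or autosomes would score near 0%.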
We examined the scaffolds from the male house fly genome with a high percentage of sequence unmatched by female reads. We performed blastx searches of the 50 scaffolds with the highest percentage of unmatched sequence (79.7%–96.6%) against the NCBI nonredundant protein database (Altschul et al. 1997). Only six of 50 scaffolds had hits to annotated house fly genes, whereas 27 had hits to transposable element (TE) sequences, three hit other sequences from other species, and 14 had no hits in the database (Supplemental Data S4, S5). The scaffold with the highest percentage of unmatched sequence (96.6%) is 1143 nt long and contains a 219-bp segment that matches an annotated house fly gene on Chromosome 1 that is homologous to a D. melanogaster gene with a predicted membrane associated GRAM domain (CG34392). This scaffold does not have any blastn or blastx hits to other sequences in the database. In addition, there are 22 scaffolds that are both >5 kb and >50% unmatched by female reads (Supplemental Data S4, S6). We also performed blastx of those scaffolds against the NCBI database, and we found that 15 of 22 hit a TE, four hit an annotated house fly gene, and three hit another sequence from a different species. None of the annotated house fly genes hit by these 72 scaffolds are predicted to be on element F (the house fly X Chromosome). Most of these scaffolds (42/72) have sequence similarity with a TE, suggesting that the Y Chromosome may contain unique repetitive sequences or be enriched for particular repeat classes. Notably, none of the scaffolds with a high percentage of sequence unmatched by female reads contain any sequences that resemble Mdmd/Md-ncm, suggesting that this k-mer comparison approach is not effective at identifying very recent autosome-to-Y duplications.
Moderate differences in sequence abundance between house fly males and females
We next examined whether house fly X and Y Chromosomes exhibit differential representation of shared sequences, as might be expected from expansion or contraction of satellite repeats or other repetitive elements. We first used a principal components (PC) analysis to compare read mapping coverage of the male- and female-derived sequences to the reference (XX female) genome. As the input into the PC analysis, we used the number of reads from each of the XY male and XX female sequencing libraries that mapped to each nonoverlapping 1-kb interval. The first PC (PC1) explains 81.5%–91.1% of the variance in coverage across libraries in the three strains, and PC1 clearly separates the male and female sequencing libraries in all three strains (Fig. 4). Therefore, house fly males and females, and by association X and Y Chromosomes, exhibit systematic differences in the abundance of some sequences.
Plot of the first two principal components explaining differential sequencing coverage between female (F) and male (M) libraries.
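A principal components analysis of a libraries-by-windows coverage matrix can be sketched without external libraries by running power iteration on the Gram matrix of the centered data. The function name, the iteration count, and the toy read counts below are assumptions for illustration; the original analysis may have scaled or transformed the counts differently.

```python
import math

def pc1_scores(coverage, n_iter=500):
    """Return (PC1 score per library, fraction of variance explained by PC1).

    coverage: one row per library, one column per 1-kb window.
    """
    n, m = len(coverage), len(coverage[0])
    # Center each window (column) across libraries.
    col_means = [sum(row[j] for row in coverage) / n for j in range(m)]
    centered = [[row[j] - col_means[j] for j in range(m)] for row in coverage]
    # Gram matrix between libraries; its eigenvalues are proportional to PC variances.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered] for r1 in centered]
    # Power iteration for the leading eigenvector (arbitrary non-constant start).
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
    explained = eigenvalue / sum(gram[i][i] for i in range(n))
    scores = [math.sqrt(eigenvalue) * x for x in v]
    return scores, explained

# Toy read counts (hypothetical): three female then three male libraries, four windows.
coverage = [
    [10, 10, 20, 5], [11, 10, 21, 5], [10, 11, 20, 5],     # female
    [10, 10, 10, 15], [10, 11, 10, 15], [11, 10, 11, 15],  # male
]
scores, explained = pc1_scores(coverage)
```

In the toy data the two sex-dependent windows dominate the variance, so PC1 separates female from male libraries by the sign of their scores, which is the pattern described for Figure 4.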
We applied two different approaches to characterize sequences enriched on the X and Y Chromosomes (i.e., differentially abundant
in female and male genomes). First, we searched for 1-kb windows with significantly different coverage between males and females
(false discovery rate corrected P < 0.05 and $|\log_2(M/F)| > 1$). We identified 214 of these “sex-biased” windows: 63 are >twofold enriched in females, and 151 are >twofold enriched in
males (Supplemental Data S7). The X and Y Chromosomes of house fly are largely heterochromatic (Boyes et al. 1964; Hediger et al. 1998b), and it is possible that differences in the abundances of particular repetitive DNA sequences (e.g., TEs and other interspersed
repeats) between the X and Y Chromosomes are responsible for the differences in read coverage between females and males. Sequences
from repetitive heterochromatic regions of the genome are less likely to be mapped to a genomic location (Smith et al. 2007), and we therefore expect sex-biased windows to be located on scaffolds that are not mapped to a house fly chromosome. Only
two of 63 (3.2%) female-enriched windows are within a scaffold that we were able to map to a chromosome (neither was mapped
to element F, the ancestral X Chromosome). In addition, 59 of 151 (39.1%) male-enriched windows are within a scaffold that
maps to a chromosome (only one of those scaffolds maps to element F). In contrast, 65.7% of 1-kb windows that are not differentially
covered between males and females are on scaffolds that we are able to map to chromosomes (2033/3096 windows with P > 0.05 and
). These “unbiased” windows are more likely to be mapped to a chromosome than the sex-biased windows (P < $10^{-15}$ in Fisher's exact test), providing some evidence that differential coverage between males and females could be driven by
repeat content differences between the X and Y Chromosomes.
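The comparison of mapped and unmapped windows can be illustrated with a two-sided Fisher's exact test computed from the hypergeometric distribution. The sketch assumes that female- and male-enriched windows are pooled (2 + 59 = 61 mapped of 214) and compared against the 2033 mapped of 3096 unbiased windows; that pooling and the helper names are assumptions, so the result indicates the order of magnitude rather than reproducing the reported test exactly.

```python
import math

def log_hypergeom(a, b, c, d):
    """Log probability of the 2x2 table [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    return (math.lgamma(a + b + 1) + math.lgamma(c + d + 1)
            + math.lgamma(a + c + 1) + math.lgamma(b + d + 1)
            - math.lgamma(n + 1)
            - math.lgamma(a + 1) - math.lgamma(b + 1)
            - math.lgamma(c + 1) - math.lgamma(d + 1))

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = log_hypergeom(a, b, c, d)
    total = 0.0
    for x in range(lo, hi + 1):
        lp = log_hypergeom(x, row1 - x, col1 - x, row2 - (col1 - x))
        if lp <= observed + 1e-9:  # small tolerance for floating-point ties
            total += math.exp(lp)
    return min(total, 1.0)

# Rows: sex-biased windows, unbiased windows. Columns: mapped, unmapped.
p_value = fisher_exact_two_sided(61, 214 - 61, 2033, 3096 - 2033)
```

Under these assumptions the P-value falls far below the $10^{-15}$ bound quoted in the text.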
We next tested for an enrichment of annotated repeats within the female- and male-biased 1-kb windows, and we found that all 63 of the female-biased windows and most of the male-biased windows (149/151) contain sequences masked as repetitive during the house fly genome annotation (Supplemental Data S7). Similarly, nearly all (3071/3096; >99%) of the 1-kb windows that are not differentially covered between males and females also contain repeat-masked sequences; this fraction is not significantly different from the fraction of repeat-masked sex-biased windows (P = 1 for female-biased and P = 0.6 for male-biased windows using Fisher's exact test). In addition, the proportion of sites within male-biased and female-biased windows that are repeat-masked is less than that of unbiased windows, suggesting that the sex-biased windows are actually depauperate for annotated repeats (Supplemental Fig. S2). However, these analyses are limited because most (≥52%) of the assembled house fly genome is composed of interspersed repeats that are poorly annotated (Scott et al. 2014). Future improvements to repeat annotation in the house fly genome may therefore shed light on the nature of repetitive sequences that differentiate the X and Y Chromosomes.
As a second approach to identify candidate X- or Y-enriched sequences, we first determined the abundances of all possible 2–10 mers in the male and female aabys sequencing reads. Compared with the window-based analysis described above, this approach can identify smaller sequence motifs that may differentiate the X and Y Chromosomes, and it does not require any a priori repeat annotations. However, the 100 most common k-mers are found at similar frequencies in both males and females (Fig. 5), with the abundances highly correlated between sexes (r = 0.999). We considered a k-mer to be overrepresented in one sex if the minimum abundance across the three replicate libraries for that sex is greater than the maximum in the other sex. Six k-mers are overrepresented in males using this cutoff, but they are all <twofold enriched in males (Fig. 5; Supplemental Fig. S3). These results suggest that short sequence repeats do not predominantly differentiate the X and Y Chromosomes.
Minimal differences between k-mer abundances in male and female genomes. The left plot shows the abundances of the 100 most common k-mers in the male and female sequencing reads averaged across the three libraries from each sex in the aabys strain. The dashed box in the left graph indicates the subset of the range plotted on the right graph, which contains only k-mers in which abundances in all three libraries from one sex are greater than the three libraries from the other sex. Red triangles indicate k-mers in which abundances in all three female libraries are greater than the three male libraries; blue squares indicate k-mers that are more abundant in male libraries. The dashed line indicates equal representation in males and females.
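The overrepresentation rule (minimum across one sex's replicates greater than the maximum across the other's) can be sketched as below. The counting function and the toy abundances are hypothetical, and the sketch assumes abundances have already been made comparable across libraries, for example by normalizing to library size.

```python
from collections import Counter

def count_kmers(reads, k_min=2, k_max=10):
    """Count every k-mer of length k_min..k_max across a set of reads."""
    counts = Counter()
    for read in reads:
        for k in range(k_min, k_max + 1):
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
    return counts

def overrepresented(libs_a, libs_b):
    """k-mers whose minimum abundance in libs_a exceeds their maximum in libs_b."""
    kmers = set().union(*libs_a, *libs_b)
    return sorted(
        k for k in kmers
        if min(lib[k] for lib in libs_a) > max(lib[k] for lib in libs_b)
    )

# Hypothetical abundances for three replicate libraries per sex.
male_libs = [Counter(AT=10, GC=5), Counter(AT=12, GC=4), Counter(AT=11, GC=6)]
female_libs = [Counter(AT=8, GC=5), Counter(AT=9, GC=6), Counter(AT=7, GC=4)]
male_over = overrepresented(male_libs, female_libs)
female_over = overrepresented(female_libs, male_libs)
```

Because the rule requires complete separation of the replicate ranges, it is conservative with only three libraries per sex; a k-mer such as the toy "GC" whose ranges overlap is not called in either direction.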
Relative heterozygosity in males and females suggests that the house fly Y Chromosome is very young
Our data suggest that there are very few X- or Y-specific sequences in the house fly genome. We therefore hypothesize that the house fly Y Chromosome is actually an ancestral brachyceran X Chromosome that recently acquired the male-determining Mdmd gene (Sharma et al. 2017). Although recently derived neo-Y Chromosomes may not differ much in gene content from the gametologous X Chromosome, modest sequence-level X-Y differentiation can result in elevated heterozygosity within sex chromosome genes in males (Vicoso and Bachtrog 2015). We tested for elevated heterozygosity by first identifying polymorphic sites (SNPs) within genes in aabys males and females. Heterozygosity is elevated in X Chromosome (element F) genes relative to autosomes in both males and females (Supplemental Fig. S5). However, when we compare the proportion of heterozygous SNPs in males relative to females for genes on each chromosome (Fig. 6A), genes on the X Chromosome resemble autosomal genes with equivalent heterozygosity in males and females (P = 0.45 in a Mann-Whitney U test comparing male:female heterozygosity on element F with the other chromosomes). This result suggests that the house fly Y is so young that Y Chromosome genes have not yet accumulated even modest sequence differences from the X Chromosome.
There is elevated heterozygosity on the third chromosome in IIIM males, but not on the X Chromosome in XY males. Box plots show the distribution of the percentage of heterozygous SNPs within genes on each chromosome in either XY aabys males relative to XX aabys females using genomic DNA sequences (A) or IIIM males relative to XY males using RNA-seq data (B). Values >50% indicate elevated heterozygosity in XY males or IIIM males. The median across all autosomes is indicated by a dashed line.
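One way to express the relative heterozygosity shown in the box plots is, for each gene, the number of heterozygous SNPs in the focal genotype as a percentage of the heterozygous SNPs in both genotypes combined, so that 50% means equal heterozygosity. That formula is an assumption of this sketch (the exact calculation is not given in this section), and the gene counts are hypothetical.

```python
from statistics import median

def relative_het(het_focal, het_reference):
    """Percentage of heterozygous SNPs found in the focal genotype (assumed metric)."""
    total = het_focal + het_reference
    if total == 0:
        return None  # gene has no heterozygous SNPs in either genotype
    return 100.0 * het_focal / total

def median_by_chromosome(genes):
    """genes: iterable of (chromosome, het SNPs in focal, het SNPs in reference)."""
    by_chrom = {}
    for chrom, focal, reference in genes:
        value = relative_het(focal, reference)
        if value is not None:
            by_chrom.setdefault(chrom, []).append(value)
    return {chrom: median(values) for chrom, values in by_chrom.items()}

# Hypothetical per-gene counts: focal = IIIM males, reference = XY males.
toy_genes = [
    ("III", 8, 2), ("III", 6, 4), ("III", 9, 1),
    ("I", 5, 5), ("I", 4, 6), ("I", 6, 4), ("I", 0, 0),
]
medians = median_by_chromosome(toy_genes)
```

With this metric, a chromosome carrying a young neo-Y in the focal males would show a median above 50%, while chromosomes with equivalent heterozygosity in both genotypes would sit near 50%.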
Some house fly males carry Mdmd on the third chromosome (IIIM), and IIIM is expected to be a young neo-Y (Hamm et al. 2015; Sharma et al. 2017). Males that are heterozygous for IIIM and a standard third chromosome (hereafter IIIM males) can have two copies of the X Chromosome (Hamm et al. 2015). If the IIIM chromosome is a young neo-Y, we expect that IIIM males will have an excess of heterozygous SNPs on the third chromosome. To test this hypothesis, we used available RNA-seq data (Meisel et al. 2015) to calculate the proportion of heterozygous SNPs within genes in IIIM males relative to XY males (Fig. 6B). As predicted, there is an excess of heterozygous SNPs on the third chromosome in IIIM males relative to XY males (P = $10^{-122}$ in a Mann-Whitney U test comparing Chromosome III with the other autosomes). IIIM males additionally have an elevated number of strain-specific SNPs on the third chromosome (Supplemental Fig. S6). Surprisingly, there is also increased heterozygosity on the X Chromosome in IIIM males relative to XY males (P = $10^{-4}$ in a Mann-Whitney U test comparing the X Chromosome with Chromosomes 1, 2, 4, and 5) even though IIIM males have the XX genotype. We also observe elevated heterozygosity on the third chromosome and X Chromosome in IIIM males relative to females of the same strain, but not on the X Chromosome in XY males relative to XX females (Supplemental Fig. S4). These results further support our conclusion that, other than a handful of autosome-to-Y duplications, house fly Y Chromosome genes are not differentiated from X Chromosome genes. In contrast, the IIIM chromosome harbors evidence that it is partially differentiated from the non-M-bearing third chromosome.
Discussion
We found very little evidence for differentiation between the X and Y Chromosomes in house fly, despite the fact that the house fly has a karyotype that resembles the ∼100 million-yr-old ancestral brachyceran karyotype (Boyes et al. 1964; Foster et al. 1981; Weller and Foster 1993; Vicoso and Bachtrog 2013). There are few sequences unique to the X or Y (Figs. 1–3), little evidence for differential abundance of sequences on the X and Y (Fig. 5, but see Fig. 4), and no elevated heterozygosity within X Chromosome genes in XY males (Fig. 6). The strongest evidence for Y Chromosome genes not found on the X are candidate autosome-to-Y duplicates, including the male-determining gene Mdmd (Table 1). We conclude that the house fly X Chromosome has few genes (if any) not found on the Y Chromosome, and there are only a handful of recently acquired Y-specific genes in addition to Mdmd. This is consistent with previous experiments that failed to identify sex-linked markers and found that XX, XO, and YO flies are viable and fertile (Bull 1983; Hediger et al. 1998a; Hamm et al. 2015). Additionally, XY males have equal or greater expression of X Chromosome genes when compared to XX (IIIM) males (Meisel et al. 2015), providing further evidence that XY males do not have a haploid X Chromosome dose. Our results suggest that the house fly Y Chromosome is an ancestral brachyceran X Chromosome that very recently acquired Mdmd via the duplication of Md-ncm (Sharma et al. 2017).
X-Y differentiation in house fly
We detect two forms of minimal X-Y differentiation in the house fly genome. First, there are a few candidate recent autosome-to-Y duplications, including Mdmd (Table 1). Autosome-to-Y duplications are a well-documented source of new Y Chromosome genes across animals (Carvalho et al. 2000, 2001, 2015; Matsuda et al. 2002; Nanda et al. 2002; Hattori et al. 2012; Yano et al. 2012; Hall et al. 2013). Second, there is some evidence for differential abundance of sequences between the X and Y Chromosomes (Fig. 4), but we are unable to identify the specific sequences that are differentially represented (Fig. 5).
Our results are consistent with cytological examinations that have identified morphological differences between the house fly X and Y Chromosomes (Boyes et al. 1964; Hediger et al. 1998b). These morphological differences may be the result of differentially abundant repetitive sequences between the X and Y (Fig. 4). Additionally, in situ hybridizations of chromosomal dissections to mitotic chromosomes detected Y-specific, but not X-specific, segments of the house fly genome (Hediger et al. 1998b). The sequences of these chromosomal segments are unknown, but they presumably include Mdmd and possibly a handful of other recent autosome-to-Y duplications (Table 1). Similarly, cytological analyses previously characterized separate X Chromosomes carrying Mdmd (XM) that were thought to be different from the Y Chromosome (Denholm et al. 1983; Cakir and Kence 1996). Our results suggest that the Y and XM Chromosomes are morphological variants of neo-Y Chromosomes that arose when at least one ancestral X Chromosome recently acquired Mdmd. Such morphological or repeat content variation in Y Chromosomes has been previously documented in Drosophila and other dipterans (Dobzhansky 1935; Miller and Stone 1962; Miller and Roy 1964; Lyckegaard and Clark 1989; Lemos et al. 2008; Hall et al. 2016).
Creation of house fly neo-Y Chromosomes
We hypothesize that the recent duplication of Md-ncm that created Mdmd (Sharma et al. 2017) arose on an X Chromosome, transforming it into a neo-Y (Fig. 7). The ancestral fly Y Chromosome would have subsequently been lost from house fly populations if it did not contain any essential genes (other than the ancestral male determiner), or it could have fused to an autosome (Carvalho and Clark 2005; Larracuente et al. 2010). Alternatively, the house fly Y Chromosome could have arisen through the fusion of the ancestral Y and X Chromosomes. However, after an X-Y fusion, the neo-Y should retain ancestral Y-specific sequences, which we fail to detect.
The IIIM Chromosome is also a neo-Y that arose when Mdmd was duplicated onto a standard third chromosome (Sharma et al. 2017). Curiously, IIIM males have elevated heterozygosity in X Chromosome genes relative to XX females and XY males (Fig. 6; Supplemental Fig. S4). One possible cause of this elevated heterozygosity is that some X Chromosome genes were translocated onto IIIM along with Mdmd. The elevated X Chromosome heterozygosity we detect in IIIM males would therefore be the result of those males being triploid for X Chromosome genes. No nullo-X/Y flies carrying IIIM have been identified to our knowledge, suggesting that the X and Y Chromosomes contain some essential genes not translocated to IIIM. Additional work is necessary to determine the nature of the translocations or duplications of Mdmd that created the Y and IIIM Chromosomes.
Our results are also suggestive of the order of events that created the Y and IIIM Chromosomes. IIIM males have elevated heterozygosity on the third chromosome, whereas XY males do not have elevated heterozygosity on element F (the ancestral X Chromosome) (Fig. 6). This suggests that the Y Chromosome is a younger neo-Y than IIIM, but there is an alternative explanation for the patterns of polymorphism. Differentiation between nascent X and Y Chromosomes is accelerated by suppressed recombination in XY males (Charlesworth 1991; Rice 1996; Charlesworth et al. 2005; Bachtrog 2013). Lack of recombination can be an inherent property of male meiosis (as in Drosophila) or arise via Y Chromosome inversions that suppress crossing over between the X and Y. There is evidence for male recombination in house flies (Feldmeyer et al. 2010), suggesting that chromosomal inversions would be required for recombination suppression between the house fly X and Y. If the IIIM Chromosome carries inversions and the Y Chromosome does not, then the elevated heterozygosity in IIIM males but not XY males could be a result of inversions accelerating the rate of divergence between the IIIM Chromosome and the standard third chromosome (Navarro et al. 2000). However, element F in Drosophila does not experience crossing over in meiosis, and estimates of element F recombination rates from population genetic data are extremely low (Wang et al. 2002; Arguello et al. 2010). This suggests that Y Chromosome inversions might not be necessary to suppress X-Y recombination in house fly. Additional sequencing of XY and IIIM males is needed to test for inversions or other recombination suppressors, which could shed light on the lack of X-Y differentiation.
Translocating sex-determining loci can cause sex chromosome recycling and create cryptic neo-sex chromosomes
Our results provide the first evidence, to our knowledge, of the “recycling” of a sex chromosome pair through the creation of a nascent Y (or W) from an ancient X (or Z) Chromosome (Graves 2005). The recycling of differentiated sex chromosomes into a neo-sex chromosome pair in house fly appears to have happened without a fusion to an autosome. In comparison, most previously documented sex chromosome transitions involved autosomes transforming into sex chromosomes through either the evolution of a novel sex-determining locus on the autosome or a fusion of the autosome with a sex chromosome (e.g., Patterson and Stone 1952; Steinemann and Steinemann 1998; Filatov et al. 2000; Liu et al. 2004; Veyrunes et al. 2004; Carvalho and Clark 2005; Vallender and Lahn 2006; Ross et al. 2009; Vicoso and Bachtrog 2013, 2015). There are other examples of sex chromosome transformations involving only X, Y, Z, and W Chromosomes (i.e., no autosomes) in platyfish, Rana rugosa, and Xenopus tropicalis (Kallman 1984; Miura 2007; Roco et al. 2015). These X/Y/Z/W transformations in fish and frogs, however, involve nascent sex chromosomes, not ancient sex chromosomes as in house fly. Moreover, the sex chromosome transitions in fish and amphibians all involve a change in the heterogametic sex (i.e., XY males to ZW females, or vice versa), whereas the house fly X and Y Chromosomes did not switch to a Z and W.
We hypothesize that the X-to-Y conversion in house fly occurred because Mdmd was duplicated onto an X Chromosome (Sharma et al. 2017). Gene duplications have given rise to new male-determining loci on neo-Y Chromosomes in other taxa (Matsuda et al. 2002; Nanda et al. 2002; Hattori et al. 2012), translocating sex determining loci have been observed in other animals (Traut and Willhoeft 1990; Woram et al. 2003; Faber-Hammond et al. 2015), and there is rampant gene traffic to and from other long-established Y Chromosomes (Koerich et al. 2008; Hughes et al. 2015). These different forms of gene movement suggest that new male-determining (female-determining) loci may arise on established X (Z) Chromosomes in other evolutionary lineages, causing X-to-Y (or Z-to-W) transitions. The fact that the neo-Y Chromosome in house fly remained undetected despite decades of work on this system (Dübendorfer et al. 2002) suggests that X-to-Y (or Z-to-W) transitions may have occurred in other taxa and remain cryptic because the karyotype has remained unchanged.
Methods
Fly strains
We used five house fly strains to identify X and Y Chromosome sequences. One strain, Cornell susceptible (CS), has X/X; IIIM/III males (Scott et al. 1996; Hamm et al. 2005; Meisel et al. 2015). The other four strains have previously been characterized as having males with the XY karyotype: aabys, A3, LPR, and CSaY. The genome strain, aabys, has recessive phenotypic markers on each of the five autosomes (Chromosomes I–V) and had been cytologically determined to have XY males (Wagoner 1967; Tomita and Wada 1989; Scott et al. 2014). The A3 strain was generated by crossing XY males from a pyrethroid-resistant strain (ALHF) with aabys females (Liu and Yue 2001). The LPR strain is a pyrethroid-resistant strain that was previously determined to have XY males (Scott and Georghiou 1985; Scott et al. 1996). Finally, the CSaY strain was created by crossing aabys males (XY) with CS females, and then backcrossing the male progeny to CS females to create a strain with the aabys Y Chromosome on the CS background (Meisel et al. 2015). We validated that the M factor is not on an autosome in the A3, LPR, and CSaY strains by crossing males of each strain to aabys females, and then we backcrossed the male progeny to aabys females. We observed equivalent inheritance in males and females of all aabys phenotypic markers (i.e., no sex-linked inheritance), confirming that the M factor is not on Chromosomes I–V in A3, LPR, or CSaY. Females of all strains were expected to be XX.
Genome sequencing, mapping, and assembly
The house fly genome consortium sequenced, assembled, and annotated the genome using DNA from female flies of the aabys strain, a line with XX females and XY males (Scott et al. 2014). The annotation includes both predicted genes and inferred homology relationships with D. melanogaster genes, and we used the orthology calls from annotation release 100 (assembly version 2.0.2) to assign house fly genomic scaffolds to chromosome arms using a majority rule as described previously (Meisel et al. 2015). Briefly, scaffolds were assigned to a Muller element if the majority of genes on the scaffold with 1:1 D. melanogaster orthologs have orthologs on the same D. melanogaster element. In total, 62 house fly genes have 1:1 D. melanogaster orthologs on Muller element F, which amounts to roughly three-quarters of the approximately 80 genes on Drosophila element F (Leung et al. 2010). We used these 1:1 orthologs to assign seven house fly scaffolds to element F (the X Chromosome), and those seven scaffolds contain 51 genes. We repeated all of the analyses described in “Results” using only genes with 1:1 D. melanogaster orthologs and obtained qualitatively similar results as when we used scaffold-level Muller element assignments.
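The majority rule for assigning scaffolds to Muller elements can be illustrated with a minimal Python sketch. The scaffold names and gene lists are hypothetical, and requiring a strict majority (so that ties leave a scaffold unassigned) is an assumption of this sketch rather than a detail stated above.

```python
from collections import Counter


def assign_scaffold(ortholog_elements):
    """Return the Muller element that holds a strict majority of a scaffold's
    1:1 orthologs, or None if no element reaches a majority.

    Treating ties and empty scaffolds as unassigned is an assumption.
    """
    if not ortholog_elements:
        return None
    element, n = Counter(ortholog_elements).most_common(1)[0]
    return element if n > len(ortholog_elements) / 2 else None


# Hypothetical scaffolds: each list gives the D. melanogaster Muller element
# of every gene on the scaffold that has a 1:1 ortholog.
scaffolds = {
    "scaffold_1": ["F", "F", "F", "A"],
    "scaffold_2": ["B", "C"],
    "scaffold_3": [],
}
assignments = {name: assign_scaffold(genes) for name, genes in scaffolds.items()}
print(assignments)
```

Under these assumptions, only the first scaffold is assigned (to element F); the tied and empty scaffolds remain unassigned.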
We sequenced gDNA from aabys male and female heads with 150-bp paired-end reads on an Illumina NextSeq 500 at the University of Houston genome sequencing core. Three replicate libraries of each sex were prepared using the Illumina TruSeq DNA PCR-free kit, and the six libraries were pooled and sequenced in a single high-output run of the machine. We also sequenced gDNA from three replicates of male and female heads from A3 and LPR flies (12 samples total) in another single high-output run on the NextSeq 500 using 75-bp paired-end reads. For each of the 18 sequencing libraries, DNA was extracted from separate pools of fly heads using the QIAGEN DNeasy blood and tissue kit. Illumina sequencing reads were mapped to the assembled house fly genome using BWA-MEM with the default parameters (Li and Durbin 2009; Li 2013), and we only included uniquely mapping reads in which both ends of a sequenced fragment mapped to the same scaffold in the reference genome. Reads that failed to meet these criteria were considered unmapped for the male genome assembly described below. Mapping statistics are presented in Supplemental Table S1. Mapped reads were assigned to annotated genes if mapping coordinates overlap with at least 1 bp of the coordinates of an annotated gene (from beginning to end point of annotation). Reads could therefore map to exons or introns.
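The two read-level criteria described above (retaining only read pairs in which both ends map uniquely to the same scaffold, and assigning a read to a gene when at least 1 bp overlaps) can be sketched as follows. The function names are hypothetical, the use of 1-based inclusive coordinates is an assumption, and uniqueness is passed in as a flag because the text does not specify how it was determined from the BWA-MEM output.

```python
def pair_is_retained(scaffold_1, scaffold_2, unique_1, unique_2):
    """Keep a read pair only if both ends map uniquely to the same scaffold."""
    return unique_1 and unique_2 and scaffold_1 == scaffold_2


def overlaps_gene(read_start, read_end, gene_start, gene_end):
    """True if the read shares at least 1 bp with the gene annotation.

    Coordinates are assumed to be 1-based and inclusive; the gene interval
    runs from the beginning to the end of the annotation, so intronic
    overlaps count as well as exonic ones.
    """
    return read_start <= gene_end and gene_start <= read_end


# A read ending at position 174 overlaps a gene starting at 174 by exactly 1 bp.
print(overlaps_gene(100, 174, 174, 500))
# The same read does not overlap a gene starting at 175.
print(overlaps_gene(100, 174, 175, 500))
```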
We additionally assembled the reads from aabys male samples using SOAPdenovo2 (Luo et al. 2012) and ABySS (Simpson et al. 2009) to construct a reference genome that contains Y Chromosome sequences. Mapping our sequence data to the reference genome revealed that our average insert size was 370 bp (Supplemental Fig. S7), which was used as a parameter in the SOAPdenovo2 genome assembly, along with a pair number cutoff of 3 and a minimum alignment length of 32 bp. For the ABySS assembly, we used a k-mer pair span (k) of 64. We also assembled a genome from only male reads that did not align to the female genome reference assembly using SOAPdenovo2 (Luo et al. 2012). For downstream analyses, we only retained scaffolds with a length ≥1000 bp in each assembly. Assembly statistics are presented in Supplemental Table S2.
Identifying X and Y Chromosome sequences
We used four differential coverage approaches to identify candidate X and Y Chromosome sequences in the house fly genome.
The first approach identifies X Chromosome genes or sequences by testing for twofold higher abundance in females relative to males (Vicoso and Bachtrog 2013). To do this, we used DESeq2 to calculate the log2 ratio of male to female read coverage (log2 M/F) within individual genes and 1-kb windows across the three male-derived and three female-derived libraries for each strain (Love et al. 2014). We also used DESeq2 to calculate P-values for differential coverage between females and males. This approach was also used to identify candidate autosome-to-Y duplicates with ≥1.5× coverage in males relative to females.
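The logic of the sex-specific coverage comparison can be illustrated with a simplified sketch. This is not the DESeq2 procedure: it replaces DESeq2's normalization and statistical model with plain counts-per-million scaling, and the read counts and library sizes are invented for illustration. A sequence present in two copies in females and one copy in males is expected to have a log2 male:female ratio near −1, an autosomal sequence near 0, and a sequence with 1.5× coverage in males near 0.58.

```python
import math


def log2_male_female(male_counts, female_counts, male_lib_sizes, female_lib_sizes):
    """log2 of mean male coverage over mean female coverage.

    Coverage is scaled to counts per million mapped reads (a simplification;
    the analysis described above used DESeq2). Returns None when either sex
    has zero coverage.
    """
    male = sum(c * 1e6 / n for c, n in zip(male_counts, male_lib_sizes)) / len(male_counts)
    female = sum(c * 1e6 / n for c, n in zip(female_counts, female_lib_sizes)) / len(female_counts)
    if male == 0 or female == 0:
        return None
    return math.log2(male / female)


# Hypothetical gene with half as much coverage in males as in females,
# across three replicate libraries per sex of equal size.
lib_sizes = [1_000_000, 1_000_000, 1_000_000]
ratio = log2_male_female([50, 55, 45], [100, 110, 90], lib_sizes, lib_sizes)
print(round(ratio, 3))

# Threshold corresponding to >=1.5x coverage in males relative to females.
print(round(math.log2(1.5), 3))
```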
The second approach was used to identify Y Chromosome sequences by searching for scaffolds in the male genome assembly that are missing from the female sequencing reads. We only considered assembled scaffolds from the male genome that were ≥1 kb. We implemented a k-mer comparison approach to identify male-specific sequences (Carvalho and Clark 2013). In our implementation, we used a k-mer size of 15 bp, used the male sequencing reads to construct a validating bit-array, and implemented the options described by Carvalho and Clark (2013) for identifying Y Chromosome sequences in Drosophila genomes (Supplemental Methods S1, S2).
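The idea behind the k-mer comparison can be sketched in a few lines of Python. This is a toy stand-in, not the implementation of Carvalho and Clark (2013): it uses Python sets instead of a bit-array, ignores reverse complements, uses k = 4 rather than 15 so the example stays readable, and all sequences are invented. A scaffold is a candidate male-specific sequence when a large share of its k-mers, restricted to those confirmed in male reads, is absent from female reads.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence (forward strand only, an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def percent_unmatched(scaffold, female_kmers, male_kmers, k):
    """Percent of validated scaffold k-mers that are absent from female reads.

    A k-mer is treated as validated if it also occurs in the male reads.
    Returns None if no scaffold k-mer is validated.
    """
    valid = kmers(scaffold, k) & male_kmers
    if not valid:
        return None
    return 100.0 * len(valid - female_kmers) / len(valid)


k = 4  # toy value; the analysis described above used 15 bp
scaffold = "ACGTACGGTT"            # hypothetical male-assembly scaffold
male_kmers = kmers("ACGTACGGTT", k)  # hypothetical male read
female_kmers = kmers("ACGTAC", k)    # hypothetical female read

# Four of the seven scaffold k-mers are missing from the female read.
print(round(percent_unmatched(scaffold, female_kmers, male_kmers, k), 1))
```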
In the third approach, we analyzed gDNA sequencing reads from aabys males and females to identify k-mers with sexually dimorphic abundances. We used the k-Seek method to count the abundance of k-mers of 2–10 bp in the three male and three female aabys sequencing libraries (Wei et al. 2014). We normalized the k-mer counts by multiplying the count by the length of the k-mer and dividing by the number of reads in the library.
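The normalization described above is simple arithmetic, shown here as a minimal sketch. The function name, the example k-mer, and the counts are hypothetical.

```python
def normalize_kmer_count(count, kmer, n_reads):
    """Normalized abundance: count x k-mer length / number of reads in the library."""
    return count * len(kmer) / n_reads


# Hypothetical example: a 5-bp k-mer counted 2000 times in a library of
# one million reads.
value = normalize_kmer_count(2000, "AATAT", 1_000_000)
print(value)
```

Multiplying by the k-mer length puts k-mers of different lengths on a common base-pair scale, and dividing by the number of reads makes libraries of different depth comparable.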
The fourth approach identifies nascent sex chromosomes because they have elevated heterozygosity in the heterogametic sex (Vicoso and Bachtrog 2015). We implemented this approach using both gDNA-seq and mRNA-seq data. For the gDNA-seq, we used the Genome Analysis Toolkit (GATK), following the best practices provided by the software developers (McKenna et al. 2010). Starting with the male and female mapped reads from the aabys strain described above, we identified duplicate reads. Insertions and deletions (indels) were identified and realigned using RealignerTargetCreator and IndelRealigner, respectively. We then called variants in each of the six aabys sequencing libraries using HaplotypeCaller, and we selected the highest quality SNPs and indels using SelectVariants and VariantFiltration (for SNPs: QD < 2, MQ < 40, FS > 60, SOR > 4, MQRankSum < −12.5, ReadPosRankSum < −8; for indels: QD < 2, ReadPosRankSum < −20, FS > 200, SOR > 10). The high-quality SNPs and indels were next used for recalibration of the base calls with BaseRecalibrator and PrintReads. The process of variant calling and base recalibration was performed in three consecutive iterations, at which point there were no benefits of additional base recalibration as validated with AnalyzeCovariates. We next used the recalibrated reads from all three replicates of each sex to call variants in males and females using HaplotypeCaller with emission and calling confidence thresholds of 20. We filtered those variants using VariantFiltration with a cluster window size of 35 bp, cluster size of 3 SNPs, FS > 20, and QD < 2. We used the variant calls to identify heterozygous SNPs within genes using the coordinates from the genome sequencing project (Scott et al. 2014). An example script with our SNP calling pipeline is available in Supplemental Methods S3.
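The SNP hard-filter thresholds and the per-gene count of heterozygous sites can be illustrated with the sketch below. It does not call GATK; it only restates the SNP thresholds listed above as a Python predicate, reading them as exclusion criteria (a variant meeting any one expression is discarded), which is how such expressions are normally used in VariantFiltration. Treating a missing annotation as not failing, the record layout, and all example values are assumptions.

```python
# SNP exclusion thresholds taken from the text above.
SNP_FAIL = {
    "QD": lambda v: v < 2,
    "MQ": lambda v: v < 40,
    "FS": lambda v: v > 60,
    "SOR": lambda v: v > 4,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8,
}


def passes_snp_filters(annotations):
    """True if no listed annotation meets its exclusion threshold.

    Annotations absent from the record are ignored (an assumption).
    """
    return not any(
        name in annotations and fails(annotations[name])
        for name, fails in SNP_FAIL.items()
    )


def count_het_snps(snps, gene_start, gene_end):
    """Count filter-passing heterozygous SNPs within a gene's coordinates.

    Each SNP is a hypothetical tuple: (position, allele_1, allele_2, annotations).
    """
    return sum(
        1
        for pos, a1, a2, ann in snps
        if gene_start <= pos <= gene_end and a1 != a2 and passes_snp_filters(ann)
    )


good = {"QD": 10.0, "MQ": 60.0, "FS": 1.2, "SOR": 0.8}
low_qd = {"QD": 1.5, "MQ": 60.0, "FS": 1.2, "SOR": 0.8}
snps = [
    (120, "A", "G", good),    # heterozygous, inside the gene, passes filters
    (300, "T", "T", good),    # homozygous
    (410, "C", "T", low_qd),  # heterozygous but fails QD
    (900, "G", "A", good),    # outside the gene
]
n_het = count_het_snps(snps, 100, 500)
print(n_het)
```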
When we implemented the GATK pipeline for variant calling of the mRNA-seq data (accession GSE67065) (Meisel et al. 2015), we used STAR to align reads from 6 XY male libraries and 6 IIIM male libraries separately (Dobin et al. 2013). After aligning reads to the reference genome, we used the aligned reads to create a new reference genome index from the inferred spliced junctions in the first alignment, and then we performed a second alignment with the new reference. We next marked duplicate reads and used SplitNCigarReads to reassign mapping qualities to 60 with the ReassignOneMappingQuality read filter for alignments with a mapping quality of 255. Indels were realigned, and three rounds of variant calling and base recalibration were performed as described above for the gDNA-seq data. We applied GenotypeGVCFs to the variant calls from the two strains for joint genotyping of all samples, and then we used the same filtering parameters as used in the gDNA-seq to extract high-quality SNPs and indels from our variant calls.
Data access
All data generated in this study have been submitted to NCBI BioProject (https://www.ncbi.nlm.nih.gov/bioproject) under the umbrella accession number PRJNA383366. Raw sequencing reads have been submitted to the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) under accession numbers SRX2154714–SRX2154719. The male genome assembly has been submitted to the NCBI Genome database (https://www.ncbi.nlm.nih.gov/genome/) under accession number NDYK00000000. Variant calls have been submitted to dbSNP (https://www.ncbi.nlm.nih.gov/snp) under the submitter handle MEISEL and associated with BioProject PRJNA382546.
Acknowledgments
This project was initiated during discussions with Andy Clark and Rob Unckless, who provided valuable comments throughout the completion of this work. Jeff Scott kindly supplied the A3 and LPR flies. Daniel Bopp and Leo Beukeboom shared their results identifying the Mdmd gene. Illumina sequencing was performed by the University of Houston Sequencing Core, with the assistance of Yinghong Pan and Utpal Pandya. Computational analyses were performed at the University of Houston Center for Advanced Computing and Data Systems, with some assistance from Adrian Garcia and Shuo Zhang. We thank Erin Kelleher and three anonymous reviewers for feedback on the preparation and revisions of this manuscript. This work was supported by start-up funds from the University of Houston.
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.215509.116.
- Received September 1, 2016.
- Accepted June 8, 2017.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.