Conserved microRNA targeting reveals preexisting gene dosage sensitivities that shaped amniote sex chromosome evolution

  1. David C. Page1,2,4
  1. 1Whitehead Institute, Cambridge, Massachusetts 02142, USA;
  2. 2Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;
  3. 3Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;
  4. 4Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts 02142, USA
  • Corresponding author: dcpage{at}wi.mit.edu
  • Abstract

    Mammalian X and Y Chromosomes evolved from an ordinary autosomal pair. Genetic decay of the Y led to X Chromosome inactivation (XCI) in females, but some Y-linked genes were retained during the course of sex chromosome evolution, and many X-linked genes did not become subject to XCI. We reconstructed gene-by-gene dosage sensitivities on the ancestral autosomes through phylogenetic analysis of microRNA (miRNA) target sites and compared these preexisting characteristics to the current status of Y-linked and X-linked genes in mammals. Preexisting heterogeneities in dosage sensitivity, manifesting as differences in the extent of miRNA-mediated repression, predicted either the retention of a Y homolog or the acquisition of XCI following Y gene decay. Analogous heterogeneities among avian Z-linked genes predicted either the retention of a W homolog or gene-specific dosage compensation following W gene decay. Genome-wide analyses of human copy number variation indicate that these heterogeneities consisted of sensitivity to both increases and decreases in dosage. We propose a model of XY/ZW evolution incorporating such preexisting dosage sensitivities in determining the evolutionary fates of individual genes. Our findings thus provide a more complete view of the role of dosage sensitivity in shaping the mammalian and avian sex chromosomes and reveal an important role for post-transcriptional regulatory sequences (miRNA target sites) in sex chromosome evolution.

    The mammalian X and Y Chromosomes evolved from a pair of ordinary autosomes over the past 300 myr (Lahn and Page 1999). Only 3% of genes on the ancestral pair of autosomes survive on the human Y Chromosome (Skaletsky et al. 2003; Bellott et al. 2010) compared with 98% on the X Chromosome (Mueller et al. 2013). In females, one copy of the X Chromosome is silenced by X-inactivation (XCI); this silencing evolved on a gene-by-gene basis following Y gene loss in males and X up-regulation in both sexes (Jegalian and Page 1998; Ross et al. 2005; Berletch et al. 2015; Tukiainen et al. 2017), and some genes escape XCI in humans (Carrel and Willard 2005) and other mammals (Yang et al. 2010). Dosage compensation refers to any mechanism restoring ancestral dosage following gene loss from the sex-specific chromosome. In mammalian males, therefore, dosage compensation consisted solely of X up-regulation, as it returned X-linked gene expression to ancestral levels following Y gene loss. In females, dosage compensation involved both X up-regulation and the acquisition of XCI, which increased and decreased X-linked expression levels, respectively. Since females did not undergo any initial decrease in ancestral dosage due to Y gene loss, X up-regulation and the acquisition of XCI together restored ancestral expression levels.

    In parallel, the avian Z and W sex chromosomes evolved from a different pair of autosomes than the mammalian X and Y Chromosomes (Nanda et al. 1999; Ross et al. 2005; Bellott et al. 2010). Decay of the female-specific W Chromosome was similarly extensive, but birds did not evolve a large-scale inactivation of Z-linked genes analogous to XCI in mammals (Itoh et al. 2007). Dosage compensation, as measured by a male/female expression ratio close to one, has been observed for some Z-linked genes in some tissues (Mank and Ellegren 2009; Uebbing et al. 2015; Zimmer et al. 2016). Thus, genes previously found on the ancestral autosomes that gave rise to the mammalian or avian sex chromosomes have undergone significant changes in gene dosage. In modern mammals, these molecular events have resulted in three classes of ancestral X-linked genes representing distinct evolutionary fates: those with a surviving Y homolog, those with no Y homolog and subject to XCI, and those with no Y homolog but escaping XCI. In birds, two clear classes of ancestral Z-linked genes have arisen: those with or without a W homolog, with additional heterogeneity among Z-linked genes without a W homolog as a result of gene-specific dosage compensation. Identifying gene-by-gene properties that distinguish classes of X- and Z-linked genes is thus crucial to understanding the selective pressures underlying the molecular events of mammalian and avian sex chromosome evolution.

    Emerging evidence suggests a role for gene dosage sensitivity in mammalian and avian sex chromosome evolution. X- and Z-linked genes with surviving homologs on the mammalian Y or avian W Chromosomes are enriched for important regulatory functions and predictors of haploinsufficiency compared with those lacking Y or W homologs (Bellott et al. 2014, 2017); similar observations have been made in fish (White et al. 2015) and Drosophila (Kaiser et al. 2011). Human X- and chicken Z-linked genes that show the strongest signatures of dosage compensation in either lineage also show signs of dosage sensitivity as measured by membership in large protein complexes (Pessia et al. 2012) or evolutionary patterns of gene duplication and retention (Zimmer et al. 2016). Despite these advances, little is known regarding selective pressures resulting from sensitivity to dosage increases, as these studies either focused on haploinsufficiency or employed less direct predictors of dosage sensitivity. Furthermore, it is not known whether heterogeneities in dosage sensitivity among classes of sex-linked genes were acquired during sex chromosome evolution, or predated the emergence of sex chromosomes, as there has been no explicit, systematic reconstruction of dosage sensitivity on the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes.

    To assess the role of preexisting dosage sensitivities in XY and ZW evolution, we sought to employ a measure of dosage sensitivity that could be (1) demonstrably informative with respect to sensitivity to dosage increases and (2) explicitly reconstructed on the ancestral autosomes. We focused on regulation by microRNAs (miRNAs), small noncoding RNAs that function as tuners of gene dosage by lowering target mRNA levels through pairing to the 3′ untranslated region (UTR) (Bartel 2009). The repressive nature of miRNA targeting is informative with respect to sensitivity to dosage increases, allowing for a more complete understanding of the role of dosage sensitivity in sex chromosome evolution. Both miRNAs themselves and their complementary target sites can be preserved over millions of years of vertebrate evolution, facilitating the reconstruction of miRNA targeting on the ancestral autosomes through cross-species sequence alignments. As miRNA targeting occurs post-transcriptionally, reconstruction of its ancestral state is decoupled from transcriptional regulatory mechanisms such as XCI that evolved following X-Y differentiation.

    Results

    Analysis of human copy number variation indicates conserved miRNA targeting of genes sensitive to dosage increases

    We first sought to determine whether conserved targeting by miRNAs correlates with sensitivity to dosage increases across the human genome. To estimate pressure to maintain miRNA targeting, we used published probabilities of conserved targeting (PCT scores) for each gene–miRNA interaction in the human genome. The PCT score reflects an estimate of the probability that a given gene–miRNA interaction is conserved due to miRNA targeting, obtained by calculating the conservation of the relevant miRNA target sites relative to the conservation of the entire 3′ UTR (Friedman et al. 2009). In this manner, the PCT score intrinsically controls for differences in background conservation and sequence composition, both of which vary widely among 3′ UTRs due to differing rates of expression divergence and/or sequence evolution. We refer to these PCT scores as “miRNA conservation scores” in the remainder of the text.

    A recent study reported a correlation between these miRNA conservation scores and predicted haploinsufficiency (Pinzón et al. 2017), indicating that conserved miRNA targeting broadly corresponds to dosage sensitivity. However, such a correlation does not isolate the effects of sensitivity to dosage increases, which we expect to be particularly important in the context of miRNA targeting. We reasoned that genes for which increases in dosage are deleterious should be depleted from the set of observed gene duplications in healthy human individuals. We used a catalog of rare genic copy number variation among 59,898 control human exomes (Exome Aggregation Consortium [ExAC]) (Ruderfer et al. 2016) to classify autosomal protein-coding genes as exhibiting or lacking duplication or deletion in healthy individuals (see Methods). We compared duplicated and nonduplicated genes with the same deletion status in order to control for differences in sensitivity to underexpression. We found that nonduplicated genes have significantly higher miRNA conservation scores than duplicated genes, irrespective of deletion status (Fig. 1A,B). Nondeleted genes also have significantly higher scores than deleted genes irrespective of duplication status (Supplemental Fig. S1), but duplication status has a greater effect on miRNA conservation scores than does deletion status (Fig. 1C, blue vs. orange boxes). Thus, conserved miRNA targeting is a feature of genes sensitive to changes in gene dosage in humans and is especially informative with regards to sensitivity to dosage increases, consistent with the known role of miRNAs in tuning gene dosage by lowering target mRNA levels.

    Figure 1.

    Conserved miRNA targeting of autosomal genes stratified by copy number variation in 59,898 human exomes. Probabilities of conserved targeting (PCT) of all gene–miRNA interactions involving nonduplicated and duplicated genes, further stratified as (A) deleted (gray, n = 69,339 interactions from 4118 genes; blue, n = 80,290 interactions from 3976 genes) or (B) not deleted (orange, n = 51,514 interactions from 2916 genes; purple, n = 72,826 interactions from 3510 genes). (***) P < 0.001, two-sided Kolmogorov–Smirnov test. (C) Mean gene-level PCT scores. (**) P < 0.01, (***) P < 0.001, two-sided Wilcoxon rank-sum test.

    X-Y pairs and X-inactivated genes have higher miRNA conservation scores than X escape genes

    We next assessed whether the three classes of X-linked genes differ with respect to dosage sensitivity as inferred by conserved miRNA targeting. To delineate these classes, we began with the set of ancestral genes reconstructed through cross-species comparisons of the human X Chromosome and orthologous chicken autosomes (Bellott et al. 2010, 2014, 2017; Hughes et al. 2012; Mueller et al. 2013). We designated ancestral X-linked genes with a surviving human Y homolog (Skaletsky et al. 2003) as X-Y pairs and also considered the set of X-linked genes with a surviving Y homolog in any of eight mammals (Bellott et al. 2014) to increase the phylogenetic breadth of findings regarding X-Y pairs. A number of studies have cataloged the inactivation status of X-linked genes in various human tissues and cell types. We used a meta-analysis that combined results from three studies by assigning a “consensus” X-inactivation status to each gene (Balaton et al. 2015) to designate the remainder of ancestral genes lacking a Y homolog as subject to or escaping XCI. In summary, we classified genes as either: (1) X-Y pairs, (2) lacking a Y homolog and subject to XCI (X-inactivated), or (3) lacking a Y homolog but escaping XCI (X escape).

    We found that human X-Y pairs have the highest miRNA conservation scores, followed by X-inactivated and finally X escape genes (Fig. 2A,B). The expanded set of X-Y pairs across eight mammals also has significantly higher miRNA conservation scores than ancestral X-linked genes with no Y homolog (Supplemental Fig. S2). Observed differences between miRNA conservation scores are not driven by distinct subsets of genes in each class, as indicated by gene resampling with replacement (Supplemental Fig. S3). The decrease in miRNA conservation scores of X escape genes relative to X-inactivated genes and X-Y pairs is not driven by genes that escape XCI variably across individuals (Supplemental Fig. S4), and was consistent even when including ambiguous genes as either X-inactivated or X escape genes (Supplemental Fig. S5). We also verified that these differences were not driven by artificially inflated or deflated conservation scores of certain target sites due to nonuniformity in 3′ UTR conservation (Methods; Supplemental Fig. S6).

    Figure 2.

    X-Y pairs and X-inactivated genes have higher miRNA conservation scores than X escape genes. PCT score distributions of all gene–miRNA interactions involving (A) human X-Y pairs (n = 371 interactions from 15 genes), X-inactivated genes (n = 6743 interactions from 329 genes), and X escape genes (n = 1037 interactions from 56 genes). (**) P < 0.01, two-sided Kolmogorov–Smirnov test. (B) Mean gene-level PCT scores. (*) P < 0.05, (**) P < 0.01, two-sided Wilcoxon rank-sum test.

    Finally, we assessed whether miRNA conservation scores distinguish the three classes by providing additional information not accounted for by known factors (Bellott et al. 2014) influencing evolutionary outcomes. We used logistic regression to model, for each gene, the probability of falling into each of the three classes (X-Y pair, X-inactivated, or X escape) as a linear combination of haploinsufficiency probability (pHI) (Huang et al. 2010); human expression breadth (The GTEx Consortium 2015); purifying selection, measured by the ratio of nonsynonymous to synonymous substitution rates (dN/dS) between human and mouse orthologs (Yates et al. 2016); and mean gene-level miRNA conservation scores. We note that pHI is a score composed of several genic features, one of which is the number of protein–protein interactions, consistent with the idea that members of large protein complexes tend to be dosage-sensitive (Papp et al. 2003; Pessia et al. 2012). Removing either miRNA conservation or pHI as predictors from the full model resulted in inferior model fits as measured by the Akaike information criterion (AIC) (full model, AIC 321.5; full model minus miRNA conservation, AIC 327.7; full model minus pHI, AIC 327.3; higher AIC indicates inferior model). Therefore, miRNA conservation and pHI contribute independent information that distinguishes the three classes of X-linked genes. Based on our analyses of autosomal copy number variation (Fig. 1), we attribute this independence to the fact that miRNA conservation scores are most informative with regards to sensitivity to dosage increases. Taken together, these results indicate significant heterogeneity in dosage sensitivity, as inferred by miRNA target site conservation, among the three classes of ancestral X-linked genes: X-Y pairs are the most dosage-sensitive, while X-inactivated genes are of intermediate dosage sensitivity, and X escape genes are the least dosage-sensitive.

    Heterogeneities in X-linked miRNA targeting were present on the ancestral autosomes

    We next asked whether differences in miRNA targeting were present on the ancestral autosomes that gave rise to the mammalian X and Y Chromosomes. To reconstruct the ancestral state of miRNA targeting, we first focused on miRNA target sites in the 3′ UTR of human orthologs that align with perfect identity to a site in the corresponding chicken ortholog; these sites were likely present in the common ancestor of mammals and birds (Fig. 3A). We found that X-Y pairs have the most human–chicken conserved target sites, followed by X-inactivated genes, and then X escape genes (Fig. 3B, top). Unlike the miRNA conservation scores used earlier, this metric does not account for background conservation; we therefore estimated the background conservation of each 3′ UTR using shuffled miRNA family seed sequences (see Methods). X-Y pairs, X-inactivated genes, and X escape genes do differ significantly with respect to background conservation (Supplementary Fig. S7), but these differences cannot account for the observed differences in true human–chicken conserved sites (Fig. 3B, bottom). We observed similar results for the expanded set of X-Y pairs across eight mammals (Supplemental Fig. S8A).

    Figure 3.

    Heterogeneities in X-linked miRNA targeting were present on the ancestral autosomes. (A) Example reconstruction of an ancestral miR-96 target site in the 3′ UTR of KDM6A, an X-linked gene in the X-added region (XAR) with a surviving Y homolog. (Left) Species tree overlaid with events in mammalian sex chromosome evolution. (Right) Multiple sequence alignment; dots in nonhuman species indicate identity with human sequence; dashes indicate gaps in alignment. (B) Distributions of sites conserved between 3′ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom; see Methods) for human X-Y pairs (n = 16), X-inactivated genes (n = 251), and X escape genes (n = 42). (C) Statistics as in B, but using sites conserved between chicken and opossum 3′ UTRs only for genes in the XAR: X-Y pairs (n = 11), X-inactivated genes (n = 58), and X escape genes (n = 27).

    Differences in the number of human–chicken conserved sites among the three classes of X-linked genes could be explained by heterogeneity in miRNA targeting present on the ancestral autosomes or by ancestral homogeneity followed by different rates of target site loss during or following X-Y differentiation. To distinguish between these two possibilities, we took advantage of previous reconstructions of human sex chromosome evolution (Fig. 3A; Bellott et al. 2014), which confirmed that, following the divergence of placental mammals from marsupials, an X-autosome chromosomal fusion generated the X-added region (XAR) (Watson et al. 1990). Genes on the XAR are therefore X-linked in placental mammals but autosomal in marsupials such as the opossum. We limited our analysis to genes in the XAR and target sites conserved between orthologous chicken and opossum 3′ UTRs, ignoring site conservation in humans; these sites were likely present in the common ancestor of mammals and birds, and an absence of such sites cannot be explained by site loss following X-Y differentiation. We observed the same pattern as with the human–chicken conserved sites, both before and after accounting for background 3′ UTR conservation (Fig. 3C, three gene classes; Supplemental Fig. S8B, X-Y pairs across eight mammals). These results demonstrate that the autosomal precursors of X-Y pairs and X-inactivated genes were subject to more miRNA-mediated regulation than X escape genes. Combined with our earlier results, we conclude that present-day heterogeneities in dosage sensitivity on the mammalian X Chromosome existed on the ancestral autosomes from which it derived.

    Z-W pairs have higher miRNA conservation scores than other ancestral Z-linked genes

    We next assessed whether classes of avian Z-linked genes, those with and without a W homolog, show analogous heterogeneities in sensitivity to dosage increases. We used the set of ancestral genes reconstructed through cross-species comparisons of the avian Z Chromosome and orthologous human autosomes and focused on the set of Z-W pairs identified by sequencing of the chicken W Chromosome (Bellott et al. 2010, 2017). To increase the phylogenetic breadth of our comparisons, we also included candidate Z-W pairs obtained through comparisons of male and female genome assemblies (four-species set) or inferred by read-depth changes in female genome assemblies (14-species set; for details, see Methods) (Zhou et al. 2014). The more complete 3′ UTR annotations in the human genome relative to chicken allow for a more accurate assessment of conserved miRNA targeting. Accordingly, we analyzed the 3′ UTRs of the human orthologs of chicken Z-linked genes.

    We found that the human orthologs of Z-W pairs have higher miRNA conservation scores than the human orthologs of other ancestral Z genes (Fig. 4A,B). Differences in miRNA conservation scores between Z-W pairs and other ancestral Z genes remained significant when considering the expanded sets of Z-W pairs across four and 14 avian species (Supplemental Fig. S9). These differences are not driven by distinct subsets of genes, as indicated by gene resampling with a replacement (Supplemental Fig. S10), and cannot be accounted for by within-UTR variation in regional conservation (Supplemental Fig. S11). Logistic regression models indicate that miRNA conservation scores provide additional information not captured by known factors (Bellott et al. 2017) influencing survival of W-linked genes (full model, AIC 127.1; full model minus miRNA conservation, AIC 137.8; full model minus pHI, AIC 132.7; higher AIC indicates inferior model). Together, these results indicate that Z-linked genes with a surviving W homolog are more sensitive to changes in dosage—both increases and decreases—than are genes without a surviving W homolog.

    Figure 4.

    Z-W pairs have higher miRNA conservation scores than other ancestral Z-linked genes. (A) PCT score distributions of all gene–miRNA interactions involving the human orthologs of chicken Z-W pairs (n = 832 interactions from 28 genes) and other ancestral Z genes (n = 16,692 interactions from 657 genes). (***) P < 0.001, two-sided Kolmogorov–Smirnov test. (B) Mean gene-level PCT scores. (***) P < 0.001, two-sided Wilcoxon rank-sum test.

    While there are two clear classes of Z-linked genes—those with or without a W homolog—studies of Z-linked gene expression have suggested additional heterogeneity among Z-linked genes without a W homolog due to gene-specific dosage compensation (Mank and Ellegren 2009; Uebbing et al. 2015; Zimmer et al. 2016). If Z-linked genes with no W homolog exist upon a continuum from noncompensated to dosage compensated, those that are more compensated should have more conserved miRNA target sites, reflective of greater dosage sensitivity. We quantified the dosage compensation by using RNA sequencing data (Marin et al. 2017) to compare, in four somatic tissues, the chicken male/female expression ratio to the analogous ratio in human and Anolis (see Methods). In the brain, kidney, and liver, Z-linked genes with no W homolog and higher mean miRNA conservation scores had male/female expression ratios closer to one (Supplemental Fig. S12). Thus, in addition to the above-described differences between Z-linked genes with or without a W homolog, Z-linked genes with no W homolog but with more effective dosage compensation have more conserved miRNA target sites than noncompensated genes.

    Heterogeneities in Z-linked miRNA targeting were present on the ancestral autosomes

    We next asked whether differences in miRNA targeting between Z-W pairs and other ancestral Z-linked genes were present on the ancestral autosomes that gave rise to the avian Z and W Chromosomes. We found that chicken Z-W pairs have more human–chicken-conserved miRNA target sites than their Z-linked counterparts without surviving W homologs, both before (Fig. 5B, top) and after (Fig. 5B, bottom) accounting for the background conservation of each individual 3′ UTR. To confirm that these differences represent ancestral heterogeneity rather than differential site loss during or following Z-W differentiation, we instead considered the number of sites conserved between human and Anolis lizard, which diverged from birds prior to Z-W differentiation (Fig. 5A). Chicken Z-W pairs contain an excess of human–Anolis conserved miRNA target sites, both before (Fig. 5C, top) and after (Fig. 5C, bottom) accounting for the background conservation of each individual 3′ UTR. We observed similar results with the predicted four-species (Supplemental Fig. S13) and 14-species (Supplemental Fig. S14) sets of Z-W pairs. Thus, the autosomal precursors of avian Z-W pairs were subject to more miRNA-mediated regulation than the autosomal precursors of Z-linked genes that lack a W homolog. Furthermore, in the liver and brain, Z-linked genes with no W homolog with an excess of human–chicken-conserved miRNA sites had male/female expression ratios closer to one, implying more effective dosage compensation (Supplemental Fig. S15). Together, these results indicate heterogeneity in dosage sensitivity among genes on the ancestral autosomes that gave rise to the avian Z Chromosome.

    Figure 5.

    Heterogeneities in Z-linked miRNA targeting were present on the ancestral autosomes. (A) Example reconstruction of an ancestral miR-145 target site in the 3′ UTR of RASA1, a Z-linked gene with a surviving W homolog. (Left) Species tree overlaid with events in avian sex chromosome evolution. (Right) Multiple sequence alignment; dots in nonhuman species indicate identity with human sequence; dashes indicate gaps in alignment. (B) Numbers of sites conserved between 3′ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom) for chicken Z-W pairs (n = 27) and other ancestral Z genes (n = 578). (C) Statistics as in B, but using sites conserved between human and Anolis 3′ UTRs.

    Analyses of experimental data sets validate miRNA target site function

    Our results to this point, which indicate preexisting heterogeneities in dosage constraints among X- or Z-linked genes as inferred by predicted miRNA target sites, lead to predictions regarding the function of these sites in vivo. To test these predictions, we turned to publicly available experimental data sets consisting both of gene expression profiling following transfection or knockout of individual miRNAs, and of high-throughput crosslinking-immunoprecipitation (CLIP) to identify sites that bind Argonaute in vivo (see Methods). If the above-studied sites are effective in mediating target repression, targets of an individual miRNA should show increased expression levels or Argonaute binding following miRNA transfection and show decreased expression levels following miRNA knockout. Together, our analyses of publicly available data sets fulfilled these predictions, validating the function of these sites in multiple cellular contexts and species (Fig. 6). From the gene expression profiling data, we observed results consistent with effective targeting by (1) 11 different miRNA families in human HeLa cells (Supplemental Fig. S16), (2) four different miRNAs in human HCT116 and HEK293 cells (Supplemental Fig. S17), and (3) miR-155 in mouse B and Th1 cells (Supplemental Fig. S18). In the CLIP data, the human orthologs of X- or Z-linked targets of miR-124 are enriched for Argonaute-bound clusters that appear following miR-124 transfection, while a similar but nonsignificant enrichment is observed for miR-7 (Supplemental Fig. S19). Thus, conserved miRNA target sites used to infer dosage constraints on X-linked genes, and the autosomal orthologs of Z-linked genes can effectively mediate target repression in living cells.

    Figure 6.

    Analyses of experimental data sets validate miRNA target site function. Responses to transfection (AC) or knockout (D) of indicated miRNAs in human (AC) or mouse (D) cell types. Each panel depicts corresponding changes in mRNA levels (A,B), in fraction of Argonaute-bound genes (C), and in mRNA stability and translational efficiency as measured by ribosome protected fragments (RPF; D). In each case, X-linked genes and the human orthologs of Z-linked genes containing target sites with an assigned PCT score (red) for the indicated miRNA were compared with all expressed genes lacking target sites (black); gene numbers are indicated in parentheses. (A,B,D) (***) P < 0.001, two-sided Kolmogorov–Smirnov test. (C) (*) P < 0.05, two-sided Fisher's exact test.

    Discussion

    Here, through the evolutionary reconstruction of miRNA target sites, we provide evidence for preexisting heterogeneities in dosage sensitivity among genes on the mammalian X and avian Z Chromosomes. We first showed that, across all human autosomal genes, dosage sensitivity—as indicated by patterns of genic copy number variation—correlates with the degree of conserved miRNA targeting. We found that conserved targeting correlates especially strongly with sensitivity to dosage increases, consistent with miRNA targeting serving to reduce gene expression. Turning to the sex chromosomes of mammals and birds, genes that retained a homolog on the sex-specific Y or W Chromosome (X-Y and Z-W pairs) have more conserved miRNA target sites than genes with no Y or W homolog. In mammals, genes with no Y homolog that became subject to XCI have more conserved sites than those that continued to escape XCI following Y gene decay. In birds, across Z-linked genes with no W homolog, the degree of conserved miRNA targeting correlates with the degree of gene-specific dosage compensation. We then reconstructed the ancestral state of miRNA targeting, observing significant heterogeneities in the extent of miRNA targeting, and thus dosage sensitivity, on the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes. Finally, through analysis of publicly available experimental data sets, we validated the function, in living cells, of the miRNA target sites used to infer dosage sensitivity. We thus conclude that differences in dosage sensitivity—both to increases and to decreases in gene dosage—among genes on the ancestral autosomes influenced their evolutionary trajectory during sex chromosome evolution, not only on the sex-specific Y and W Chromosomes, but also on the sex-shared X and Z Chromosomes.

    Our findings build upon previous work in three important ways. First, our analysis of miRNA-mediated repression indicates that these heterogeneities consist of sensitivities to dosage increases and decreases, whereas previous studies had either focused on sensitivity to underexpression or could not differentiate the two. Second, our reconstruction of miRNA targeting on the ancestral autosomes provides direct evidence that heterogeneities in dosage sensitivity among classes of X- and Z-linked were preexisting rather than acquired during sex chromosome evolution. Finally, by pointing to specific regulatory sequences (miRNA target sites) functioning to tune gene dosage both prior to and during sex chromosome evolution, our study provides a view of dosage compensation encompassing post-transcriptional regulation.

    Human disease studies support the claim that increased dosage of X-Y pairs and X-inactivated genes is deleterious to fitness. Copy number gains of the X-linked gene KDM6A, which has a surviving human Y homolog, are found in patients with developmental abnormalities and intellectual disability (Lindgren et al. 2013). HDAC6, CACNA1F, GDI1, and IRS4 all lack Y homologs and are subject to XCI in humans. A mutation in the 3′ UTR of HDAC6 abolishing targeting by miR-433 has been linked to familial chondrodysplasia in both sexes (Simon et al. 2010). Likely gain-of-function mutations in CACNA1F cause congenital stationary night blindness in both sexes (Hemara-Wahanui et al. 2005). Copy number changes of GDI1 correlate with the severity of X-linked mental retardation in males, with female carriers preferentially inactivating the mutant allele (Vandewalle et al. 2009). Somatic genomic deletions downstream from IRS4 lead to its overexpression in lung squamous carcinoma (Weischenfeldt et al. 2017). Males with partial X disomy due to translocation of the distal long arm of the X Chromosome (Xq28) to the long arm of the Y Chromosome show severe mental retardation and developmental defects (Lahn et al. 1994). Most genes in Xq28 are inactivated in 46,XX females but escape inactivation in such X;Y translocations, suggesting that increased dosage of Xq28 genes caused the cognitive and developmental defects. We anticipate that further studies will reveal additional examples of the deleterious effects of increases in gene dosage of X-Y pairs and X-inactivated genes.

    We and others previously proposed that Y gene decay drove up-regulation of homologous X-linked genes in both males and females and that XCI subsequently evolved at genes sensitive to increased expression from two active X-linked copies in females (Ohno 1967; Jegalian and Page 1998). Our finding that X-inactivated genes have higher miRNA conservation scores than X escape genes is consistent with this aspect of the model. However, recent studies indicating heterogeneity in dosage sensitivity between classes of mammalian X- or avian Z-linked genes (Pessia et al. 2012; Bellott et al. 2014, 2017; Zimmer et al. 2016), combined with the present finding that these dosage sensitivities existed on the ancestral autosomes, challenge the previous assumption of a single evolutionary pathway for all sex-linked genes.

    We therefore propose a revised model of X-Y and Z-W evolution in which the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes contained three (or two, in the case of birds) classes of genes with differing dosage sensitivities (Fig. 7A,B). For ancestral genes with high dosage sensitivity, Y or W gene decay would have been highly deleterious, and thus the Y- or W-linked genes were retained. According to our model, these genes’ high dosage sensitivity also precluded up-regulation of the X- or Z-linked homolog and, in mammals, subsequent X-inactivation; indeed, their X-linked homologs continue to escape XCI (Bellott et al. 2014). For ancestral mammalian genes of intermediate dosage sensitivity, Y gene decay did occur and was accompanied or followed by compensatory up-regulation of the X-linked homolog in both sexes; the resultant increased expression in females was deleterious and led to the acquisition of XCI. Ancestral mammalian genes of low dosage sensitivity continued to escape XCI following Y decay; heterogeneity in X up-regulation may further subdivide such genes (Fig. 6A). These genes’ dosage insensitivity set them apart biologically, and evolutionarily, from the other class of X-linked genes escaping XCI—those with a surviving Y homolog.

    Figure 7.

    An evidence-based model of preexisting heterogeneities in dosage sensitivity shaping mammalian and avian sex chromosome evolution. In this model, preexisting heterogeneities in dosage sensitivity determined the trajectory of Y/W gene loss in both mammals and birds, and of subsequent X-inactivation in mammals and dosage compensation in birds. Colored arrow widths are scaled approximately to the number of ancestral genes in each class. (A) The dashed orange line represents the possibility that a subset of X-linked genes may have not undergone compensatory X up-regulation following Y gene decay. (B) Ancestral Z genes with no W homolog follow a gradient of preexisting dosage sensitivity (top; gray to white), which determined the degree of dosage compensation following W gene loss (bottom; gray to white).

    Our revised model relates preexisting, gene-by-gene heterogeneities in dosage sensitivity to the outcomes of sex chromosome evolution. However, the suppression of X-Y recombination did not occur on a gene-by-gene basis, instead initiating Y gene decay and subsequent dosage compensation through a series of large-scale inversions encompassing many genes (Lahn and Page 1999). The timings and boundaries of these evolutionary strata varied among mammalian lineages, thus leading to unique chromosome-scale evolutionary dynamics across mammals. These large-scale changes would have then allowed for genic selection to take place according to the preexisting dosage sensitivities outlined above. In this way, the course of sex chromosome evolution in mammals is a composite of (1) preexisting, gene-by-gene dosage sensitivities and (2) the manner in which the history of the X and Y unfolded in particular lineages via discrete, large-scale inversions.

    In this study, we have focused on classes of ancestral X-linked genes delineated by the survival of a human Y homolog or by the acquisition of XCI in humans, but such evolutionary states can differ among mammalian lineages and species. In mouse, for instance, both Y gene decay (Bellott et al. 2014) and the acquisition of X-inactivation (Yang et al. 2010) are more complete than in humans or other mammals, as exemplified by RPS4X, which retains a Y homolog and continues to escape XCI in primates but has lost its Y homolog and is subject to XCI in rodents. These observations could be explained by shortened generation times in the rodent lineage, resulting in longer evolutionary times, during which the forces leading to Y gene decay and the acquisition of X-inactivation could act (Ohno 1967; Charlesworth and Crow 1978; Jegalian and Page 1998). Another case of lineage differences involves HUWE1, which lacks a Y homolog and is subject to XCI in both human and mouse but retains a functional Y homolog in marsupials, where it continues to escape XCI. In the future, more complete catalogs of X-inactivation and escape in additional mammalian lineages would make it possible to examine whether analogous, preexisting dosage sensitivities differentiate the three classes of X-linked genes (X-Y pairs, X-inactivated genes, and X escape genes) in other species.

    Previous studies have sought evidence of X-linked up-regulation during mammalian sex chromosome evolution using comparisons of gene expression levels between the whole of the X Chromosome and all of the autosomes, with equal numbers of studies rejecting or finding evidence consistent with up-regulation (Xiong et al. 2010; Deng et al. 2011; Kharchenko et al. 2011; Julien et al. 2012; Lin et al. 2012). This is likely due to gene-by-gene heterogeneity in dosage sensitivities that resulted in a stronger signature of up-regulation at more dosage-sensitive genes (Pessia et al. 2012). Similarly, studies of Z-linked gene expression in birds provide evidence for the gene-by-gene nature of Z dosage compensation, as measured by comparisons of gene expression levels between ZZ males and ZW females (Itoh et al. 2007; Mank and Ellegren 2009; Uebbing et al. 2015), and indicate a stronger signature of dosage compensation at predicted dosage-sensitive genes (Zimmer et al. 2016). By showing that such dosage sensitivities existed on the ancestral autosomes and consist of sensitivity to both increases and decreases, our findings highlight an additional aspect of dosage compensation that affects both birds and mammals.

    In addition to revealing similarities between mammals and birds, our study provides a view of dosage compensation that highlights post-transcriptional regulatory mechanisms, pointing to specific noncoding sequences with known mechanisms (miRNA target sites) functioning across evolutionary time. A recent study in birds showed a role for a Z-linked miRNA, miR-2954-3p, in dosage compensation of some Z-linked genes (Warnefors et al. 2017). Our study suggests an additional, broader role for miRNA targeting, with hundreds of different miRNAs acting to tune gene dosage both before and during sex chromosome evolution. Furthermore, our finding of greater conserved miRNA targeting of X-inactivated genes relative to X escape genes shows that it is possible to predict the acquisition of a transcriptional regulatory state (XCI) during sex chromosome evolution on the basis of a preexisting, post-transcriptional regulatory state. Perhaps additional post-transcriptional regulatory mechanisms and their associated regulatory elements will be shown to play roles in mammalian and avian dosage compensation.

    Recent work has revealed that the sex-specific chromosome—the Y in mammals and the W in birds—convergently retained dosage-sensitive genes with important regulatory functions (Bellott et al. 2014, 2017). Our study, by reconstructing the ancestral state of post-transcriptional regulation, provides direct evidence that such heterogeneity in dosage sensitivity existed on the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes. This heterogeneity influenced both survival on the sex-specific chromosomes in mammals and birds and the evolution of XCI in mammals. Thus, two independent experiments of nature offer empirical evidence that modern-day amniote sex chromosomes were shaped, during evolution, by the properties of the ancestral autosomes from which they derive.

    Methods

    Statistics

    Details of all statistical tests (type of test, test statistic, and P-value) used in this article are provided in Supplemental Table S1.

    Human genic copy number variation

    To annotate gene deletions and duplications, we used data from the ExAC (ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/cnv/), which consists of autosomal genic duplications and deletions (both full and partial) called in 59,898 exomes (Ruderfer et al. 2016). Further details are provided in the Supplemental Methods in the section titled “Human genic copy number variation.” These gene assignments are provided in Supplemental Table S2.

    X- and Z-linked gene sets

    We utilized our previous reconstructions of the ancestral mammalian X (Bellott et al. 2014) and avian Z (Bellott et al. 2017) Chromosomes, as well as information on multicopy and ampliconic X-linked genes (Mueller et al. 2013) and XCI status in humans (Balaton et al. 2015) to delineate classes of X- and Z-linked genes. Further details are provided in Supplemental Methods under the sections titled “X-linked gene sets” and “Z-linked gene sets.” Information on X-linked genes is provided in Supplemental Table S3. Information on Z-linked genes is provided in Supplemental Table S4.

    miRNA target sites

    Precalculated PCT scores for all gene–miRNA family interactions (http://www.targetscan.org/vert_71/vert_71_data_download/Summary_Counts.all_predictions.txt.zip) and site-wise alignment information (http://www.targetscan.org/vert_71/vert_71_data_download/Conserved_Family_Info.txt.zip) were obtained from TargetScan Human v7. Details on the filtering of miRNAs and resampling-based assessment of PCT scores are provided in Supplemental Methods in the section titled “microRNA target site PCT scores.” Details regarding analysis of human–chicken or human–Anolis conserved sites, as well as approaches to control for background conservation, are provided in Supplemental Methods in the section titled “Human-chicken conserved microRNA target sites.”

    Variation in within-UTR conservation bias

    To address the possibility that nonuniformity in regional 3′ UTR conservation could artificially inflate or deflate conservation scores of certain target sites, we implemented a step-detection algorithm to segment 3′ UTRs into regions of homogeneous background conservation and calculated miRNA site conservation relative to these smaller regions. These regionally normalized scores, corresponding to all gene–miRNA interactions, are provided in Supplemental Table S5. Details of the step-detection algorithm are provided in Supplemental Methods in the section titled “Variation in within-UTR conservation bias.”

    Logistic regression

    Logistic regression models were constructed using the function “multinom” in the R package “nnet” (Venables and Ripley 2002). We used previously published values for known factors in the survival of Y-linked (Bellott et al. 2014) and W-linked (Bellott et al. 2017) genes except for human expression breadth, which we recalculated using data from the GTEx Consortium v6 data release (The GTEx Consortium 2015). Briefly, kallisto was used to estimate transcript per million (TPM) values in the 10 male samples with the highest RNA integrity numbers (RINs) from each of 37 tissues, and expression breadth across tissues was calculated as described by Bellott et al. (2014), using median TPM values for each tissue.

    Assessing Z-linked dosage compensation using cross-species RNA-sequencing data

    Raw data from Marin et al. (2017) was obtained, and kallisto and limma/voom were used for abundance quantification and differential expression, respectively. Further details are provided in the Supplemental Methods in the section titled “Assessing Z-linked dosage compensation using cross-species RNA-sequencing data.”

    Gene expression profiling and crosslinking data sets

    Fold-changes in mRNA expression and targets of Argonaute as determined by high-throughput CLIP were obtained from a variety of publicly available data sets. Further details are provided in Supplemental Methods in the section titled “Gene expression profiling and crosslinking datasets.” All fold-changes and CLIP targets are provided in Supplemental Table S6.

    Software availability

    A custom Python (RRID:SCR_008394) script utilizing Biopython (RRID:SCR_007173) was used to generate shuffled miRNA family seed sequences. Identification of miRNA target site matches using shuffled seed sequences was performed using the “targetscan_70.pl” Perl script (http://www.targetscan.org/vert_71/vert_71_data_download/targetscan_70.zip). 3′ UTR segmentation was performed with the “plot_transitions.py” Python script. Code is available at https://github.com/snaqvi1990/Naqvi17-code and as Supplemental Code.

    Acknowledgments

    We thank V. Agarwal, S. Eichorn, S. McGeary, and D. Bartel for assistance with the TargetScan database and helpful discussions; A. Godfrey for updated human–chicken orthology information; and A. Godfrey, J. Hughes, and H. Skaletsky for critical reading of the manuscript. This work was supported by the National Institutes of Health and the Howard Hughes Medical Institute. S.N. was supported under a research grant by Biogen.

    Author contributions: S.N., D.W.B., and D.C.P. designed the study. S.N. performed analyses with assistance from D.W.B. K.S.L. developed and implemented the step-detection algorithm. S.N. and D.C.P. wrote the paper.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.230433.117.

    • Freely available online through the Genome Research Open Access option.

    • Received September 21, 2017.
    • Accepted February 6, 2018.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

    References

    | Table of Contents
    OPEN ACCESS ARTICLE

    Preprint Server