Pooled analysis of radiation hybrids identifies loci for growth and drug action in mammalian cells

  1. Desmond J. Smith1
  1. 1Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA;
  2. 2Office of Information Technology, UCLA, Los Angeles, California 90095-1557, USA;
  3. 3Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA;
  4. 4Howard Hughes Medical Institute, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
  • Corresponding author: DSmith{at}mednet.ucla.edu
  • Abstract

    Genetic screens in mammalian cells commonly focus on loss-of-function approaches. To evaluate the phenotypic consequences of extra gene copies, we used bulk segregant analysis (BSA) of radiation hybrid (RH) cells. We constructed six pools of RH cells, each consisting of ∼2500 independent clones, and placed the pools under selection in media with or without paclitaxel. Low pass sequencing identified 859 growth loci, 38 paclitaxel loci, 62 interaction loci, and three loci for mitochondrial abundance at genome-wide significance. Resolution was measured as ∼30 kb, close to single-gene. Divergent properties were displayed by the RH-BSA growth genes compared to those from loss-of-function screens, refuting the balance hypothesis. In addition, enhanced retention of human centromeres in the RH pools suggests a new approach to functional dissection of these chromosomal elements. Pooled analysis of RH cells showed high power and resolution and should be a useful addition to the mammalian genetic toolkit.

    A variety of experimental strategies have been developed over the last two decades to explore genotype–phenotype correlations in mammalian cells. Gene “knockdowns” use either interfering RNA (RNAi) or small interfering RNAs (siRNAs) to specifically degrade transcripts. These reagents have given insights into multiple phenotypes (Root et al. 2006; Blakely et al. 2011; Mohr et al. 2014; Boettcher and McManus 2015). More recently, CRISPR-Cas9 gene editing approaches that introduce targeted mutations into DNA have emerged as a powerful strategy to enable genome-wide screens for growth and pharmacological mechanisms (Koike-Yusa et al. 2014; Shalem et al. 2014; Hart et al. 2015; Wang et al. 2015; Liu et al. 2017; Pawluk 2018).

    These technologies have been fruitful but have some drawbacks. Potential off-target effects exist for both RNAi (Alagia and Eritja 2016; Bofill-De Ros and Gu 2016) and CRISPR-Cas9 (Mohr et al. 2016; Tycko et al. 2016). Furthermore, cell delivery requires the use of lentiviral libraries of unknown stability and high complexity. Assuming 20,000 protein coding genes, each knockout is represented by ∼500 cells in a 75-cm2 flask. Decreased power, or even missing data, are therefore risks. The addition of noncoding transcripts trebles the number of genes (Derrien et al. 2012; Kozomara and Griffiths-Jones 2014; Frankish et al. 2015; Liu et al. 2017), augmenting the problem.

    Overexpression provides complementary information to loss-of-function screens (Prelich 2012). For example, screens for viability in Saccharomyces cerevisiae revealed no overlap of overexpressed and knockout genes, suggesting that overexpression alters cell physiology in a distinct manner to loss-of-function (Sopko et al. 2006).

    Gain-of-function screens can be particularly revealing in cancer, where overexpression is a key disease driver (Vogelstein et al. 2013). Overexpression is also valuable when genes are haploinsufficient, ruling out the use of knockouts. While cDNA overexpression can be achieved with lentiviral libraries, genome-wide screens have yet to be performed in mammalian cells (Yang et al. 2011; Arnoldo et al. 2014). In addition, cDNA screens typically omit novel isoforms, whose number exceeds that of known transcripts (Škalamera et al. 2011; Eksi et al. 2013; Li et al. 2014), as well as noncoding genes (Derrien et al. 2012; Kozomara and Griffiths-Jones 2014; Frankish et al. 2015; Liu et al. 2017).

    Gene expression can be manipulated in a targeted fashion using CRISPR interference (CRISPRi) or CRISPR activation (CRISPRa), in which a catalytically dead Cas9 (dCas9) protein is fused with a transcriptional repressor or activator domain, respectively (Gilbert et al. 2014; Kampmann 2018). The approach has been used in a number of cellular screens, but the overexpression is strongly constitutive with no tissue specific regulation.

    As a new strategy to dissecting genotype/phenotype relationships in mammalian cells, we have been retooling radiation hybrid (RH) technology (Park et al. 2008; Ahn et al. 2009; Lin et al. 2010; Ranola et al. 2010; Wang et al. 2011; Khan et al. 2016). This venerable approach to genetic mapping employs lethal doses of radiation to break the genome of a human cell line into small fragments, which are then transferred to living hamster cells by cell fusion (Goss and Harris 1975; Cox et al. 1990; Walter et al. 1994). Human markers that are close together tend to be found in the same set of RH clones, while markers far apart tend to be inherited independently.

    A typical RH panel consists of ∼100 clones, each containing ∼4%–10% of the human genome as extra copies. The number of breakpoints in RH mapping is far larger than in meiosis, giving the approach close to single gene resolution and facilitating high resolution maps of mammalian genomes (McCarthy et al. 2000; Avner et al. 2001; Hudson et al. 2001; Olivier et al. 2001; Kwitek et al. 2004).

    Although RH panels were widely used in the genome projects, their potential for dissecting biological traits was not fully realized. Recently, we measured the growth of clones from a human RH panel in the presence of the antitubulin drugs paclitaxel (Taxol) and colchicine (Khan et al. 2016). We identified a single gene, ZNRF3, as important to the action of both agents. However, statistical power was limited by the modest size of the RH panel (79 clones).

    Bulk segregant analysis (BSA) is an efficient approach to enhancing statistical power in genetic mapping (Michelmore et al. 1991). A pool of diverse cells is placed under selection and loci that contribute to survival are revealed by enrichment of nearby genetic variants (Ehrenreich et al. 2010). The approach surveys a large population of cells using a small number of assays.

    Here, we describe experiments that combine RH mapping with BSA. We reasoned that this combination would provide three benefits: (1) a genome-wide approach to gene identification that uses extra dosage rather than loss-of-function; in this study, we sought genes for cell growth in the presence or absence of paclitaxel; (2) the ability to assess coding and noncoding genes on an equal footing; and (3) gains in both statistical power and the already impressive resolution of RH mapping by using multiple independent pools of RH cells, each with thousands of clones.

    Results

    Creating six RH pools

    The strategy for RH-BSA is shown in Figure 1. We lethally irradiated human HEK293 cells (TK1+) to fragment their DNA. The cells were then fused with living hamster A23 cells (TK1) and plated in HAT selective medium. The HEK293 cell line is an important resource for cell biology and biotechnology and its genome has been sequenced (Lin et al. 2014). In addition, a high-quality draft genome is available for hamster, with continuity approaching that of the mouse (Rupp et al. 2018).

    Figure 1.

    Creating the RH pools.

    After fusion, the surviving RH cells receive the selectable marker thymidine kinase (TK1) from the donor cells in addition to a random assortment of human DNA fragments. We maintained the RH cells as heterogeneous pools, rather than picking individual clones as is usually done for genetic mapping. The pools were genotyped by low-pass sequencing before and after competition due to growth or drug exposure. Human genes that confer a proliferative advantage become more abundant, while genes that confer a disadvantage become less abundant.

    We created six independent RH pools, each with a mean fragment size of 7.4 ± 0.1 Mb (Supplemental Fig. S1). Each pool had 2621 ± 441 clones, after taking into account a spontaneous revertant rate of 26% ± 2% (Supplemental Table S1).

    When the cells from the fusion reactions reached confluence, an aliquot from each of the pools was reserved for genotyping. These samples represent week 0 and 0 nM paclitaxel (Supplemental Table S2) and are referred to as the “RH pools.” The cells were further propagated as separate pools and genotyped after 1, 2, 3, 4, and 6 wk in HAT medium supplemented with 0, 8, 25, or 75 nM paclitaxel. There were 115 RH samples in total.

    Human DNA in the RH samples

    We used low-pass sequencing to analyze the RH samples under the various conditions of growth and drug (Fig. 1; Supplemental Tables S3–S6). A mean of 40.5 ± 0.7 million (M) reads was obtained for each sample, corresponding to a sequencing depth of 0.84 ± 0.01 times the human genome. Reads were aligned to the human and hamster genomes and cross-species reads discarded, leaving only species-specific alignments for subsequent analyses.

    The hamster genome in the six RH pools was not changed by the cell fusions (Fig. 2A,B; Supplemental Figs. S2, S3A). In contrast, a selection of human DNA fragments from the HEK293 cells was retained by each RH pool (Fig. 2C–F; Supplemental Figs. S3B, S4–S6).

    Figure 2.

    Genome copy number in the RH pools. (A) Hamster copy number deduced from reads in the A23 cells, normalized to a mean value of 1. (B) Hamster copy number averaged across the six RH pools. (C) Human copy number in the HEK293 cells. (D) Human DNA copy number (retention) averaged across the six RH pools. TK1 assigned retention of 1. (E) Increased retention of human centromere on Chromosome 2. (F) Increased retention of human centromere and TK1 on Chromosome 17.

    Since human TK1 was used as the selectable marker, this gene has an expected retention frequency of one in the hybrid cells. Donor centromeres also showed increased retention, as these chromosomal regions confer increased stability on the DNA fragments (Fig. 2D–F; Supplemental Figs. S5, S6; Wang et al. 2011). In contrast, noncentromeric regions showed relatively constant retention across the genome, as previously documented in RH panels (Fig. 2D–F; Supplemental Figs. S5, S6; McCarthy et al. 2000; Avner et al. 2001; Hudson et al. 2001; Olivier et al. 2001; Kwitek et al. 2004).

    The mean retention frequency in the RH pools was 3.9% ± 0.3% (Supplemental Table S7), with each gene harbored by 3.9 × 105 (± 3.4 × 104) cells in a 75-cm2 flask of 107 RH cells. This cell number compares favorably with the 500 cells expected for a protein-coding lentiviral library. The human genome is represented with 103 ± 20-fold redundancy in each of the six individual RH pools.

    The mapping resolution of RH-BSA depends on the length of the DNA fragments, their average retention frequency, and the number of clones. At the mean fragment size of 7.4 ± 0.1 Mb, the number of breakpoints in the individual RH pools is 8.7 × 104 (± 1.7 × 104) and is expected to yield a mapping resolution of 71 ± 14 kb for each pool and 12 ± 3 kb for all six pools together.

    Retention changes in the RH samples

    Human DNA is commonly lost during growth of hybrid cells, and this phenomenon has been exploited as a tool for gene mapping (Kucherlapati and Ruddle 1975; Harris 1993; Wasmuth 2001). In a pooled setting, alterations in human DNA levels can also occur as a result of changes in relative clone abundance.

    The overall mean retention of human DNA in the RH samples showed a significant decrease as a result of growth (3.9% ± 0.3%, week 0; 1.1% ± 0.2%, week 6), but not paclitaxel (Supplemental Figs. S7–S9). When profiled across the genome, the hamster genome was stable, while human DNA levels showed significant decreases together with increased variance (Fig. 3A–D; Supplemental Fig. S10).

    Figure 3.

    Copy number changes at week 4. (A) Hamster genome, growth at week 4 compared to week 0; 0 nM paclitaxel. (B) Human genome, growth at week 4 compared to week 0; 0 nM paclitaxel. (C) Hamster genome, 75 nM paclitaxel compared to 0 nM paclitaxel; week 4. (D) Human genome, 75 nM paclitaxel compared to 0 nM paclitaxel; week 4. Changes on log2 scale averaged across the six RH pools.

    Growth loci

    Growth loci tally

    Human growth loci were identified by employing a negative binomial mixed model to detect significant changes in read counts across the genome as a result of growth time at the various paclitaxel concentrations (Supplemental Figs. S11–S17). Cross-validation procedures confirmed high levels of reproducibility (Supplemental Fig. S18). In total, there were 859 unique growth loci, ∼1.4% of all known genes (Fig. 4A; Supplemental Figs. S19–S21; Supplemental Table S8).

    Figure 4.

    Loci for average conditional effects of growth and paclitaxel. (A) Significance values for growth. (B) Significance values for paclitaxel. Red dotted line, permutation significance threshold.

    Coding and noncoding growth loci

    Four of the growth loci mapped to centromeres. Of the remaining loci, the nearest genes were coding for 442 (52%) and noncoding for 413 (48%). Although the RH coding growth genes were significantly enriched compared to noncoding, noncoding genes clearly play a substantial role in cell proliferation. Detailed views of growth loci are shown in Figure 5A–F and Supplemental Figure S22.

    Figure 5.

    Close up views of growth loci. (A) Chromosome 1. (B) Chromosome 3. (C) Chromosome 16. (D) Sequence read changes on log2 scale for six significant growth loci. Colored lines, best fit. Gray bands, 95% confidence intervals. (E) Locus for CTTNBP2 on Chromosome 7. (F) Locus for MCTP2 on Chromosome 15. Horizontal red and black lines, permutation significance thresholds.

    Nearly all growth loci show DNA loss

    Consistent with the decrease of human DNA in the RH samples (Fig. 3B,D; Supplemental Figs. S7–S10), only seven of the 859 growth loci, including SEMA3A and TOR1A, had positive coefficients (Supplemental Fig. S23A). Growth genes thus showed preferential loss rather than gain. Our observation echoes the finding that inactivation of tumor suppressor genes is more common in cancer than oncogene activation (Harris 1993; Vogelstein et al. 2013), although the differences between RH and tumor cells mandate caution when making this analogy.

    RH growth genes and complex human disease

    The RH growth genes were significantly enriched in the catalog of human genome-wide association studies (GWAS; false discovery rate [FDR] = 5.8 × 10–12) (Chen et al. 2013) and also significantly overrepresented in 16 of 203 disease-related terms in the NCBI Database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/; FDR < 0.05) (Chen et al. 2013; Tryka et al. 2014). In contrast, there was no significant enrichment for common human disease genes in loss-of-function CRISPR screens (FDR > 0.76, GWAS; FDR > 0.25, dbGaP) (Hart et al. 2015; Wang et al. 2015).

    Most common human disease variants represent polymorphisms in regulatory regions, which have more subtle effects than null alleles (Gallagher and Chen-Plotkin 2018). The enrichment of RH growth genes in human disease may thus be related to the restrained effects of an extra copy compared to a knockout. The RH growth genes may be good candidates for causative genes in complex human disease.

    The balance hypothesis

    The “balance” hypothesis postulates that the effects of altered gene dosage occur as a result of unequal synthesis of gene products (Papp et al. 2003). For example, mismatched stoichiometry could lead to decreased efficiency of multi-subunit protein complexes, such as many enzymes. The hypothesis yields two predictions: (1) genes that show a phenotype when dosage is increased should also show a phenotype when dosage is decreased; and (2) genes with dosage effects should be more likely to participate in multi-subunit complexes. The balance hypothesis has been falsified in yeast (Sopko et al. 2006), but whether it is true in mammalian cells is unknown.

    We tested the first prediction by asking whether there was increased overlap between the growth genes identified using RH-BSA and those identified in loss-of-function CRISPR (Hart et al. 2015; Wang et al. 2015) and CRISPRi screens (Gilbert et al. 2014). Contrary to expectations, the RH growth genes had significantly decreased effect scores, decreased cell line hit rates, and significant underrepresentation among the CRISPR growth genes (Fig. 6A–F; Supplemental Figs. S24, S25). These observations refute the first prediction of the balance hypothesis.

    Figure 6.

    RH growth genes. (A) RH growth genes (RH+) have weaker CRISPR effects (Hart et al. 2015) than nongrowth RH genes (RH−). (BF) Bayes factor for growth effects in each cell line, P-values shown above comparisons. (B) Mean number of cell lines with a CRISPR hit is lower for RH growth genes. (C) Lack of overlap between RH and CRISPR growth genes in HCT116 cells. Abscissa shows threshold BF used to calculate overlap. (D) CRISPR scores (CS) for growth effects (Wang et al. 2015) multiplied by −1, so that higher scores mean stronger effects. (E) Lack of overlap between RH and CRISPR growth genes in KBM7 cells. (F) RH growth genes show weaker CRISPRi, but not CRISPRa, effects (−γ) in K562 cells (Gilbert et al. 2014). (G) RH growth genes have increased numbers of low-confidence protein–protein interactions and decreased numbers of high-confidence interactions. Score threshold (abscissa) is a measure of confidence. (H) Number of protein–protein interactions at most significant low-confidence score threshold (160) and most significant high-confidence threshold (995). (I) RH coding region growth genes have decreased expression compared to nongrowth genes in microarray data from a human RH panel. (J) Coding region (cr) RH growth genes have decreased expression compared to cr nongrowth genes in GTEx RNA-seq data from substantia nigra. (K) Noncoding (nc) RH growth and nongrowth genes show no significant expression differences in substantia nigra. (L) Decreased number of human paralogs (duplicates) for RH growth genes. P-values, Welch's two-sample t-test except in C and E, Fisher's exact test. (FDR) False discovery rate (Benjamini and Hochberg 1995). Transcripts per million (TPM) thresholded at ≥5.

    To test the second prediction, we asked whether the RH growth genes had increased likelihood of protein–protein interactions, similar to the CRISPR essential genes (Hart et al. 2015; Wang et al. 2015). Using the STRING v11 database (Szklarczyk et al. 2019), the RH growth genes showed decreased numbers of high-confidence protein–protein interactions and a corresponding increase in low-confidence interactions (Fig. 6G,H). Further, the RH growth genes showed no significant enrichment in protein interactions, molecular pathways, or multi-subunit protein complexes using Enrichr or DAVID (FDR > 0.05) (Huang et al. 2009; Chen et al. 2013). These findings contradict the second prediction of the balance hypothesis.

    Based on our results and those in yeast (Sopko et al. 2006), it appears that gene dosage alterations do not exert their effects through unbalanced synthesis of gene products, even in evolutionarily distant cells. Other mechanisms are likely responsible for gene dosage effects.

    Further divergent properties of RH-BSA and loss-of-function growth genes

    There were further divergent properties between the RH and CRISPR growth genes, in addition to those discussed in the previous section. Essential coding region genes identified in loss-of-function CRISPR screens showed increased expression (Hart et al. 2015; Wang et al. 2015). In contrast, the RH-BSA growth genes had the opposite behavior with decreased expression of coding, but not noncoding, genes in both a microarray data set from a human RH panel (Fig. 6I; Wang et al. 2011) and in RNA-seq data from human tissues (Fig. 6J,K; Supplemental Fig. S26; The GTEx Consortium 2015). The decreased expression of the RH growth genes may be the result of evolutionary selection for this property in genes with high dosage sensitivity.

    CRISPR growth genes were less likely to have human paralogs because duplicated genes offer functional redundancy (Wang et al. 2015). The RH growth genes also had fewer duplicates but perhaps for a different reason; additional copies of the RH growth genes could result in adverse phenotypes, as these genes inhibit proliferation when present in an extra dose (Fig. 6L).

    Similar properties of RH-BSA and loss-of-function growth genes

    In other respects, the RH and CRISPR growth genes had comparable properties. Both sets of genes were intolerant of loss-of-function mutations, suggesting similar functional importance (Supplemental Fig. S27A; Wang et al. 2015). In addition, the RH and CRISPR growth genes showed decreased human/mouse sequence divergence, indicating evolutionarily conserved roles (Supplemental Fig. S27B; Hart et al. 2015; Wang et al. 2015). The RH and CRISPR growth genes also showed increased numbers of orthologs across species (“phyletic retention”), further suggesting evolutionary conservation of these two gene classes (Supplemental Fig. S27C; Wang et al. 2015).

    The CRISPR screens for essential genes were restricted to coding genes and showed longer transcripts, increased exon numbers, and longer coding sequences (Hart et al. 2015; Wang et al. 2015). The coding and noncoding RH growth genes also showed increased gene length, transcript size, and exon numbers, as well as longer open reading frames for coding genes (Supplemental Fig. S27D–J).

    Receiver operating characteristic (ROC) curves showed that gene length was the best predictor of RH growth genes (area under the curve [AUC] = 0.76 and 0.70 for coding and noncoding genes, respectively), followed by CRISPR essentiality scores, protein interactions, and RH expression levels (Supplemental Fig. S27K). The ROCs were significantly better predictors than random (P < 4.9 × 10–4, Wilcoxon rank sum test).

    The RH growth genes showed no significant increase in effect scores, or overlap, with growth genes from a CRISPRa screen (Fig. 6F; Supplemental Fig. S25F; Gilbert et al. 2014). The constitutive overexpression of the CRISPRa technology may be responsible for the discrepancy.

    Novel RH growth genes

    The coding region RH growth genes were relatively poorly studied, having fewer literature citations in the GeneRIF and Reactome databases (Supplemental Fig. S28). Of the 442 coding region RH growth genes, 148 had neither literature citations in GeneRIF nor entries in the Reactome database. Study of these little understood genes may provide insights into mammalian cell growth.

    Unconventional RH growth loci included six that mapped to olfactory gene clusters (Supplemental Fig. S29A) and two that mapped to gene deserts, with the nearest gene >250 kb away (Supplemental Fig. S29B,C).

    Paclitaxel loci

    Paclitaxel loci were identified by detecting significant changes in human reads as a result of rising drug concentration at the various growth times. The number of loci increased significantly with longer growth periods, suggesting that the various exposure times recruit different genes (Fig. 4B; Supplemental Figs. S20, S21B, S30; Supplemental Tables S9–S11). This observation is consistent with our previous study using a panel of human RH clones (Khan et al. 2016), as well as findings in yeast (Wang and Kruglyak 2014). There were 38 unique paclitaxel loci in total.

    Compared to the growth loci, a significantly greater proportion of paclitaxel loci (seven out of 38) had positive coefficients indicating increased representation as a result of drug exposure (Supplemental Fig. S23B).

    Of the 34 noncentromeric paclitaxel loci, the nearest genes were coding for 19 (56%) and noncoding for 15 (44%), not significantly different from random. Coding and noncoding genes make important contributions to the action of paclitaxel as well as growth.

    Close-up views of significant paclitaxel loci are shown in Supplemental Figure S31. Consistent with the activity of paclitaxel as an antitubulin drug, there were two genes among the paclitaxel loci that are known to participate in microtubule function; NEK10 (Porpora et al. 2018; Yi et al. 2018) and PDE4DIP (Bouguenina et al. 2017; Yang et al. 2017). The role of PDE4DIP may explain the clinical utility of phosphodiesterase 4 inhibitors as antiproliferative and immunosuppressive agents (Keating 2017).

    There were 10 genes in common between the 38 paclitaxel loci and 859 growth loci, including LSAMP, SEMA3D, and GATAD2A (Supplemental Fig. S32). The dual role of these genes in growth and microtubule function suggests that they may be attractive targets for the development of antiproliferative drugs.

    A number of the paclitaxel loci mapped to novel genes, suggesting fresh entry points for understanding the mechanisms of action of this drug.

    Interaction loci

    A total of 62 human interaction loci were identified based on read changes due to growth depending significantly on paclitaxel concentration and vice versa. Of the 57 noncentromeric interaction loci, the nearest genes were coding for 26 (46%) and noncoding for 31 (54%), not significantly different from random. Coding and noncoding genes make important contributions to the action of the interaction genes.

    Of the interaction loci, 15 overlapped with the 859 growth loci and 14 overlapped with the 38 paclitaxel loci (Supplemental Figs. S21C, S23C, S32–S34; Supplemental Tables S9, S10). The interaction loci included genes previously demonstrated to coregulate growth and tubulin function. For example, HGF promotes microtubule assembly and shows complex cell growth effects when combined with paclitaxel (Dugina et al. 1995; Rasola et al. 2004; Tian et al. 2014; Ying et al. 2015). Similarly, FSIP1 regulates growth of triple-negative breast cancer cells and resistance to docetaxel, a drug in the same class as paclitaxel (Liu et al. 2018). SIKE1 (Sonnenschein et al. 2018) and CPT1A (Li et al. 2013) also regulated both cell growth and microtubule activity. The interaction genes may thus provide useful clues to the relationship between growth inhibition and paclitaxel action.

    Empirical evaluation of mapping resolution using RH-BSA

    The −2log10P width of the RH loci (Fig. 5; Supplemental Figs. S31, S34) suggested a mapping resolution of <100 kb. To obtain an empirical estimate of mapping accuracy, we used the increased retention frequency of TK1 and the centromeres as a proxy for locus identification (Fig. 2; Supplemental Figs. S5, S6, S35). The mapping accuracy estimated using both approaches was similar, together suggesting a resolution ∼30 kb. This mapping accuracy was consistent with that expected from the number of breakpoints in the RH pools.

    Despite its high mapping resolution, the RH strategy could not always resolve neighboring loci, particularly in areas of high gene density (Supplemental Fig. S36). In the future, increased radiation doses and pool numbers will help resolve these complex regions.

    Mitochondrial copy number

    Mitochondria in HEK293 and A23 cells

    The sequence data revealed a mitochondrial copy number of 2145 for the donor HEK293 cells and 843 for the recipient A23 cells (Supplemental Table S4), similar to published values for mammalian mitochondria (Phillips et al. 2014; van Gisbergen et al. 2015; Reznik et al. 2016; Wachsmuth et al. 2016).

    Human loci regulating hamster mitochondrial abundance

    The hamster mitochondrial copy number in the six RH pools was 757 ± 63, not significantly different from A23 cells (t[1,5] = 0.51, P = 0.63; two-sample t-test) (Supplemental Fig. S37A,B; Supplemental Table S12).

    We identified three human loci that significantly regulated hamster mitochondrial reads in the RH samples, DARS2, GORAB, and GRID2 (Supplemental Fig. S38). Increased dosage of all three loci was associated with higher hamster mitochondrial copy number (Supplemental Fig. S38A–D). DARS2 encodes human mitochondrial aspartyl-tRNA synthetase (Scheper et al. 2007), while GRID2 encodes the glutamate ionotropic receptor delta type subunit 2 which has previously been shown to promote mitochondrial fission (Liu and Shio 2008).

    Human loci regulating human mitochondrial abundance

    Human mitochondria were only present at low copy number (1.0 ± 0.4, mean copy number all RH samples; t[1,5.3] = 2.7, P = 0.04, Kenward-Roger degrees of freedom; average conditional effects), suggesting that these organelles are either donated inefficiently or replicate poorly in hamster cells (Supplemental Fig. S37C–H; Supplemental Table S12).

    Consistent with the low levels of human mitochondria, we found no permutation significant human loci that regulated the read numbers of this organelle. However, one locus on Chromosome 3 surpassed the less stringent FDR threshold of 0.05 (−log10P = 8.6, FDR = 8.1 × 10–4) (Supplemental Fig. S38E,F). This locus mapped to CAPN7 and was 14.9 kb from SH3BP5, whose product is localized to the outer mitochondrial membrane and is involved in regulation of apoptosis (Win et al. 2018). Nevertheless, the locus should be viewed with considerable caution; deeper sequencing may provide stronger evidence for loci that regulate human mitochondrial levels.

    Hamster loci

    Significant hamster loci due to pre-existing copy number alterations (CNAs) (Fig. 2A,B; Supplemental Fig. S2) were identified for growth, paclitaxel, and their interaction (Supplemental Figs. S39, S40).

    The hamster loci had weaker −log10P values than the human, consistent with the greater stability of the hamster genome (Fig. 3A–D; Supplemental Fig. S10). In addition, hamster CNAs and their corresponding loci ranged over megabases which, combined with random fluctuations in −log10P values, precluded gene identification (Supplemental Fig. S41).

    Mycoplasma contamination

    The sequence data revealed appreciable levels of mycoplasma contamination in the A23 parental cells, as well as in the RH samples (Supplemental Table S13). However, including the Mycoplasma fermentans reads in our statistical model left the growth and paclitaxel loci essentially unchanged. In addition, no significant human loci were found that regulated the levels of the bacterium (all FDR = 1). Together, these results indicate that the mycoplasma contamination had little effect on our study.

    Discussion

    We implemented RH-BSA by creating six independent pools of RH cells, each with ∼2500 clones. The RH pools were placed under growth selection in either normal medium or medium supplemented with paclitaxel. We used low-pass sequencing to identify human genes with significant read changes as a result of growth or drug treatment.

    The mean length of the human DNA fragments was ∼7 Mb, and the number of breakpoints in each pool was ∼105. The cells contained an average of ∼4% of the human genome as extra copies, with each pool carrying the human genome with ∼100-fold redundancy.

    Unlike current screening technologies, where each cell carries one altered gene, each cell in RH-BSA harbors multiple genes. This extra layer of multiplexing helps assure equal and robust representation of the human genome in the pooled analysis.

    Significant loci were identified using permutation, with a null expectation for each genome-scan of 0.05 false positives, or 0.7 loci from 13 genome scans. We found 859 growth loci, 38 paclitaxel loci, 62 interaction loci, and three loci regulating mitochondrial levels, far exceeding expectations and suggesting a low false positive rate. Cross-validation further demonstrated the high reliability of the approach.

    We used the increased retention of TK1 and the centromeres to measure the mapping resolution of RH-BSA as ∼30 kb, close to single gene. This resolution was consistent with that expected from the breakpoint number. Unlike association mapping (Edwards et al. 2013), the resolution of RH-BSA is not limited by recombination hotspots and linkage disequilibrium. In fact, the resolution and power of RH-BSA can be further increased by using higher doses of radiation and more pools. Tenfold enhancements are readily attainable.

    While the measured resolution was frequently sufficient for single gene identification, gene-rich regions could not be clearly resolved. Nevertheless, a number of conclusions could be drawn from our study.

    The RH-BSA growth genes showed significant nonoverlap with genes identified from loss-of-function screens, as well as decreased participation in multi-subunit complexes. These findings repudiated the balance hypothesis, agreeing with findings from yeast. Altered gene dosage may thus exert its effects via absolute, rather than relative, expression levels.

    As well as their divergent properties, the RH-BSA and loss-of-function growth genes also shared attributes. In particular, both sets of genes showed higher evolutionary conservation and increased gene lengths.

    The RH growth genes were overrepresented in the GWAS catalog and dbGaP. Complex trait variants are often found in regulatory regions, exerting their effects through altered gene expression. The extra human gene copies in the RH pools may simulate these mild allelic effects and hence represent attractive candidate genes for complex traits. In addition, both coding and noncoding RH genes made substantial contributions to cell growth, paclitaxel action, and their interaction.

    Three significant human loci that regulated hamster mitochondrial abundance were identified, two of which had known functions in mitochondria. Further study of these genes may provide useful therapeutic insights into mitochondrial disorders.

    RH-BSA may also allow better mapping of functional centromere elements based on increased retention. The X and Y centromeres have been characterized in their entirety using long-read sequencing, with the α-satellite repeats necessary for centromere function extending over 2.8 Mb and 365 kb, respectively (Jain et al. 2018; Miga et al. 2020). The long reads were placed using landmark patterns of specific variants in the satellite arrays, with a frequency of ∼1 per 50 kb. Although the mapping resolution of RH-BSA is ∼30 kb, the average lengths of individual human fragments in the RH cells is ∼7 Mb. These fragments will encompass numerous α-satellite landmarks, if not entire arrays, allowing the reconstruction of centromeres for functional mapping by long-read sequencing.

    Centromeric α-satellites bind to CENPA, a histone H3 variant essential for kinetochore formation. Further, α–satellite variants (epialleles) can affect CENPA binding (Aldrup-MacDonald et al. 2016). An alternative strategy for mapping functional centromeric elements would therefore employ a combination of chromatin immunoprecipitation (ChIP) and immunolabeling-fluorescent in situ hybridization (immunoFISH) to assay CENPA-associated DNA in the donor human cells and RH pools. Both long-read sequencing and immunoFISH will allow individual centromere homologs to be examined.

    Changes in human DNA abundance in the RH pools could be due either to DNA loss or altered clone representation. Pooled analysis cannot distinguish between these possibilities. However, the majority of the growth and paclitaxel loci showed decreased read counts, which is consistent with the known tendency of RH cells to lose human DNA fragments. Thus, DNA loss may be the predominant mechanism of copy number change in the RH pools. Single cell analysis can discriminate unequivocally between the two alternatives.

    RH cells are a unique tool. These cells are not entirely normal. Their genome does not exist in nature, representing hybrids of two closely related mammalian species, human and hamster. They are also aneuploid, with ∼4% of the genome in the triploid state. Nevertheless, the cells grow and divide and contain all the standard features found in mammalian cells (e.g., cytoskeleton, mitosis, etc.). For a small number of genes, the evaluated phenotypes may be due to species differences, but such occurrences will be infrequent and will still be valuable in highlighting phenotypic mechanisms.

    The parental cells used to create the RH pools are immortalized and, in the case of HEK293, also tumorigenic. Pre-existing mutations in these cells may influence the effects of copy number changes across the genome, imposing limitations on the relevance of our findings to cell types such as primary cells.

    Such constraints can be overcome by exploring the utility of other cells in the fusion process (Hiratsuka et al. 2015; Liskovykh et al. 2016; Suzuki et al. 2016). A variety of species, including human, can be evaluated for both donor and recipient cells. In addition, selectable markers such as neomycin resistance can be used instead of TK1, alleviating concerns about the perturbation of nucleotide synthesis pathways by HAT medium (Aoki et al. 2014).

    Although the mycoplasma infection did not appear to affect the gene mapping results, follow-up investigations of gene function could be influenced by this contamination. The hamster cells can be cured of mycoplasma using well-established antibiotic regimens (Uphoff et al. 2012). Known mycoplasma-free cells can also be employed as the recipients.

    We previously identified ZNRF3, a regulator of Wnt signaling, as a target for paclitaxel action using individual clones from the human G3 RH panel (Khan et al. 2016). This locus was not replicated using the RH pools. In our original study, the RH clones were weaned from HAT, unlike the RH-BSA experiments. In addition, the ability to detect growth factor signaling pathways is likely to be attenuated in a pooled setting. These differences may explain the lack of replication.

    The RH-BSA approach can be harnessed to uncover genes for any phenotype where a convenient selection can be devised (Miyake et al. 2019). The approach is complementary to loss-of-function screens and provides information on both coding and noncoding genes. RH-BSA should be a valuable addition to the suite of tools for mammalian genetics.

    Methods

    Sequencing

    Libraries were prepared using the Illumina TruSeq Nano DNA library prep kit employing 2 μg of DNA from each RH sample. We performed the sequencing using 1 × 64-bp single reads on an Illumina HiSeq 2500 machine in rapid run mode with OLB (off-line basecaller) software and on-board cluster generation.

    Reads were aligned to the indexed GRCh38/hg38 human genome assembly (hg38.fa) or the Chinese hamster (Cricetulus griseus) genome assembly (RAZU01) (Rupp et al. 2018) using BWA-0.7.12 mapping software (Li and Durbin 2009). The reference genomes were downloaded from the UCSC Genome Browser prior to use (https://genome.ucsc.edu) (Kent et al. 2002), unzipped, and relevant files concatenated and indexed using BWA. Sequence reads were aligned to the human and hamster genomes at high stringency, allowing only one mismatch. Aligned reads were then cross-aligned to the other species and reads that aligned to both species discarded, effectively discriminating between human or hamster sequences.

    Aligned sequences were converted to SAM format and sorted and indexed using SAMtools-1.2 software (Li et al. 2009). To read and view sequence information, SAM files were converted to BAM files employing SAMtools. Indexed sorted BAM files were converted to BED files using BEDTools-2.15.0 (Quinlan and Hall 2010).

    Read quantitation

    The human and hamster nuclear reference genomes were indexed into 1-Mb windows using BEDTools, with 10-kb steps for both human and for hamster. BED files were then mapped against the indexed reference genome and the number of reads per 1-Mb window ascertained. We chose the 1-Mb window size as being small enough to reflect the mapping accuracy of RH-BSA but large enough to keep the sampling variance within bounds. Similarly, the 10-kb step size was sufficiently small to accurately delineate the mapped loci, while keeping the computing load manageable. The reads per nonoverlapping 1-Mb window in the 115 RH samples were 237 ± 12 for human and 13,798 ± 234 for hamster.

    Data access

    The sequencing data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA592253. Other data, tables, and computer scripts are available as Supplemental Code and from NIH figshare (https://nih.figshare.com/; http://dx.doi.org/10.35092/yhjc.11370204).

    Competing interest statement

    The authors declare no competing interests.

    Acknowledgments

    We thank the UCLA Semel Institute Neurosciences Genomics Core for sequencing. This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by the UCLA Institute for Digital Research and Education Research Technology Group. We thank Jake Lusis for helpful comments on the manuscript. This study was supported by the National Institutes of Health (R21 HG007405), the Bennet Mills Allen Trust, the Edna Jones Endowment, and the Department of Molecular and Medical Pharmacology, UCLA (to D.J.S.).

    Author contributions: A.H.K.: acquisition of data, analysis and interpretation of data, drafting or revising the article. A.L.: analysis and interpretation of data. R.T.W.: analysis and interpretation of data. J.S.B.: analysis and interpretation of data. K.L.: analysis and interpretation of data. D.J.S.: conception and design, analysis and interpretation of data, drafting or revising the article.

    Footnotes

    • Received February 10, 2020.
    • Accepted August 26, 2020.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    References

    | Table of Contents

    Preprint Server