Evaluating recovery potential of the northern white rhinoceros from cryopreserved somatic cells

The critically endangered northern white rhinoceros is believed to be extinct in the wild, with the recent death of the last male leaving only two remaining individuals in captivity. Its extinction would appear inevitable, but the development of advanced cell and reproductive technologies such as cloning by nuclear transfer and the artificial production of gametes via stem cells differentiation offer a second chance for its survival. In this work, we analyzed genome-wide levels of genetic diversity, inbreeding, population history, and demography of the white rhinoceros sequenced from cryopreserved somatic cells, with the goal of informing how genetically valuable individuals could be used in future efforts toward the genetic rescue of the northern white rhinoceros. We present the first sequenced genomes of the northern white rhinoceros, which show relatively high levels of heterozygosity and an average genetic divergence of 0.1% compared with the southern subspecies. The two white rhinoceros subspecies appear to be closely related, with low genetic admixture and a divergent time <80,000 yr ago. Inbreeding, as measured by runs of homozygosity, appears slightly higher in the southern than the northern white rhinoceros. This work demonstrates the value of the northern white rhinoceros cryopreserved genetic material as a potential gene pool for saving this subspecies from extinction.

1 Northern white rhinoceros pedigree Figure 1: NWR pedigree highlighting individuals sequenced in this study (in blue box) presumably unrelated, with name, studbook number, ID number, and ploidy number

Genetic Divergence
Pair-wise genetic divergence was estimated between all pairs of individuals using sites callable among all individuals, and defined as (2*homs+hets)/(2*callable fraction of genome), as defined in (Prado-Martinez et al., 2013). Divergence values for all individuals are shown in Supplementary Material Table 1. Calculations were performed on the full set of 9.4 million SNPs.

Shared SNP Polymorphism
In order to calculate shared polymorphism between the NWR and SWR, we took the average polymorphism of all possible combinations of the nine NWR     : 10-fold cross-validation validation as performed in ADMIXTURE, using K values from 1 to 5 and a dataset of approximately 144,000 SNPs. Yaxis is the cross-validation error, and the x-axis is the K value.

Mitochondrial Tree
The final mitochondrial alignment included nine northern and five southern white rhinoceroses, four of the southern sequences obtained from whole genome sequencing and one from Genbank (accession number NC 001808). The control region was excluded from the alignment. Phylogenetic analyses in BEAST 1.6.1 (Drummond and Rambaut, 2007) were performed considering a single partition with a model of sequence evolution corresponding to HKY + G + I, and five partitions as follow: tRNAs, rRNAs, and first, second and third codon sites of the protein coding genes. jModelTest 0.1 (Posada 2008) was used to select models of sequence evolution according to the Akaike Information Criterion: GTR + I (first and third codons, and rRNAs) and TrN (second codon, tRNAs). The monophyly of southern and northern white rhinoceroses was constrained according to a tree inferred using MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003).
The Bayesian inference consisted of two concurrent runs with four Markov chains (one cold and seven heated chains with a temperature of 0.2), twenty million generations (sampled every 1,000 generations), and a 10% burn-in. We verified that potential scale reduction factors were near to 1.0 for all parameters, and that the average standard deviation of split frequencies was below 0.01. We visualized convergence of runs to stationarity using Tracer version 1.6 (Drummond Our species tree inference estimated the mitochondrial divergence time around 720 kya for the two rhino subspecies. As noted above, estimates from both ∂a∂i and PSMC suggest that these two subspecies diverged less than 80 kya. This large difference in divergence times could at partially explained by the fact that both ∂a∂i and PSMC estimate population divergence, and our estimates from the mitochondrial data are for the most recent common ancestor of the two mitochondrial haplotypes, which must occur later than the time of population divergence.  For the ∂a∂i analysis, we used 8.7 million SNPs callable in all four southern white rhinoceroses and four of the northern white rhinoceroses. We used the folded frequency spectrum, which considers only minor allele frequencies. We fit to the data a series of increasing complexity models. One model included a split into two populations, followed by exponential growth. The second set of models constrained the northern and southern population sizes to fractions of the ancestral population size, followed by either exponential growth or a growth model similar to that used by (Gutenkunst et al., 2009). Results are presented in Supplemental Table 3. We include both the estimates scaled to θ, as reported by ∂a∂i, as well as estimates in natural units when appropriate. Table 3: Results from the three tested ∂a∂i models. Tsplit represents estimated split from ancestral population; Na is ancestral population size; nu1 and n1 the is size of the NWR and SWR populations at the time of the split from the ancestral population; for the fractional models, s is the fraction of the ancestral population which becomes the NWR at the split, mNS and mSN is the northsouth and south-north migration rate.

Inbreeding
We calculated the number of regions that could be considered a run of homozygosity (ROH), which is considered a good measure of inbreeding (McQuillan et al., 2008). We choose a window size of 1 Mbp according to (Pemberton et al., 2012), which identified regions of homozygosity smaller than 0.5 Mbp as the result of background relatedness, and regions larger than 1.6 Mbp as the result of recent parental relatedness.
We determined shared runs of homozygosity by calculating ROH shared by two or more rhinoceroses in each population. To compare the NWR to the SWR, we resampled all possible four rhinoceroses combinations from the nine NWR, and determined how often a ROH was shared by two or more rhinoceroses in a 4 individual sample Supplementary Material Figure 5. Runs of homozygosity shared by two or more individuals in each population, for lengths of 1, 5, 10, 15, 20, 25, and 30 Mbp. Error bars represent the standard deviation from resampling all possible four rhinoceroses combinations in the NWR.  with homologs of genes found in the horse X chromosome. These scaffolds represent 17% of the total size of the rhino genome. Due to difficulties in identifying X chromosome genomic regions in the rhino genomes, we did not filter or exclude the scaffolds identified as X chromosome in any of the genomic analyses.
These results are shown in Table 5. Table 5: Scaffolds in the southern white rhino genome with homologs on the horse X chromosome. The table contains the scaffold ID, the size of the scaffold, and the number of genes associated with the horse X chromosome found on the scaffold