Identifying Conserved Cell-Type-Specific Transcriptional Programs in Hydra and Clytia

This document covers our comparative analysis of the Hydra and Clytia single-cell atlases. This analysis entailed using reciprocal principal component analysis to align the two datasets, quantifying overall transcriptional similarities between cell types using mutual nearest neighbor analysis, and identifying orthologs with conserved expression patterns.

 

Aligning the Clytia and Hydra Single-Cell Atlases

Aligning single-cell datasets from different species uses the same approach as correcting for inter-sample batch effects: we treat inter-species differences as (dramatic) batch effects. For this to be possible, the two datasets need to use the same gene IDs. To unambiguously link orthologous genes between Hydra and Clytia, we limited our alignment analysis to genes with one-to-one orthology between the two species, as determined by our OrthoFinder analysis. We then converted the gene IDs for the Clytia data to their Hydra equivalents.
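The ortholog filtering and ID conversion described above can be illustrated with a minimal Python sketch. This is not the R code from 01_alignment/aepCl.R; the orthogroup table layout, the gene IDs, and the function names `one_to_one_orthologs` and `convert_gene_ids` are all hypothetical and chosen only for illustration.

```python
def one_to_one_orthologs(orthogroups):
    """Return {clytia_id: hydra_id} for orthogroups with exactly one gene per species."""
    mapping = {}
    for group in orthogroups.values():
        hydra, clytia = group["hydra"], group["clytia"]
        # Skip groups with paralogs or with no member in one of the species
        if len(hydra) == 1 and len(clytia) == 1:
            mapping[clytia[0]] = hydra[0]
    return mapping


def convert_gene_ids(counts, mapping):
    """Drop genes without a one-to-one ortholog and rename the rest."""
    return {mapping[gene]: values for gene, values in counts.items() if gene in mapping}


# Toy orthogroup table (hypothetical IDs, assumed layout)
orthogroups = {
    "OG1": {"hydra": ["HV_g1"], "clytia": ["CL_g7"]},
    "OG2": {"hydra": ["HV_g2", "HV_g3"], "clytia": ["CL_g8"]},  # one-to-many
    "OG3": {"hydra": ["HV_g4"], "clytia": []},                  # species-specific
    "OG4": {"hydra": ["HV_g5"], "clytia": ["CL_g9"]},
}
# Toy Clytia counts: gene ID -> counts across three cells
clytia_counts = {"CL_g7": [0, 3, 1], "CL_g8": [2, 2, 0], "CL_g9": [5, 0, 0]}

cl_to_hv = one_to_one_orthologs(orthogroups)
converted = convert_gene_ids(clytia_counts, cl_to_hv)
print(cl_to_hv)
print(converted)
```

After conversion, both species' matrices share a single gene ID namespace, which is what allows them to be treated as batches of one dataset.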

In the following code chunk, we identify one-to-one ortholog pairs, import and perform initial filtering on the Clytia and Hydra gene expression matrices, convert the Clytia gene IDs to their Hydra equivalents, and concatenate all of the separate samples (15 Hydra libraries and one Clytia library) into a list of Seurat objects.

(snippet from 01_alignment/aepCl.R)

We then performed standard dataset integration using reciprocal PCA as implemented in Seurat. Following integration, we performed PCA on the aligned dataset.

(snippet from 01_alignment/aepCl.R)

integratedElbow

We then performed Louvain clustering and UMAP dimensional reduction using 30 PCs, and visualized the clustering results on the UMAP plot.

(snippet from 01_alignment/aepCl.R)

initIntUMAP

We also split the plot by species to see the contribution of each species to different cell populations.

(snippet from 01_alignment/aepCl.R)

initUmapBySpec

(snippet from 01_alignment/aepCl.R)

initIntUMAPsplitSpec

 

To determine if our integrated dataset was properly aligning homologous cell types in the two species, we imported the cell annotations from the unintegrated atlases for each species. Specifically, we used the cell type annotations from our analysis of the AEP-mapped Hydra atlas (described in 05_hydraAtlasReMap.md) and the annotations from the original Clytia atlas publication.

(snippet from 01_alignment/aepCl.R)

intUMAPcuratedIDbroad

(snippet from 01_alignment/aepCl.R)

intUMAPcuratedIDbroad

(snippet from 01_alignment/aepCl.R)

intUMAPbroad

Based on these plots, our integration was successful, as homologous cell types colocalize in the UMAP and are clustered together by the Louvain algorithm.

Quantifying Transcriptional Similarities Between Clytia and Hydra Cell Types

Although a UMAP can give an indication of similarity between different cells, it is not a quantitative measure of similarity. To assess similarity quantitatively, we adopted the alignment score metric proposed by Tarashansky et al. (2021).

To generate this score, the 30 nearest inter-species neighbors for each cell in the aligned principal component space are identified. Then, for a given cell type in one species, the number of nearest cross-species neighbors that were from each cell type in the other species is tabulated. The final alignment score is defined as the proportion of total cross-species pairs for a cell type in one species that belonged to a particular cell type in the other species. Thus, a higher alignment score indicates that the two cell types being compared shared a higher number of neighbors in the aligned PC space.
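The tabulation step behind this score can be sketched in a few lines of Python. This is an illustration of the definition given above, not the implementation used in our R scripts; the function name `alignment_scores`, the cell type labels, and the toy neighbor lists (two neighbors per cell instead of 30) are assumptions made for brevity.

```python
from collections import Counter, defaultdict


def alignment_scores(query_labels, neighbor_labels_by_cell):
    """Compute alignment scores for each query cell type.

    query_labels: cell type of each query cell (one species).
    neighbor_labels_by_cell: for each query cell, the cell type labels of its
        nearest cross-species neighbors (other species).
    Returns {query_type: {target_type: fraction of cross-species pairs}}.
    """
    tallies = defaultdict(Counter)
    for cell_type, neighbor_labels in zip(query_labels, neighbor_labels_by_cell):
        tallies[cell_type].update(neighbor_labels)

    scores = {}
    for cell_type, tally in tallies.items():
        total = sum(tally.values())
        scores[cell_type] = {target: n / total for target, n in tally.items()}
    return scores


# Toy example: three Hydra cells, each with two Clytia neighbors (labels are hypothetical)
hydra_types = ["hv_neuron", "hv_neuron", "hv_gland"]
clytia_neighbors = [
    ["cl_neuron", "cl_neuron"],
    ["cl_neuron", "cl_nematocyte"],
    ["cl_gland", "cl_gland"],
]
scores = alignment_scores(hydra_types, clytia_neighbors)
print(scores)
```

Because the scores for a given query cell type are fractions of the same total, they sum to one across all target cell types.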

Because we were interested in identifying possible similarities between neuronal subtypes in Clytia and Hydra, and because the whole-animal version of the Clytia atlas did not resolve individual neuronal subtypes, we incorporated cell type labels from the Clytia neuronal subclustering analysis before performing the alignment quantification.

To do this, we first had to download the neuronal subclustering object (available here) and convert it to a Seurat object.

(snippet from 01_alignment/aepCl.R)

clNeuroUMAP

We then updated the cell labels for Clytia neurons using the higher-resolution cluster labels from the neuronal subclustering.

(snippet from 01_alignment/aepCl.R)

We then extracted the principal component cell scores from the integrated Seurat object and separated them by species. We then performed a mutual nearest neighbor analysis to identify the Clytia cells that were most similar to each Hydra cell. We then converted these cell IDs to cell type labels and tabulated the number of cells from each Clytia cell type associated with each Hydra cell type. Finally, we used these values to generate a Sankey diagram showing all alignment scores greater than 0.05 (indicating that more than 5% of all nearest neighbors for a Hydra cell type were made up of a given Clytia cell type).
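To make the 0.05 cutoff concrete, the following Python sketch turns a table of alignment scores into the list of source/target links that a Sankey diagram would draw. It is only an illustration: the function name `sankey_links`, the nested-dictionary score table, and the toy values are hypothetical, and the actual plotting was done in R.

```python
def sankey_links(scores, min_score=0.05):
    """Return (source, target, score) links with score strictly above min_score.

    scores: {hydra_cell_type: {clytia_cell_type: alignment_score}}
    """
    links = []
    for source, targets in scores.items():
        for target, score in targets.items():
            if score > min_score:
                links.append((source, target, score))
    # Order by source, then by decreasing score, for stable plotting order
    return sorted(links, key=lambda link: (link[0], -link[2]))


# Toy alignment scores (hypothetical values)
toy_scores = {
    "hv_neuron": {"cl_neuron": 0.90, "cl_gland": 0.04, "cl_nematocyte": 0.06},
    "hv_gland": {"cl_gland": 1.00},
}
links = sankey_links(toy_scores)
for source, target, score in links:
    print(source, target, score)
```

In this toy table the `hv_neuron` to `cl_gland` link is dropped because its score of 0.04 falls below the cutoff.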

(snippet from 01_alignment/aepCl.R)

crossSankey

As an alternative approach to visualize and evaluate the degree of transcriptional similarity between Hydra and Clytia cells, we also calculated a distance metric that captured how different the overall transcriptional profiles of cells from one species were when compared to their most similar transcriptional neighbors from the other species. To generate this metric, we calculated, for each cell, the average distance to its 30 nearest cross-species neighbors in aligned principal component space. We then plotted these values on the cross-species UMAP to visually link these distance values to cell type annotations.
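A minimal Python sketch of this distance metric is shown below. It assumes Euclidean distance in PC space and uses a brute-force search, neither of which is stated in the text above; the function name `mean_cross_species_distance` and the two-dimensional toy coordinates are likewise hypothetical.

```python
import math


def mean_cross_species_distance(query_cells, reference_cells, k=30):
    """Mean distance from each query cell to its k nearest cells of the other species.

    query_cells, reference_cells: lists of coordinate tuples in the aligned
    PC space. Euclidean distance is an assumption of this sketch.
    """
    if not reference_cells:
        raise ValueError("reference_cells must not be empty")
    means = []
    for query in query_cells:
        distances = sorted(math.dist(query, ref) for ref in reference_cells)
        nearest = distances[:k]  # fewer than k if the reference set is small
        means.append(sum(nearest) / len(nearest))
    return means


# Toy 2D "PC space" with k=2 for brevity
hydra_pcs = [(0.0, 0.0), (10.0, 0.0)]
clytia_pcs = [(3.0, 4.0), (0.0, 5.0), (6.0, 8.0)]
dists = mean_cross_species_distance(hydra_pcs, clytia_pcs, k=2)
print(dists)
```

A larger value means that even the most similar cells from the other species are far away in the aligned space, which is how the metric flags poorly matched populations.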

crossSpecDistance

crossSpecDistanceSplit

We also summarized the distance metric results using boxplots that grouped distance scores by cell type in each species. The first boxplot presents distances for Hydra cell types.

hvDistBox

The second boxplot presents distances for Clytia cell types.

clDistBox

Identifying Orthologous Genes with Conserved Expression Patterns

The alignment score we calculated above could be used to holistically examine transcriptional similarities, but it did not give us access to the genes whose expression patterns were conserved in Hydra and Clytia. To identify genes with conserved expression patterns, we needed a way to correlate the expression patterns in Hydra and Clytia cells.

To do this, we used a very high resolution Louvain clustering analysis to generate ad-hoc 'pseudo-cells'. These pseudo-cells grouped together cells (regardless of species) that were close together in the aligned principal component space. We calculated average gene expression values in each species for each pseudo-cell and then, by matching pseudo-cell labels from each species, identified genes with similar expression patterns.

We first imported the Seurat object containing the aligned Clytia and Hydra data and performed a high-resolution Louvain clustering analysis. This generated a total of 132 'pseudo-cell' clusters.

(03_expressionConservation/aepCl_Cor.R)

pseudoCellUMAP

We then extracted the normalized read matrix from the Seurat object, split cells both by species and pseudo-cell ID, and calculated average read counts. We then calculated correlation scores for pseudo-cell expression when comparing the two species. We used the resulting scores as a readout of how conserved the cell-type-specificity was for a pair of Clytia and Hydra orthologs.
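The averaging and correlation steps can be sketched in Python for a single ortholog pair. This is an illustration only: the text above does not specify the correlation method, so Pearson correlation is an assumption here, as is the choice to compare only pseudo-cells that contain cells from both species. All function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def pseudo_cell_means(expression, pseudo_cell_ids):
    """Average one gene's normalized expression within each pseudo-cell."""
    grouped = defaultdict(list)
    for value, pseudo_cell in zip(expression, pseudo_cell_ids):
        grouped[pseudo_cell].append(value)
    return {pc: sum(values) / len(values) for pc, values in grouped.items()}


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def conservation_score(hydra_expr, hydra_ids, clytia_expr, clytia_ids):
    """Correlate per-species pseudo-cell averages for one ortholog pair."""
    hv = pseudo_cell_means(hydra_expr, hydra_ids)
    cl = pseudo_cell_means(clytia_expr, clytia_ids)
    shared = sorted(set(hv) & set(cl))  # pseudo-cells present in both species
    return pearson([hv[pc] for pc in shared], [cl[pc] for pc in shared])


# Toy data: six Hydra cells and three Clytia cells across three pseudo-cells
hydra_expr = [4.0, 6.0, 0.0, 0.0, 1.0, 1.0]
hydra_ids = [0, 0, 1, 1, 2, 2]
clytia_expr = [10.0, 1.0, 2.0]
clytia_ids = [0, 1, 2]

hydra_means = pseudo_cell_means(hydra_expr, hydra_ids)
score = conservation_score(hydra_expr, hydra_ids, clytia_expr, clytia_ids)
print(hydra_means, score)
```

In this toy case both orthologs are highest in pseudo-cell 0 and low elsewhere, so the score is close to 1; repeating the calculation for every one-to-one ortholog pair gives one score per pair.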

(03_expressionConservation/aepCl_Cor.R)

To identify conserved expression patterns, we used high-resolution clusters, which allowed a more refined comparison of expression in the two species. To visualize these conserved expression patterns, we opted to use a gene-by-cell-type heatmap. To keep the number of columns (cell types) in the heatmap manageable and improve overall readability, we used a lower-resolution clustering to summarize the data. We generated descriptive labels for these lower-resolution clusters based on the identities of both the Hydra and Clytia cells making up each cluster, although we showed some preference for labels based on Hydra annotations, as Hydra cell types are generally better characterized.

(03_expressionConservation/aepCl_Cor.R)

corPlotClusts

Using these broader clusters, we again averaged gene expression by cluster (split by species) to generate the values that would populate the heatmap. We restricted these heatmaps to ortholog pairs with high correlation values (correlation score > 0.65).

We first plotted the Hydra data:

(03_expressionConservation/aepCl_Cor.R)

aepAllHeat

We then plotted the Clytia data:

(03_expressionConservation/aepCl_Cor.R)

clAllHeat

We also generated a table including correlation scores for all high-scoring (correlation score > 0.65) ortholog pairs as well as functional annotations (putative vertebrate orthologs, protein domains, etc.).

(03_expressionConservation/aepCl_Cor.R)

Finally, we generated another set of heatmaps that included only putative transcription factors (predictions were based on InterProScan results as described in 03_aepGenomeAnnotation.md). First for Hydra (AEP):

(03_expressionConservation/aepCl_Cor.R)

aepTfHeat

Then for Clytia:

(03_expressionConservation/aepCl_Cor.R)

clTfHeat

 

Files Associated with This Document