Cross-species cell-type assignment from single-cell RNA-seq data by a heterogeneous graph neural network

Figure 1.

Overview of CAME. (A) The architecture of the heterogeneous graph neural network in CAME. The scRNA-seq data of both the reference and query species, together with their homologous genes, are encoded as a heterogeneous cell–gene graph. A cell–gene edge indicates that the cell has non-zero expression of the gene. The gene homology mappings are represented by a gene–gene bipartite graph, in which each edge indicates a homology between two genes; these mappings can be many-to-many. To preserve the intrinsic data structure, within-species cell–cell edges are added, where an edge between a pair of cells indicates that one is among the k nearest neighbors of the other (k = 5 by default). The heterogeneous graph and the gene expression profiles are input to CAME and pass through the inductive embedding layer, the recurrent relational graph neural network, and the graph classifier with attention mechanisms. The model is trained end to end by minimizing the cross-entropy loss between the model predictions and the given labels of the reference cells. (B) Graph spatial convolutions for six different types of edges, namely “a cell expresses a gene,” “a gene is expressed by a cell,” “cell–cell similarity,” “gene–gene homology,” “cell self-loop,” and “gene self-loop,” each with edge type–specific convolution weights. (C) Heterogeneous graph attention classifier on the last layer, where each cell pays different attention to its neighbor genes. The output cell-type probabilities are calculated as the weighted sum of the neighbor-gene embeddings, followed by softmax normalization. The attention weights are calculated from the concatenated cell and gene embeddings by a linear transformation, followed by an activation and softmax normalization among the neighbor genes of the cell. (D) The output of CAME includes the probabilistic cell-type assignment of the query species, as well as low-dimensional embeddings of the cells and genes from both species. The gene embeddings are used for joint module extraction, which allows inter-species comparison of conserved or divergent characteristics.
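
The graph construction in panel A and the attention read-out in panel C can be illustrated with a minimal NumPy sketch. This is not the CAME implementation: the function names (cell_gene_edges, knn_edges, attention_classify) are hypothetical, the neighbors are found by Euclidean distance on the raw expression matrix, the activation is assumed to be a leaky ReLU, and the last-layer gene embeddings are assumed to have one dimension per cell type so that their weighted sum can be passed directly to the softmax.

```python
import numpy as np

def cell_gene_edges(expr):
    """(cell, gene) index pairs wherever expression is non-zero."""
    cells, genes = np.nonzero(expr)
    return np.stack([cells, genes], axis=1)

def knn_edges(expr, k=5):
    """Directed cell -> neighbor pairs for the k nearest cells.

    Assumption: Euclidean distance on raw expression. Treating the pairs
    as undirected gives "one is among the k nearest neighbors of the other".
    """
    sq = (expr ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (expr @ expr.T)
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]    # k closest cells per row
    src = np.repeat(np.arange(expr.shape[0]), k)
    return np.stack([src, nbrs.ravel()], axis=1)

def softmax(x):
    e = np.exp(x - x.max())                   # shift for numerical stability
    return e / e.sum()

def attention_classify(h_cell, h_genes, w_att, slope=0.2):
    """Attention over one cell's neighbor genes, then cell-type probabilities.

    h_cell : (d_cell,) cell embedding
    h_genes: (n_genes, n_types) neighbor-gene embeddings (assumed shape)
    w_att  : (d_cell + n_types,) weights of the linear attention transform
    """
    n = h_genes.shape[0]
    pairs = np.concatenate([np.tile(h_cell, (n, 1)), h_genes], axis=1)
    scores = pairs @ w_att                                    # linear transform
    scores = np.where(scores > 0, scores, slope * scores)     # assumed leaky ReLU
    alpha = softmax(scores)                   # normalize over neighbor genes
    probs = softmax(alpha @ h_genes)          # weighted sum, then softmax
    return probs, alpha

# Toy data: 4 cells x 3 genes (illustrative values only).
expr = np.array([[0., 1., 2.],
                 [1., 0., 0.],
                 [0., 0., 3.],
                 [4., 1., 0.]])
print(cell_gene_edges(expr))
print(knn_edges(expr, k=2))
```

In a full model, the attention weights and embeddings would be learned jointly by minimizing the cross-entropy loss on the reference cells; here they are supplied by hand purely to show the data flow.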

This Article

  1. Genome Res. 33: 96-111
