Comparative transcriptomics of primary cells in vertebrates

Gene expression profiles in homologous tissues have been observed to be different between species, which may be due to differences between species in the gene expression program in each cell type, but may also reflect differences in cell type composition of each tissue in different species. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes were differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionary old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.


Supplemental Figure S2
Conservation analysis of differentially expressed genes compared to an expressionmatched background.
Supplemental Figure S3 Integrative Correlation Analysis.
Supplemental Figure S4 Gene expression correlation of human and mouse samples.
Supplemental Figure S5 Gene expression correlation of human, mouse, rat, dog, and chicken samples.
Supplemental Figure S6 Phylogenetic analysis of expression divergence.
Supplemental Figure S7 Promoter analysis of gene expression correlations.
Supplemental Figure S8 Promoter analysis of gene expression correlations compared to an expressionmatched background.
Supplemental Figure S9 Conservation analysis of gene expression correlations.
Supplemental Figure S10 Conservation analysis of gene expression correlations compared to an expressionmatched background.
Supplemental Figure S11 Gene Ontology analysis of gene expression correlations.
Supplemental Figure S12 Analysis of RNA-Seq expression data for endometrial stromal fibroblast primary cells (Kin et al., 2016).
Supplemental Figure S13 Comparative analysis of RNA-Seq expression data for 12 matching tissues in human and mouse (The ENCODE Project Consortium, 2012).
Supplemental Figure S14 Evaluation of predicted transcription factor binding sites.
Supplemental Figure S15 Pearson's correlation across cell types between orthologous genes, as a function of TFBS conservation in their promoter regions.
Supplemental Figure S16 Cumulative distribution of the Pearson's correlation r across cell types in motif activity between promoters and enhancers in human, mouse, rat, dog, and chicken.
Supplemental Figure S17 Comparison of CAGE expression levels of the pri-miRNA to the mature miRNA expression levels measured by sRNA sequencing in aortic smooth muscle cells in human, mouse, rat, dog, and chicken.
Supplemental Figure S18 Comparison of differential expression analysis of pri-miRNAs and mature miRNAs.
Supplemental Figure S1. Promoter analysis of differentially expressed genes compared to an expression-matched background. Genes were divided into two categories depending on whether the genomic region of the dominant promoter of the gene had an orthologous genomic region in the human genome. The enrichment was calculated by comparing the number of differentially expressed genes in each of the two categories to the number of differentially expressed genes in an expression-matched background, with the log2(odds ratio) shown on the horizontal axis of the left panel and p-value calculated using Fisher's exact test for each comparison in the right panel. N is the total number of expressed genes in each cell type. Figure S2. Conservation analysis of differentially expressed genes compared to an expression-matched background. Genes were divided into three categories based on the age of the most recent common ancestor. The enrichment was calculated by comparing the number of differentially expressed genes to the number of differentially expressed genes in an expression-matched background, with the log2(odds ratio) shown on the horizontal axis of the left panel and p-value calculated using Fisher's exact test for each comparison in the right panel. N is the total number of expressed genes in each cell type with an annotation in the NCBI HomoloGene database. Examples the distribution of the integrative correlation coefficient between human and mouse for genes belonging to specific Gene Ontology terms identified as significant (FDR < 0.05) by Analysis of Functional Annotation. The distribution for Gene Ontology terms associated the key cellular processes and components (chromatin, nuclear part, cell cycle, and gene expression) are shifted to right suggesting a higher degree of conservation, while processes related to cell-to cell interaction (cell-cell and synaptic signaling, cell projection and plasma membrane region) are skewed to the left, suggesting a lesser degree of conservation across species.  Figure S7. Promoter analysis of gene expression correlations. Pearson's correlation in each cell type and species was calculated separately for genes for which the dominant promoter had an orthologous genomic region in the human genome, and for genes for which the dominant promoter did not have an orthologous genomic region in the human genome. The one-sided p-value shown on the right was calculated by applying the Fisher Z-transformation to each correlation value and performing a paired Z-test. N is the number of expressed genes in each cell type.  Figure S8. Promoter analysis of gene expression correlations compared to an expression-matched background. Difference in squared Pearson's correlation r 2 in each cell type and species, relative to an expression-matched background, was calculated for genes for which the dominant promoter had an orthologous genomic region in the human genome, and for genes for which the dominant promoter did not have an orthologous genomic region in the human genome. The one-sided p-value shown on the right was calculated by applying the Fisher Z-transformation to each correlation value and performing a paired Z-test relative to the expression-matched background. N is the number of expressed genes in each cell type.
p -value calculated using the Fisher transformation Z Supplemental Figure S9. Conservation analysis of gene expression correlations. Pearson's correlation was calculated in each cell type and species across orthologous genes for three categories of genes defined by the age of their most recent common ancestor. The Fisher Z-transformation was applied to each correlation value to obtain Z-scores. The one-sided p-value of a linear regression model of the Z-scores against the evolutionary age category is shown on the right, together with the number N of expressed genes in each cell type with an annotation in the NCBI HomoloGene database.  Figure S10. Conservation analysis of gene expression correlations compared to an expression-matched background. Difference in squared Pearson's correlation r 2 in each cell type and species, relative to an expression-matched background, was calculated for three categories of genes defined by the age of their most recent common ancestor. The one-sided p-value shown on the right was calculated by applying the Fisher Z-transformation to each correlation value and performing a paired Z-test relative to the expression-matched background. N is the number of expressed genes in each cell type.  Figure S11. Gene Ontology analysis of gene expression correlations. The p-value of each Gene Ontology category in each cell type and species was calculated by applying the Fisher Z-transformation to the correlation value and performing a paired Z-test relative to the expression-matched background, with a higher or lower correlation value shown in blue and red, respectively. p -value of higher (sign = + 1) correlation across genes protein binding nucleoplasm RNA binding mRNA splicing, via spliceosome cytosol nucleus membrane mitochondrion nuclear speck mitochondrial inner membrane catalytic step 2 spliceosome nuclear matrix retrograde vesicle-mediated transport, Golgi to ER mRNA 3'-end processing cadherin binding nuclear chromatin mRNA processing chromatin binding RNA splicing Golgi membrane nuclear body DNA repair transcription, DNA-templated IRE1-mediated unfolded protein response protein transport centrosome COPII vesicle coat protein folding in endoplasmic reticulum extracellular exosome macroautophagy ER to Golgi vesicle-mediated transport ubiquitin-dependent protein catabolic process cytoplasm mRNA export from nucleus endoplasmic reticulum-Golgi intermediate compartment intracellular protein transport spliceosomal complex endoplasmic reticulum retrograde transport, endosome to Golgi enzyme binding ATP-dependent RNA helicase activity Golgi organization mRNA polyadenylation regulation of transcription, DNA-templated antigen processing and presentation of peptide antigen via MHC class I endoplasmic reticulum-Golgi intermediate compartment membrane autophagy of mitochondrion RNA polymerase II core binding nuclear pore tRNA aminoacylation for protein translation transport vesicle protein-containing complex histone deacetylase activity ubiquitin-dependent ERAD pathway RNA polymerase II proximal promoter sequence-specific DNA binding termination of RNA polymerase II transcription cell cycle RNA export from nucleus ATP binding mRNA binding protein folding negative regulation of transcription by RNA polymerase II transcription-coupled nucleotide-excision repair RNA secondary structure unwinding pre-mRNA cleavage required for polyadenylation actin filament organization inflammatory response response to mechanical stimulus anchored component of membrane growth factor activity defense response to Gram-positive bacterium cytokine activity cell-cell signaling Analysis of RNA-Seq expression data for endometrial stromal fibroblast primary cells (Kin et al., 2016). (A) Percentage of differentially expressed genes in each species, separated by the age of the most recent common ancestor for each gene. The one-sided p-value calculated using Fisher's exact test is shown on the right, together with the number N of expressed genes in each species with an annotation in the NCBI HomoloGene database. (B) Enrichment of differentially expressed genes in each species, separated by the age of the most recent common ancestor. The enrichment was calculated by comparing the number of differentially expressed genes to the number of differentially expressed genes in an expression-matched background, with the log 2 (odds ratio) shown on the horizontal axis of the left panel and p-value calculated using Fisher's exact test for each comparison in the right panel. N is the total number of expressed genes in each species with an annotation in the NCBI HomoloGene database. (C) Gene Ontology analysis of differentially expressed genes. The p-value, calculated using Fisher's exact test, of overrepresentation or underrepresentation of differentially expressed genes in each Gene Ontology category compared to an expression-matched set of background genes is shown in red and blue, respectively. Comparative analysis of RNA-Seq expression data for 12 matching tissues in human and mouse (The ENCODE Project Consortium, 2012). (A) Percentage of differentially expressed genes in each tissue, separated by the age of the most recent common ancestor for each gene. The one-sided p-value calculated using Fisher's exact test is shown on the right, together with the number N of expressed genes in each tissue with an annotation in the NCBI HomoloGene database. (B) Enrichment of differentially expressed genes in each tissue, separated by the age of the most recent common ancestor. The enrichment was calculated by comparing the number of differentially expressed genes to the number of differentially expressed genes in an expression-matched background, with the log2(odds ratio) shown on the horizontal axis of the left panel and p-value calculated using Fisher's exact test for each comparison in the right panel. N is the total number of expressed genes in each tissue with an annotation in the NCBI HomoloGene database. (C) Gene Ontology analysis of differentially expressed genes. The p-value, calculated using Fisher's exact test, of overrepresentation or underrepresentation of differentially expressed genes in each Gene Ontology category compared to an expression-matched set of background genes is shown in red and blue, respectively.
Evaluation of predicted transcription factor binding sites. The maximum TFBS score was calculated for each genomic region found in ChIP-seq experiments to be bound by transcription factors associated with the motif, and compared to an equal number of randomly selected genomic regions. The number N of ChIP-seq regions for each motif is shown. Motifs for which the maximum motif scores are significantly (p < 0.05, Mann-Whitney U test) higher for ChIP-seq regions compared to background regions are indicated by an asterisk.  Figure S17. Comparison of CAGE expression levels of the pri-miRNA to the mature miRNA expression levels measured by sRNA sequencing in aortic smooth muscle cells in human, mouse, rat, dog, and chicken. To allow a direct comparison of the correlation values observed in human and each of the species, each row compares the CAGE and sRNA expression levels for the same set of miRNAs in human and mouse, rat, dog, or chicken. Comparison of differential expression analysis of pri-miRNAs and mature miRNAs. (left) Comparison of log 2 expression ratios obtained in differential expression analysis performed on the expression levels of pri-miRNAs (using CAGE sequencing data) and mature miRNA (using sRNA sequencing data). (right) Number of differentially expressed miRNAs found in differential expression analysis of pri-miRNA (CAGE) and mature miRNA (sRNA) expression data. The concordant classification rate is calculated as the number of miRNAs that are significantly up-or down-regulated compared to human both in CAGE differential expression analysis and in sRNA differential expression analysis, divided by the total number of miRNAs differentially expressed in both CAGE and sRNA differential expression analysis. .