TY  - JOUR
A1  - Ravichandran, Prashanthi
A1  - Parsana, Princy
A1  - Keener, Rebecca
A1  - Hansen, Kasper D.
A1  - Battle, Alexis
T1  - Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene coexpression networks
Y1  - 2025/09/01
JF  - Genome Research
JO  - Genome Research
SP  - 2087
EP  - 2103
DO  - 10.1101/gr.280808.125
VL  - 35
IS  - 9
UR  - http://genome.cshlp.org/content/35/9/2087.abstract
N2  - Gene coexpression networks (GCNs) describe relationships among genes that maintain cellular identity and homeostasis. However, typical RNA-seq experiments often lack sufficient sample sizes for reliable GCN inference. recount3, a data set with 316,443 processed human RNA-seq samples, provides an opportunity to improve network reconstruction. However, GCN inference from public data is challenged by confounders and inconsistent labeling. To address this, we develop a pipeline to annotate samples based on cell-type composition. By comparing aggregation strategies, we find that regressing confounders within studies and prioritizing larger studies optimizes network reconstruction. We apply these findings to infer three consensus networks (universal, cancer, noncancer) and 27 context-specific networks. Central genes in consensus networks are enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas context-specific central nodes include tissue-specific transcription factors. The increased statistical power from data aggregation facilitates the derivation of variant annotations from context-specific networks, which are significantly enriched for complex-trait heritability independent of overlap with baseline functional genomic annotations. Although data aggregation led to strictly increasing held-out log-likelihood, we observe diminishing marginal improvements, suggesting that integrating complementary modalities, such as Hi-C and ChIP-seq, can further refine network reconstruction. Our approach outlines best practices for GCN inference and highlights both the strengths and limitations of data aggregation.
ER  -