Abstract
Human-specific segmental duplications (HSDs) contain millions of base pairs of sequence unique to the human genome, including genes that shape neurodevelopment. Despite their young age (<6 million years), HSD genes exhibit widespread regulatory divergence, with paralog-specific expression patterns documented across a variety of tissues and cell types. Using long-read expression and epigenomic data, we show that human-specific paralogs tend to have lower activity than the shared, ancestral ones. To systematically characterize the cis-regulatory elements (CREs) within HSDs and understand patterns of regulatory change in recently evolved gene families, we conduct a massively parallel reporter assay of 7760 human duplicated and chimpanzee orthologous sequences in lymphoblastoid (GM12878) and neuroblastoma (SH-SY5Y) cell lines. A large proportion (14%–24%) of sequences exhibit differential activity relative to the chimpanzee ortholog (or between human paralogs), mostly with small fold-differences. When measured activity levels are combined across all assayed sequences, the predicted differences in cis-regulatory activity correlate with mRNA levels in SH-SY5Y cells. Differentially active CREs validated for CHRFAM7A, HYDIN2, and SRGAP2C may contribute to paralog-specific expression patterns and thereby to human-specific traits. Although we identify some changes in CRE activity within duplicated regions, consideration of adjacent, unique sequences suggests a larger contribution from genome positional effects. In all, this work shows that functional divergence of duplicated CREs contributes moderately to regulatory divergence of HSD genes and uncovers enhancers that are candidate drivers of human-specific regulatory patterns.