DNA conformational flexibility descriptors improve transcription factor binding prediction across diverse transcription factor families

Abstract

Precise transcription factor (TF) binding to DNA governs gene regulation, yet nucleotide sequence alone often fails to fully capture binding specificity. While static DNA shape is a recognized determinant of indirect readout, the role of intrinsic conformational flexibility remains underexplored across TF families. Here, we demonstrate that integrating sequence-derived DNA flexibility descriptors into predictive models improves both the prediction and the mechanistic interpretability of TF-DNA affinity. Across large-scale in vitro datasets encompassing HT-SELEX and protein-binding microarrays for mammalian and Drosophila TFs, flexibility-augmented models consistently outperform sequence-only baselines and complement DNA shape models. Cross-platform analyses further indicate that flexibility features capture transferable structural information that is robust to platform-specific biases. Using a position-resolved interpretation framework, we uncover family-specific "flexibility footprints", including recurrent hotspots in core motifs and flanks that align with DNA structural deformations from TF-DNA co-complex structures. Extending to ENCODE ChIP-seq and DNase-seq data, flexibility augmentation improves the classification of functional TF binding sites across diverse TFs and cellular contexts. Collectively, these results underscore the insufficiency of sequence-only models and highlight the utility of flexibility descriptors as an interpretable component of the TF recognition code.