Abstract
Precise transcription factor (TF) binding to DNA governs gene regulation, yet nucleotide sequence alone often fails to fully capture binding specificity. While static DNA shape is a recognized determinant of indirect readout, the role of intrinsic conformational flexibility remains underexplored across TF families. Here, we demonstrate that integrating sequence-derived DNA flexibility descriptors into predictive models improves both the prediction and the mechanistic interpretability of TF-DNA affinity. Across large-scale in vitro datasets encompassing HT-SELEX and protein-binding microarrays for mammalian and Drosophila TFs, flexibility-augmented models consistently outperform sequence-only baselines and complement DNA shape models. Cross-platform analyses further indicate that flexibility features capture transferable structural information that is robust to platform-specific biases. Using a position-resolved interpretation framework, we uncover family-specific "flexibility footprints", including recurrent hotspots in core motifs and flanks that align with DNA structural deformations observed in TF-DNA co-complex structures. When extended to ENCODE ChIP-seq and DNase-seq data, flexibility augmentation improves the classification of functional TF binding sites across diverse TFs and cellular contexts. Collectively, these results underscore the insufficiency of sequence-only models and highlight the utility of flexibility descriptors as an interpretable component of the TF recognition code.