
Modeling gene expression from combinatorial binding. (A) Within each cell type, peaks from multiple experiments (TFs and HMs) are merged into nonoverlapping CREs. (B) For each CRE, the normalized read intensity in each experiment is estimated. The result is the RC matrix, with CREs in rows and the read-intensity profiles of the individual experiments in columns. NMF is applied to the RC matrix to group the M TF and HM experiments into k complexes, decomposing it into a basis matrix and a mixture coefficient matrix. The basis matrix contains the coefficient of each CRE in each complex (the complex score). The coefficient matrix represents each complex as a nonnegative linear combination of the original read-count variables of the experiments. (C) The CREs that fall within a fixed-range window around each TSS are identified. The complex scores and the distances of these CREs to the TSS of a gene are then integrated into a Binding Influence Score (BIS) between a protein complex and a gene; d0 is a constant that sets the shape of the exponential function (see Methods). (D) The BIS values are used as predictors in regression models to assess the contribution of protein complexes to gene expression.
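As a rough illustration of panels A to D, the sketch below merges peak intervals into CREs, factors a toy RC matrix with NMF, combines the complex scores into a BIS, and regresses expression on the BIS values. It is a minimal sketch under several assumptions that are not taken from the figure: it uses NumPy; it solves NMF with Lee-Seung multiplicative updates, which is one common solver and not necessarily the one used in Methods; and it assumes a BIS of the form BIS(g, j) = sum over CREs i within the window of W[i, j] * exp(-d_i / d0), where d_i is the CRE-to-TSS distance, with a hypothetical +/-50 kb window and d0 = 5 kb. The regression is plain least squares. The function names `merge_peaks`, `nmf`, `binding_influence` and `fit_expression` are illustrative only; the exact definitions are in Methods.

```python
import numpy as np


def merge_peaks(peaks):
    """Panel A: merge (start, end) peaks pooled from all experiments of one
    cell type into nonoverlapping CREs (touching intervals are also merged)."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def nmf(rc, k, n_iter=500, seed=0, eps=1e-9):
    """Panel B: factor a nonnegative RC matrix (CREs x experiments) as rc ~ W @ H
    with Lee-Seung multiplicative updates for the Frobenius loss.
    W (CREs x k) holds complex scores; H (k x experiments) expresses each
    complex as a nonnegative combination of the experiments."""
    rng = np.random.default_rng(seed)
    n, m = rc.shape
    w = rng.random((n, k)) + eps
    h = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero
        h *= (w.T @ rc) / (w.T @ w @ h + eps)
        w *= (rc @ h.T) / (w @ h @ h.T + eps)
    return w, h


def binding_influence(w, cre_pos, tss, window=50_000, d0=5_000.0):
    """Panel C (assumed form): for each gene and complex, sum the complex scores
    of CREs within +/- window of the TSS, each weighted by exp(-|distance| / d0).
    Returns a genes x k matrix. window and d0 are hypothetical defaults."""
    dist = np.abs(cre_pos[None, :] - tss[:, None])  # genes x CREs
    weight = np.where(dist <= window, np.exp(-dist / d0), 0.0)
    return weight @ w


def fit_expression(bis, expr):
    """Panel D (assumed model): ordinary least squares of expression on the
    BIS predictors plus an intercept. Returns [intercept, coef_1, ..., coef_k]."""
    x = np.column_stack([np.ones(len(bis)), bis])
    coef, *_ = np.linalg.lstsq(x, expr, rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    rc = rng.random((200, 8))                # synthetic read intensities
    w, h = nmf(rc, k=3)
    cre_pos = rng.uniform(0, 1e6, size=200)  # synthetic CRE midpoints
    tss = rng.uniform(0, 1e6, size=20)       # synthetic TSS positions
    bis = binding_influence(w, cre_pos, tss)
    expr = bis @ np.array([1.0, -0.5, 2.0]) + rng.normal(0, 0.1, 20)
    print(fit_expression(bis, expr))
```

The exponential weight makes a CRE's contribution decay smoothly with distance from the TSS, so d0 controls how quickly distal CREs lose influence; the hard window simply truncates that decay.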











