SepSolve overview and proof of principle. (A) Overview of the SepSolve algorithm. Using linearizations, SepSolve models the search for marker genes as an integer linear program. Each grid point within the feasible region defined by the hyperplanes corresponds to a candidate set of marker genes (two in this example). The optimal solution (here, the highest grid point) minimizes the overlap (dark area) between balls enclosing most cells of a given type or, more formally, minimizes the total violation of c-separation between cell types. (B) UMAP embedding of cells in a human lung data set (Madissoon et al. 2020), using 10,000 highly variable genes (HVGs; top) or 50 markers selected by SepSolve (bottom). (C) Cells from four different types in the projected space of marker genes computed by SepSolve (top) and by scGeneFit and G-PC (bottom). The plots show simulated expression levels of two genes (specified on the axes).
