Improvements in cell-type classification and visualization. (A) F1 scores of a logistic regression classifier when provided varying numbers of marker genes computed by the different methods on the four data sets. On HCA, cell-type labels distinguished the tissue of origin. scGeneFit ran successfully in pairwise mode only on the two smallest data sets, and SMaSH only on Zheng8eq (for details, see text). Shaded regions depict the standard deviation. (B) UMAP embeddings generated from the 20 marker genes selected by SepSolve and DE on the human lung data set MeL. UMAP embeddings of the original space and of marker genes selected by the remaining methods are in Supplemental Figure 5, along with the complete cell-type legend.
