topN versus greedy methods. (A) Presence/absence matrix for three samples and four variants. (B) When picking the two most diverse samples, the topN algorithm selects samples A and B because they are the individuals with the greatest number of variants. However, this selection includes only three of the four variants. (C) The greedy algorithm, on the other hand, selects samples A and C because it accounts for the fact that the variants carried by B are already covered by A. The greedy selection includes all four variants.
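
The contrast described in the caption can be sketched in a few lines of Python. This is only an illustrative sketch: the presence/absence data below is a hypothetical matrix chosen to be consistent with the caption (the figure's actual matrix is not reproduced here), and the function names `top_n`, `greedy`, and `coverage` are assumptions, not names from the original method. Ties are broken alphabetically by sample name, which is also an assumption.

```python
# Hypothetical presence/absence data consistent with the caption:
# three samples, four variants. Not the figure's actual matrix.
samples = {
    "A": {"v1", "v2", "v3"},
    "B": {"v1", "v2"},
    "C": {"v4"},
}


def top_n(samples, n):
    """Pick the n samples carrying the most variants, ignoring overlap."""
    ranked = sorted(samples, key=lambda s: (-len(samples[s]), s))
    return ranked[:n]


def greedy(samples, n):
    """Repeatedly pick the sample that adds the most not-yet-covered variants."""
    chosen, covered = [], set()
    remaining = dict(samples)
    for _ in range(min(n, len(remaining))):
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(samples, chosen):
    """Return the set of variants present in at least one chosen sample."""
    return set().union(*(samples[s] for s in chosen))


for name, picker in (("topN", top_n), ("greedy", greedy)):
    picked = picker(samples, 2)
    print(name, picked, len(coverage(samples, picked)), "of 4 variants")
```

With this toy matrix, the topN picker returns A and B (three variants covered), while the greedy picker returns A and C (all four covered), because after A is chosen, B contributes no new variants.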
