Table 3.

Comparison of clustering performance of PCA versus VAE

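| Metric | Method | Human data | Canine data |
|---|---|---|---|
| Pseudo F statistic | PCA | 50,315.56 | 151.89 |
| Pseudo F statistic | VAE | **56,409.29** | **276.03** |
| Silhouette coefficient | PCA | 0.69 | **0.12** |
| Silhouette coefficient | VAE | **0.77** | 0.07 |
| Davies–Bouldin index | PCA | 0.48 | 3.87 |
| Davies–Bouldin index | VAE | **0.29** | **3.39** |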

[i] PCA and VAE parameters have been fitted to human and canine SNP data sets of 839,629 and 198,473 SNP positions, respectively. Clustering metrics have been computed on seven self-reported human ancestry groups and 16 canine clades composed of 144 distinct canine breeds. The 2D latent coordinates of the samples have been standardized. Bold values indicate the better-performing method for each metric and data type (higher is better for Pseudo F and Silhouette; lower is better for Davies–Bouldin).
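
As a rough illustration of how the three metrics in the note above can be computed from standardized 2D latent coordinates and known group labels, the following is a minimal sketch and not necessarily the authors' pipeline. It assumes scikit-learn is available and treats the Pseudo F statistic as the Calinski–Harabasz index, a common synonym. The function name `clustering_metrics` and the synthetic data are hypothetical stand-ins for the real PCA or VAE embeddings and the ancestry or clade labels.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def clustering_metrics(latent_2d, labels):
    """Score how well known group labels separate in a 2D embedding.

    Hypothetical helper: `latent_2d` is an (n_samples, 2) array of PCA or
    VAE coordinates, `labels` holds one group label per sample.
    """
    # Standardize each latent axis to zero mean and unit variance,
    # mirroring the standardization mentioned in the table note.
    z = StandardScaler().fit_transform(latent_2d)
    return {
        # Calinski-Harabasz index (assumed equivalent to Pseudo F): higher is better.
        "pseudo_f": calinski_harabasz_score(z, labels),
        # Mean silhouette coefficient over all samples: higher is better.
        "silhouette": silhouette_score(z, labels),
        # Davies-Bouldin index: lower is better.
        "davies_bouldin": davies_bouldin_score(z, labels),
    }


# Synthetic stand-in data (an assumption, not the SNP data):
# three well-separated groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 50)
latent_2d = centers[labels] + rng.normal(scale=0.5, size=(150, 2))

metrics = clustering_metrics(latent_2d, labels)
```

Calling the same helper once on the PCA coordinates and once on the VAE coordinates of a data set would give one column pair of the table. Because the labels are fixed in advance (ancestry groups or clades), the metrics here measure how well a given embedding separates known groups, not the quality of a clustering algorithm.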