Autoencoders for genomic variation analysis

Table 3.

Comparison of clustering performance of PCA versus VAE

Metric                   Method   Human data   Canine data
Pseudo F statistic       PCA       50,315.56        151.89
                         VAE       56,409.29        276.03
Silhouette coefficient   PCA            0.69          0.12
                         VAE            0.77          0.07
Davies–Bouldin index     PCA            0.48          3.87
                         VAE            0.29          3.39
  • PCA and VAE parameters have been fitted to human and canine SNP data sets of 839,629 and 198,473 SNP positions, respectively. Clustering metrics have been computed on seven self-reported human ancestry groups and 16 canine clades composed of 144 distinct canine breeds. The 2D latent coordinates of the samples have been standardized. Bold values indicate the better-performing method for each metric and data type (higher is better for Pseudo F and Silhouette; lower is better for Davies–Bouldin).
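The three metrics in the table can be computed from any 2D embedding and a set of group labels. The following is a minimal sketch, not the authors' pipeline: it assumes scikit-learn (the source does not name the software used), treats the pseudo F statistic as the Calinski-Harabasz index, and uses synthetic points as a stand-in for the latent coordinates. The helper name `clustering_metrics` is hypothetical.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def clustering_metrics(coords, labels):
    """Score how well known group labels separate in a 2D embedding.

    coords: array of shape (n_samples, 2), e.g. PCA or VAE latent coordinates.
    labels: array of shape (n_samples,), e.g. ancestry group or clade per sample.
    """
    # Standardize each latent dimension to zero mean and unit variance,
    # mirroring the standardization described in the table note.
    z = StandardScaler().fit_transform(coords)
    return {
        # Pseudo F statistic (Calinski-Harabasz index): higher is better.
        "pseudo_f": calinski_harabasz_score(z, labels),
        # Mean silhouette coefficient, in [-1, 1]: higher is better.
        "silhouette": silhouette_score(z, labels),
        # Davies-Bouldin index: lower is better.
        "davies_bouldin": davies_bouldin_score(z, labels),
    }


# Synthetic stand-in data (an assumption, not the SNP data from the article):
# three groups of 50 points, once tightly clustered and once widely spread.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 50)
tight = centers[labels] + rng.normal(scale=0.3, size=(150, 2))
loose = centers[labels] + rng.normal(scale=2.0, size=(150, 2))

tight_scores = clustering_metrics(tight, labels)
loose_scores = clustering_metrics(loose, labels)
print("tight:", tight_scores)
print("loose:", loose_scores)
```

On the synthetic data, the tightly clustered embedding should score a higher pseudo F and silhouette and a lower Davies-Bouldin index than the spread-out one, which is the same direction of comparison used to read the PCA and VAE rows of the table.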

This Article

  1. Genome Res. 36: 348-360
