Fast and accurate out-of-core PCA framework for large scale biobank data

Figure 1.

Performance on the East Asian data from the 1000 Genomes Project. PCA performance of different software with various numbers of inferred PCs (K), based on 2,000,000 SNPs and 400 individuals from four East Asian populations. (A) Number of epochs used by each software. *Not out-of-core; only allows in-core computation with this number of iterations. (B) The first two estimated PCs (K = 2) for each method, including the true full SVD. (C) Memory usage as a function of K (log-scale x-axis). (D) Accuracy (MEV) relative to the true full-rank SVD as a function of K (log-scale x-axis). (E) Convergence of PCAone and PCAoneH+Y, shown as accuracy per epoch.
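Panel (D) scores each method against the true full-rank SVD. The sketch below illustrates that kind of comparison on simulated data. It rests on several assumptions. MEV is taken to mean the average, over the K estimated PCs, of the squared norm of each PC's projection onto the span of the true top-K PCs. The genotypes are simulated from four hypothetical populations. The estimate comes from a generic randomized SVD with power iterations, used only as a stand-in and not PCAone's own algorithm. The names `mev` and `randomized_pcs` are hypothetical.

```python
import numpy as np

def mev(true_pcs: np.ndarray, est_pcs: np.ndarray) -> float:
    """Assumed MEV: mean squared norm of each estimated PC projected onto span(true_pcs).

    Both inputs are n x K matrices with orthonormal columns.
    """
    proj = true_pcs.T @ est_pcs  # K x K inner products between true and estimated PCs
    return float(np.mean(np.sum(proj ** 2, axis=0)))

def randomized_pcs(G: np.ndarray, k: int, n_iter: int, rng: np.random.Generator) -> np.ndarray:
    """Generic randomized SVD (range finder + power iterations); a stand-in, not PCAone."""
    omega = rng.standard_normal((G.shape[1], k + 10))  # oversampled random test matrix
    Q, _ = np.linalg.qr(G @ omega)
    for _ in range(n_iter):
        # Re-orthonormalize after each product to keep the iteration numerically stable.
        Z, _ = np.linalg.qr(G.T @ Q)
        Q, _ = np.linalg.qr(G @ Z)
    Ub, _, _ = np.linalg.svd(Q.T @ G, full_matrices=False)
    return (Q @ Ub)[:, :k]

# Hypothetical data: 400 individuals from 4 simulated populations, 2,000 SNPs.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 100)
freqs = rng.uniform(0.05, 0.95, size=(4, 2000))       # per-population allele frequencies
G = rng.binomial(2, freqs[labels]).astype(float)      # genotypes coded 0/1/2
G = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)    # standardize each SNP

K = 3  # four populations give three structured PCs
U, _, _ = np.linalg.svd(G, full_matrices=False)
true_pcs = U[:, :K]                                   # "true" PCs from the full SVD
est_pcs = randomized_pcs(G, K, n_iter=3, rng=rng)

print(f"MEV = {mev(true_pcs, est_pcs):.4f}")
```

An MEV close to 1 means the estimated PCs lie almost entirely in the subspace spanned by the true PCs. Because the score depends only on that subspace, it does not penalize sign flips or rotations among PCs whose singular values are nearly tied.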

Genome Res. 33: 1599-1608

Preprint Server