
Proposed VQ-VAE architecture for genotype compression. The window-based VQ-VAE autoencoder processes an input SNP sequence
x and encodes it into H bottleneck representations, where H is the number of heads in the encoder. The quantizer Q replaces
each bottleneck representation with its closest codebook embedding, so the latent representation can be stored as an integer
index matrix. For decoding, the codebook embeddings are fetched according to the entries of the index matrix and decoded with
the window-based autoencoder as usual. The output is thresholded to obtain the reconstruction. The difference between the input
and the reconstruction yields the residual r, which, together with the index matrix, can be passed to any bitstream-coding-based
compression pipeline, such as Genozip (Lan et al. 2021), Zstandard (Collet and Kucherawy 2018), or Blosc (https://www.blosc.org).
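
The quantize, store, decode, and residual steps in the caption can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoder output z is random, the decoder is a hypothetical linear-plus-sigmoid stand-in (`toy_decode`), genotypes are assumed to be binary, all function names are hypothetical, and Python's standard-library `zlib` stands in for Genozip, Zstandard, or Blosc. What it does show is why the scheme is lossless: whatever the quality of the thresholded reconstruction, adding the residual back recovers the input exactly.

```python
import zlib

import numpy as np


def quantize(z, codebook):
    """Return the (W, H) index matrix of nearest codebook embeddings.

    z: (W, H, D) bottleneck vectors (W windows, H heads, D dimensions).
    codebook: (K, D) embeddings.
    """
    # Squared Euclidean distance from every bottleneck vector to every code: (W, H, K).
    dist = ((z[..., None, :] - codebook) ** 2).sum(axis=-1)
    return dist.argmin(axis=-1)


def lookup(indices, codebook):
    """Fetch the codebook embeddings referenced by the index matrix: (W, H, D)."""
    return codebook[indices]


def toy_decode(embeddings, weights, threshold=0.5):
    """Hypothetical stand-in decoder: linear map + sigmoid, then thresholding."""
    w = embeddings.shape[0]
    logits = embeddings.reshape(w, -1) @ weights       # (W, L)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs > threshold).astype(np.int8)          # binary reconstruction


def encode_streams(indices, residual):
    """Serialize index matrix and residual; zlib stands in for Genozip/Zstandard/Blosc.

    Assumes the codebook has at most 65536 entries so indices fit in uint16.
    """
    return (zlib.compress(indices.astype(np.uint16).tobytes()),
            zlib.compress(residual.astype(np.int8).tobytes()))


def decode_streams(idx_bytes, res_bytes, idx_shape, res_shape):
    """Inverse of encode_streams."""
    indices = np.frombuffer(zlib.decompress(idx_bytes), dtype=np.uint16).reshape(idx_shape)
    residual = np.frombuffer(zlib.decompress(res_bytes), dtype=np.int8).reshape(res_shape)
    return indices, residual


rng = np.random.default_rng(0)
W, H, D, K, L = 8, 2, 4, 16, 32   # windows, heads, dims, codebook size, SNPs per window
codebook = rng.normal(size=(K, D))
weights = rng.normal(size=(H * D, L))
x = rng.integers(0, 2, size=(W, L)).astype(np.int8)  # toy binary SNP windows
z = rng.normal(size=(W, H, D))                        # stand-in for the encoder output

# Compression side: quantize, reconstruct, and keep the residual.
idx = quantize(z, codebook)
x_hat = toy_decode(lookup(idx, codebook), weights)
r = x - x_hat                                          # values in {-1, 0, 1}
payload = encode_streams(idx, r)

# Decompression side: rebuild x_hat from the indices, then add the residual back.
idx2, r2 = decode_streams(*payload, idx.shape, r.shape)
x_rec = toy_decode(lookup(idx2, codebook), weights) + r2
print(np.array_equal(x_rec, x))  # True
```

With a trained encoder and decoder, the reconstruction would match the input in most positions, so r would be mostly zeros and compress well; the random stand-ins here do not exhibit that, they only demonstrate the exact round trip.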