Haplotype and population structure inference using neural networks in whole-genome sequencing data

  1. Anders Albrechtsen
  1. University of Copenhagen
  • * Corresponding author; email: jonas.meisner{at}bio.ku.dk
  • Abstract

    Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By utilizing Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We demonstrate that we can use haplotype clusters in the latent space to infer global population structure utilizing haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we demonstrate that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
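
    The abstract does not spell out the likelihood model, so the following is only an illustrative sketch of how ancestry proportions could be estimated by maximum likelihood from per-window haplotype cluster likelihoods. It assumes a simple mixture model in which each haplotype has ancestry proportions `Q` and each ancestry has window-specific cluster frequencies `F`, fitted with a plain EM algorithm. The function name `em_ancestry`, the array layout of `lik`, and the model itself are assumptions made for illustration and are not taken from HaploNet.

```python
import numpy as np

def em_ancestry(lik, n_anc, n_iter=50, seed=0):
    """EM for a hypothetical ancestry mixture over haplotype clusters.

    lik[i, w, c] is an assumed, strictly positive likelihood of haplotype i
    in window w under cluster c (e.g. produced by some fitted local model).
    Returns Q (n_hap, n_anc), F (n_win, n_anc, n_clu), and the log-likelihood
    trace evaluated before each update.
    """
    rng = np.random.default_rng(seed)
    n_hap, n_win, n_clu = lik.shape
    # Random starting points on the probability simplex.
    Q = rng.dirichlet(np.ones(n_anc), size=n_hap)           # (n_hap, n_anc)
    F = rng.dirichlet(np.ones(n_clu), size=(n_win, n_anc))  # (n_win, n_anc, n_clu)
    loglik = []
    for _ in range(n_iter):
        # E-step: joint[i, w, k, c] = Q[i, k] * F[w, k, c] * lik[i, w, c]
        joint = Q[:, None, :, None] * F[None, :, :, :] * lik[:, :, None, :]
        norm = joint.sum(axis=(2, 3), keepdims=True)
        loglik.append(float(np.log(norm).sum()))
        post = joint / norm  # posterior over (ancestry, cluster) per window
        # M-step: average posterior ancestry mass over windows.
        Q = post.sum(axis=(1, 3)) / n_win
        # M-step: expected cluster counts per window and ancestry, normalized.
        F = post.sum(axis=0)
        F /= F.sum(axis=2, keepdims=True)
    return Q, F, loglik
```

    Under these assumptions, each row of `Q` sums to one and can be read as the estimated ancestry proportions of one haplotype; averaging the two haplotype rows of an individual would give individual-level proportions.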

    • Received April 3, 2022.
    • Accepted June 28, 2022.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    ACCEPTED MANUSCRIPT

    Genome Res. gr.276813.122 Published by Cold Spring Harbor Laboratory Press
