Haplotype and population structure inference using neural networks in whole-genome sequencing data

Jonas Meisner; Anders Albrechtsen

doi:10.1101/gr.276813.122

Department of Biology, Bioinformatics Center, University of Copenhagen, DK-2200 Copenhagen, Denmark

Corresponding author: jonas.meisner@bio.ku.dk

Abstract

Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
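As a rough illustration of the idea of performing principal component analysis on latent haplotype clusters, the following minimal sketch assumes each haplotype has already been assigned to one cluster per genomic window. It is not HaploNet's implementation: the function names (`cluster_count_matrix`, `pca`), the hard cluster assignments, the toy cluster frequencies, and the plain SVD-based PCA are all assumptions made for illustration, and the neural network that would produce the assignments is replaced here by simulated draws.

```python
import numpy as np


def cluster_count_matrix(assignments, n_clusters):
    """Count how many copies of each cluster an individual carries per window.

    assignments: integer array of shape (n_individuals, 2, n_windows) holding
    a hypothetical hard cluster index for each of the two haplotypes.
    Returns an array of shape (n_individuals, n_windows * n_clusters) with
    entries in {0, 1, 2}.
    """
    n_ind, _, n_win = assignments.shape
    one_hot = np.eye(n_clusters)[assignments]  # (n_ind, 2, n_win, n_clusters)
    return one_hot.sum(axis=1).reshape(n_ind, n_win * n_clusters)


def pca(features, n_components=2):
    """Project individuals onto the top principal components via SVD."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data (assumed): two populations with different cluster frequencies.
rng = np.random.default_rng(0)
n_windows, n_clusters = 50, 4
pop_a = rng.choice(n_clusters, size=(10, 2, n_windows), p=[0.7, 0.1, 0.1, 0.1])
pop_b = rng.choice(n_clusters, size=(10, 2, n_windows), p=[0.1, 0.1, 0.1, 0.7])
assignments = np.concatenate([pop_a, pop_b])

counts = cluster_count_matrix(assignments, n_clusters)
pcs = pca(counts)
print(pcs.shape)
```

In this toy setup, individuals are described by how many copies of each window-level cluster they carry rather than by single-variant genotypes, which is one simple way haplotype information could enter a PCA.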

Footnotes

[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276813.122.

Received April 3, 2022.
Accepted June 28, 2022.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
