Proxy panels enable privacy-aware outsourcing of genotype imputation

  1. Arif Harmanci (affiliations 1, 2)
  1. Department of Bioinformatics and Systems Medicine, University of Texas Health Science Center, Houston, Texas 77030, USA;
  2. Department of Health Data Science and Artificial Intelligence, D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas 77030, USA
  • Corresponding author: arif.o.harmanci{at}uth.tmc.edu
  • Abstract

    One of the major challenges in genomic data sharing is protecting participants’ privacy in collaborative studies and when genomic data are outsourced for analysis tasks, for example, to genotype imputation services or in federated collaborative genomic analyses. Although numerous cryptographic methods have been developed, these methods may not yet be practical for population-scale tasks: they have high computational requirements, rely on high-level expertise in security, and require each algorithm to be implemented from scratch. In this study, we focus on outsourcing of genotype imputation, a fundamental task that utilizes population-level reference panels, and develop protocols that use “proxy panels” to protect genotype panels while the imputation task is outsourced to servers. The proxy panels are generated through a series of protection mechanisms such as haplotype sampling, allele hashing, and coordinate anonymization to protect the underlying sensitive panel's genetic variant coordinates, genetic maps, and chromosome-wide haplotypes. Although the resulting proxy panels are almost entirely distinct from the sensitive panels, they are valid panels that can be used as input to imputation methods such as Beagle. We demonstrate that proxy-based imputation protects against well-known attacks with only a minor decrease in imputation accuracy for variants across a wide range of allele frequencies.
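
To make the three protection mechanisms named in the abstract more concrete, the following is a minimal toy sketch in Python. It is not the article's implementation: the function names, the rank-based coordinate replacement, the keyed per-variant allele flip, the random-switch mosaic sampler, and the example positions and key are all illustrative assumptions chosen only to show the general idea of deriving a proxy panel from a sensitive one.

```python
import hashlib
import hmac
import random


def anonymize_coordinates(positions):
    """Replace genomic positions with their rank (1..n), keeping only variant order."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    ranks = [0] * len(positions)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def flip_mask(n_variants, key):
    """Derive one secret bit per variant from a keyed hash (HMAC-SHA256)."""
    mask = []
    for j in range(n_variants):
        digest = hmac.new(key, str(j).encode(), hashlib.sha256).digest()
        mask.append(digest[0] & 1)
    return mask


def hash_alleles(haplotypes, mask):
    """XOR each 0/1 allele with its variant's secret bit.

    Applying the same mask twice restores the input, so the key holder
    can decode results returned by the server.
    """
    return [[a ^ m for a, m in zip(hap, mask)] for hap in haplotypes]


def sample_mosaic(haplotypes, n_out, switch_prob, rng):
    """Build mosaic haplotypes by copying from a random source haplotype,
    switching to a new random source with probability switch_prob per variant."""
    n_var = len(haplotypes[0])
    out = []
    for _ in range(n_out):
        src = rng.randrange(len(haplotypes))
        hap = []
        for j in range(n_var):
            if rng.random() < switch_prob:
                src = rng.randrange(len(haplotypes))
            hap.append(haplotypes[src][j])
        out.append(hap)
    return out


def make_proxy_panel(positions, haplotypes, key, n_out, switch_prob=0.1, seed=0):
    """Toy pipeline: sample mosaics, hash alleles, anonymize coordinates."""
    rng = random.Random(seed)
    mosaics = sample_mosaic(haplotypes, n_out, switch_prob, rng)
    mask = flip_mask(len(positions), key)
    return anonymize_coordinates(positions), hash_alleles(mosaics, mask), mask


# Hypothetical toy panel: 3 haplotypes over 4 biallelic variants.
positions = [10177, 10352, 11008, 13110]
panel = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
proxy_pos, proxy_haps, mask = make_proxy_panel(
    positions, panel, b"secret-key", n_out=5
)
```

In this sketch the server would see only `proxy_pos` and `proxy_haps`; the key and mask stay with the data owner. A real protocol would also need to protect genetic maps and resist the attacks the abstract mentions, which this toy example does not attempt.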

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278934.124.

    • Freely available online through the Genome Research Open Access option.

    • Received January 2, 2024.
    • Accepted January 6, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

