Automated interpretable artificial intelligence genomic prediction with AIGP
Abstract
Predicting phenotypes from genomic mutations remains a major challenge in genetics. Traditional statistical methods (such as GBLUP and BayesR) have limitations, including reliance on artificial prior assumptions and difficulty capturing epistatic effects. Machine learning (ML) has emerged as a powerful alternative for genomic prediction; however, it often struggles with interpretability because of its black-box nature. We evaluate 12 ML models alongside GBLUP and BayesR to identify key factors influencing genomic prediction performance across traits with different genetic architectures in multiple agricultural species, including pigs, chickens, horses, and maize, and we use a series of simulated datasets to assess the impacts of various parameters. Trait genetic architecture and feature selection are the primary determinants of predictive performance. Boosting algorithms outperform the other ML methods and can be further improved by refining biological feature engineering and optimizing the hyperparameters. We demonstrate how gene-related biometrics influence target traits and how accounting for interaction effects enhances prediction accuracy. In addition, we apply Shapley Additive Explanations (SHAP) to quantify the SNP additive and epistatic effects. To bridge the gap between algorithmic advancements and biological interpretability, we develop artificial intelligence genomic prediction (AIGP), an open-source end-to-end toolkit for genomic prediction research. Our findings highlight the potential of ML for genomic prediction and emphasize the importance of explainable ML approaches, integration of prior information, and parameter optimization. The AIGP toolkit enables automated model optimization and interpretability, making ML-driven genomic selection more accessible and providing new tools to support genomic research.
- Received June 11, 2025.
- Accepted March 2, 2026.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
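
The abstract describes using Shapley Additive Explanations to separate SNP additive effects from epistatic effects. As a rough illustration of that idea only, the sketch below computes exact Shapley values and a pairwise Shapley interaction index for a toy two-SNP model. It is not AIGP code and does not use the shap package; the model `predict`, its coefficients, the genotype coding (0/1/2), the background genotypes, and all function names are hypothetical choices made for this example.

```python
from itertools import combinations
from math import factorial


def predict(genotype):
    """Hypothetical toy model: two additive SNP effects plus one epistatic term."""
    snp_a, snp_b = genotype
    return 0.5 * snp_a + 0.2 * snp_b + 0.3 * snp_a * snp_b


def coalition_value(model, x, background, coalition):
    """Mean prediction with SNPs in `coalition` fixed to x, the rest taken from background rows."""
    total = 0.0
    for ref in background:
        mixed = [x[i] if i in coalition else ref[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of each SNP (enumerates all coalitions, so only for tiny n)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def shapley_interaction(model, x, background, i, j):
    """Shapley interaction index for the SNP pair (i, j)."""
    n = len(x)
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(n - 1):
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = set(subset)
            # Second-order difference: joint gain minus the two individual gains.
            delta = (coalition_value(model, x, background, s | {i, j})
                     - coalition_value(model, x, background, s | {i})
                     - coalition_value(model, x, background, s | {j})
                     + coalition_value(model, x, background, s))
            total += weight * delta
    return total


# Assumed example data: one individual and a small reference (background) panel.
individual = [2, 1]
background = [[0, 0], [1, 0], [0, 1], [1, 1]]

phi = shapley_values(predict, individual, background)
interaction = shapley_interaction(predict, individual, background, 0, 1)
print("per-SNP Shapley values:", phi)
print("pairwise interaction index:", interaction)
```

In this toy setting the per-SNP values sum to the gap between the individual's prediction and the mean background prediction, and the interaction index is nonzero only because `predict` contains a product term. Exhaustive enumeration grows exponentially with the number of SNPs, so real marker panels need approximations such as the tree-based algorithms used with boosting models.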











