Predicted protein 3D structure provides essential insights into the genetic architecture underlying phenotypic diversity in maize
- Shuai Wang1,
- Merritt Khaipho-Burch2,
- Lynn C. Johnson2,
- Zachary R. Miller2,
- Peter Bradbury3,
- Doug Speed4,
- William J. Allen5,
- M. Cinta Romay2,
- Jiquan Xue6,
- Edward S. Buckler7,
- Guillaume P. Ramstein4 and
- Baoxing Song1,8
- 1 Northwest A&F University, Peking University Institute of Advanced Agricultural Sciences;
- 2 Cornell University;
- 3 United States Department of Agriculture;
- 4 Aarhus University;
- 5 Texas Advanced Computing Center, University of Texas at Austin;
- 6 Northwest A&F University;
- 7 Cornell University, United States Department of Agriculture
Abstract
Variation in protein 3D structures reflects genetic variation and contributes to phenotypic diversity, yet the genetic mechanisms underlying this variation remain unclear. To investigate the relationship between protein 3D structure and phenotype, we predicted the 3D structures of 795,649 proteins from 26 maize (Zea mays L.) inbred lines using AlphaFold2. Population genetic analysis of these predicted structures revealed that buried residues had higher genomic evolutionary rate profiling (GERP) scores than exposed residues, indicating that buried residues are under stronger purifying selection. The design of the maize nested association mapping population makes it possible to combine haplotype information with protein 3D structural variation to reveal the molecular mechanisms linking genetic diversity and phenotypic variation in a population of ~5,000 individuals. Across 32 agronomic traits, associating protein 3D structure variation with phenotypes (structure-based proteome-wide association study, PWAS) identified 15.7% more significant proteins (96 vs. 83) than associating protein sequence with phenotypes (sequence-based PWAS). Structure-based PWAS identified 24 significant proteins that sequence-based PWAS did not detect, whereas sequence-based PWAS identified 11 significant proteins that structure-based PWAS did not detect. Structure-based proteome-wide prediction (PWP) improved genomic prediction accuracy by an average of 3.8% compared with sequence-based PWP. Overall, predicted protein 3D structures represent a powerful approach for understanding the natural diversity of protein haplotypes.
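The PWAS counts reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' analysis. It assumes that the 24 and 11 "additional" proteins are those significant under one method but not the other; the variable names are hypothetical.

```python
# Counts reported in the abstract.
structure_total = 96  # significant proteins, structure-based PWAS
sequence_total = 83   # significant proteins, sequence-based PWAS
structure_only = 24   # significant with predicted structures only
sequence_only = 11    # significant with sequences only

# Assumption: "additional" means absent from the other method's set, so the
# number of proteins shared by both methods can be derived from either side.
shared_from_structure = structure_total - structure_only
shared_from_sequence = sequence_total - sequence_only

# Relative gain of structure-based over sequence-based PWAS, in percent.
relative_gain = (structure_total - sequence_total) / sequence_total * 100

print(shared_from_structure, shared_from_sequence)
print(f"{relative_gain:.1f}%")
```

Under that assumption, both subtractions give the same number of shared proteins, and the relative gain matches the 15.7% stated in the abstract.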
- Received February 13, 2025.
- Accepted October 22, 2025.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











