Analysis of membrane proteins in metagenomics: Networks of correlated environmental features and protein families
Prianka V. Patel¹,⁶, Tara A. Gianoulis²,⁶, Robert D. Bjornson³,⁴, Kevin Y. Yip¹, Donald M. Engelman¹, and Mark B. Gerstein¹,³,⁵,⁷
- 1 Department of Molecular Biophysics and Department of Biochemistry, Yale University, New Haven, Connecticut 06520, USA;
- 2 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
- 3 Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA;
- 4 Keck Biotechnology Resource Laboratory, Yale University, New Haven, Connecticut 06520, USA;
- 5 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
6 These authors contributed equally to this work.
Abstract
Recent metagenomics studies have begun to sample the genomic diversity among disparate habitats and relate this variation to features of the environment. Membrane proteins are an intuitive, but thus far overlooked, choice in this type of analysis as they directly interact with the environment, receiving signals from the outside and transporting nutrients. Using global ocean sampling (GOS) data, we found ∼900,000 membrane proteins in large-scale metagenomic sequence, approximately a fifth of which are completely novel, suggesting a large space of hitherto unexplored protein diversity. Using GPS coordinates for the GOS sites, we extracted additional environmental features via interpolation from the World Ocean Database, the National Center for Ecological Analysis and Synthesis, and empirical models of dust occurrence. This allowed us to study membrane protein variation in terms of natural features, such as phosphate and nitrate concentrations, and also in terms of human impacts, such as pollution and climate change. We show that there is widespread variation in membrane protein content across marine sites, which is correlated with changes in both oceanographic variables and human factors. Furthermore, using these data, we developed an approach, protein families and environment features network (PEN), to quantify and visualize the correlations. PEN identifies small groups of covarying environmental features and membrane protein families, which we call “bimodules.” Using this approach, we find that the affinity of phosphate transporters is related to the concentration of phosphate and that the occurrence of iron transporters is connected to the amount of shipping, pollution, and iron-containing dust.
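To make the idea of a network linking environmental features and protein families concrete, the following is a minimal sketch only. It is not the PEN method as published: it assumes a plain Pearson correlation with an arbitrary cutoff as a stand-in, and every name in it (`pearson`, `correlation_edges`, `family_A`, `family_B`, the threshold of 0.8) and every number is hypothetical and made up for illustration, not taken from the GOS data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(vx * vy)


def correlation_edges(env, fam, threshold=0.8):
    """Link each environmental feature to each family with |r| >= threshold.

    env and fam map a name to a list of per-site values (same site order).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    edges = []
    for e_name, e_vals in env.items():
        for f_name, f_vals in fam.items():
            r = pearson(e_vals, f_vals)
            if abs(r) >= threshold:
                edges.append((e_name, f_name, r))
    return edges


# Made-up values for five hypothetical sites (not GOS measurements).
env = {
    "phosphate": [0.1, 0.4, 0.9, 1.3, 2.0],
    "shipping": [5.0, 1.0, 4.0, 2.0, 3.0],
}
fam = {
    "family_A": [0.9, 0.7, 0.5, 0.3, 0.1],
    "family_B": [0.2, 0.9, 0.1, 0.8, 0.3],
}

edges = correlation_edges(env, fam)
for e_name, f_name, r in edges:
    print(f"{e_name} -- {f_name}: r = {r:.2f}")
```

The resulting edge list is a bipartite graph: environmental features on one side, protein families on the other. Grouping its connected features and families would be one crude analogue of the "bimodules" described above, under the assumptions stated.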
Footnotes
7 Corresponding author.
E-mail mark.gerstein{at}yale.edu.
[Supplemental material is available online at http://www.genome.org.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.102814.109.
- Received November 5, 2009.
- Accepted April 22, 2010.
- Copyright © 2010 by Cold Spring Harbor Laboratory Press











