Genome Res. 16:636-643, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00 OPEN ACCESS ARTICLE
Letter
The fate of laterally transferred genes: Life in the fast lane to adaptation or death
Department of Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1
Large-scale genome arrangement plays an important role in bacterial genome evolution. A substantial number of genes can be inserted into, deleted from, or rearranged within genomes during evolution. Detecting or inferring gene insertions/deletions is of interest because such information provides insights into bacterial genome evolution and speciation. However, efficient inference of genome events is difficult because genome comparisons alone do not generally supply enough information to distinguish insertions, deletions, and other rearrangements. In this study, homologous genes from the complete genomes of 13 closely related bacteria were examined. The presence or absence of genes from each genome was cataloged, and a maximum likelihood method was used to infer insertion/deletion rates according to the phylogenetic history of the taxa. It was found that whole gene insertions/deletions in genomes occur at rates comparable to or greater than the rate of nucleotide substitution and that higher insertion/deletion rates are often inferred to be present at the tips of the phylogeny with lower rates on more ancient interior branches. Recently transferred genes are under faster and relaxed evolution compared with more ancient genes. Together, this implies that many of the lineage-specific insertions are lost quickly during evolution and that perhaps a few of the genes inserted by lateral transfer are niche specific.
Gene insertions and deletions, together with gene inversions and translocations, play important roles in shaping bacterial genomes (Itaya 1997
Gene insertions and deletions can be inferred by examining the presence or absence of a gene (or a gene family) on a phylogenetic tree. In some recent studies, the parsimony method has been used to infer insertions/deletions (Daubin et al. 2003a
Likelihood analysis has been successfully used to reconstruct phylogenies using sequence data since its first application by Neyman (Neyman 1971 For the likelihood analysis, the insertion rate was assumed to be equal to the deletion rate on each branch, but insertion/deletion rates could vary among different branches or in different parts of the phylogeny. These results suggest that recently transferred genes are more common. If this is to be an evolutionarily stable situation, it suggests that many laterally transferred genes have a high propensity to be deleted quickly after transfer. The rates of insertion/deletion from the maximum likelihood analysis were compared with observed nucleotide substitution rates and found to be comparable or larger; the inferred rates increase toward the tips of the phylogeny.
The maximum likelihood analysis used the phylogeny of concatenated DNA sequences from the genes gmk, glpF, and pycA (Fig. 1) and inferred the relative insertion/deletion rates by assuming that individual insertion and deletion events occur independently (this model also assumes that genes can be regained multiple times after having been deleted). Initially, a single constant insertion/deletion rate was assumed on the phylogeny (Case 1 in Fig. 2) using the observed gene presence/absence patterns (Table 1). The likelihood function for this model is a simple continuous function for the chosen parameter ranges, and there is a smooth continuous change of the likelihood as the insertion/deletion rates change (Fig. 3). The insertion/deletion rate that gives the maximum likelihood value in this case is 0.51.
The strains from Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are closely related and have been suggested to form the B. cereus group (the Bc group) (Priest et al. 2004 , while the remaining branches have an insertion/deletion rate . Fitting the maximum likelihood model to this scenario suggests that the rate is 4.42, while the rate is 0.35 (Fig. 3). It is striking that the rate is much greater than the rate and also much larger than 1, which is the evolutionary time period required to observe one substitution per nucleotide site. Hence, during the evolutionary time period required for one substitution per site, an entire gene could possibly have been inserted/deleted about five times in the Bc group. There are more gene movements observed relative to evolutionary branch length by comparing more closely related strains, and hence, there are more gene movements inferred at the tips of phylogeny.
As can be observed in Figure 2, there is a divergence in the genome size between the Bc group and the genome size of the other Bacillaceae. The branch leading to the Bc group was therefore separated from the other branches with a distinct rate
In addition to the above likelihood analysis, an analysis was performed with the assumption that genes cannot be regained after having been deleted. These results yield insertion/deletion rates that are similar to those under the assumption that genes can be regained after deletion (Table 2). However, in every case, the likelihood value is much lower. With a single constant insertion/deletion rate, the MLE is 0.48. In the case of two separate rates, the rate
Finally, the rate on internal branches in the Bc group was separated from that on external branches in the likelihood estimation (boxed portion of the phylogeny in Fig. 2). This shows that the insertion/deletion rate on external branches is higher than that on internal branches under both models of gene transfer (4.42 vs. 3.62 and 5.34 vs. 2.78, respectively) (Table 3). If the branch leading to the three B. anthracis strains is treated as an external branch in the likelihood estimation because of the close evolutionary relationship of the B. anthracis strains, the rate difference between external branches and internal branches becomes more dramatic (7.19 vs. 0.91 and 7.63 vs. 0.81, respectively) (Table 3). This again confirms that more gene insertions/deletions take place at the tips of the phylogeny.
To explore gene content among the most closely related taxa, five strains from the Bc group, B. anthracis Ames (Ba1), B. anthracis Ames "ancestor" (Ba2), B. anthracis Sterne (Ba3), B. thuringiensis (Bt), and B. cereus ZK (Bc1), were analyzed separately. The comparison of homologs shows that >96% of the genes present in all five strains share at least 90% sequence identity with each other in their protein sequences. Therefore, the substitutions between homologs among these five strains should be considered relatively limited. All phyletic patterns of these five strains are shown in Table 4. Of the 5076 gene families, only 3956 are present in all five strains. Hence, 22.1% of the gene families are not shared by all five strains, even though these five strains are believed to represent one species (Helgason et al. 2000
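The 22.1% figure follows directly from the two counts quoted above. A minimal check of that arithmetic, using only the numbers given in the text (the variable names are illustrative):

```python
# Counts quoted in the text for the five Bc-group strains.
total_families = 5076   # gene families observed across the five strains
shared_by_all = 3956    # families present in all five strains

# Families missing from at least one of the five strains.
not_shared = total_families - shared_by_all
fraction_not_shared = not_shared / total_families

print(f"{not_shared} of {total_families} families "
      f"({fraction_not_shared:.1%}) are absent from at least one strain")
```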
To determine the rates of evolution in the recently transferred genes, the tree lengths for the Bc-group-specific genes were measured (Fig. 4A) and compared with the tree lengths of genes present in all 13 strains (Fig. 4B). In both cases, only the branch lengths within the Bc group of taxa were measured. This comparison indicates that the genes that are strain specific within the Bc group have much faster rates of evolution than do more ancestral genes present in the other taxa.
The rates of nonsynonymous (Ka) and synonymous (Ks) substitutions were estimated for the genes present only within the Bc group and compared to genes that are more broadly distributed within the Bacillaceae group. Again, only changes that have occurred within the Bc group are measured with the genes categorized by their breadth of distribution. Both the Ks and Ka rates are elevated in the Bc group (Supplemental material), but the Ka values are most strongly affected. The Ka/Ks ratios for genes limited to the Bc group are shown in Figure 5A. It is clear that the Bc-group-specific genes have elevated Ka/Ks ratios. Genes present only within the Bc group have larger Ka/Ks ratios than genes present within the Bc group, Geobacillus kaustophilus, Bacillus licheniformis, and Bacillus subtilis or within the Bc group, G. kaustophilus, B. licheniformis, B. subtilis, Bacillus clausii, and Bacillus halodurans, or in all taxa (Fig. 5, A vs. B,C,D). The genes more recently transferred appear to contribute to a higher Ka/Ks ratio.
To determine the patterns of LGT, it is useful to examine closely related but fully sequenced genomes. A complete genome sequence is necessary to eliminate the possibility of a hidden paralog or of a genome rearrangement masking a homolog. Closely related taxa help to determine the number of genes that might have been laterally transferred. To this end, we have examined the gene content from 13 completely sequenced genomes from the Bacillaceae group.
The results demonstrate that LGT occurs rapidly and extensively between strains of the same species. A phylogeny was constructed to measure the rate of LGT relative to nucleotide substitutions. The concatenated DNA sequences of gmk, glpF, and pycA genes rather than ribosomal RNA sequences were used to reconstruct the phylogeny in this study. It is difficult to reconstruct the phylogenetic relationship within the Bc group owing to their remarkably similar rRNA sequences (Ash et al. 1991
When maximum likelihood estimates of the rates of insertion/deletion are mapped onto this phylogeny, they suggest that more genes are coming in and going out at the tips of the phylogeny. This is clear even from the table of gene presence/absence (Table 1), where differences in gene content between taxa that are considered a single species are among the most common patterns observed. If this is an evolutionarily stable situation, then most of the laterally transferred genes must be lost shortly after their insertion during evolution. Genome annotation can be an error-prone task (Kyrpides and Ouzounis 1999
It has been suggested that B. anthracis, B. cereus, and B. thuringiensis are one species (Helgason et al. 2000
In the maximum likelihood estimation, the insertion rate is assumed to be equal to the deletion rate. This assumption was made to ensure that in the long term, genome sizes would not tend to zero or infinity. In the short term, this assumption is unlikely to be correct, and Thompson et al. (2005)
The more recently transferred genes have longer tree lengths (with P < 0.001 in a Wilcoxon rank test) (Fig. 4), suggesting that the recently transferred genes are evolving faster than ancient genes. The Ka/Ks ratio study suggests that, in general, more recently transferred genes are under fewer functional constraints (Fig. 5). The different subfigures show the Ka/Ks ratio in genes with increasing depth in the phylogeny. Those genes that are inferred to have arisen recently via LGT in the phylogeny have a higher ratio than those that were transferred somewhat more distantly (Fig. 5, A vs. B is P
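As an illustration of the kind of rank-based comparison mentioned for the tree lengths, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test with a normal approximation. It is not the authors' procedure: the function name `rank_sum_test` is introduced here for illustration, the variance carries no tie correction, and the tree-length values are hypothetical numbers, not data from this study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Ties receive average ranks; the variance has no tie correction,
    which is an assumption of this sketch.
    Returns (z, p), where z > 0 means x tends to be larger than y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p

# Hypothetical tree lengths (substitutions per site), for illustration only.
recent_genes = [0.42, 0.55, 0.61, 0.38, 0.70, 0.49]
ancient_genes = [0.12, 0.09, 0.15, 0.20, 0.11, 0.18]
z, p = rank_sum_test(recent_genes, ancient_genes)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

With samples as large as whole gene sets, the normal approximation is usually adequate; for very small samples an exact test would be preferable.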
This study demonstrates that more recently transferred genes are under relaxed and faster evolution compared with the genes that have had a longer residence time. There are several possible reasons for this. It is possible that the laterally transferred genes with a higher rate are more prone to being laterally transferred. This is unlikely, as genes with a slightly longer residence time should not show the observed reduced rates of evolution. It has also been suggested that genes inserted into a new host will undergo amelioration of their sequence (Lawrence and Ochman 1997
Alternatively, the genes that have been recently transferred might be adapting to a new and local environment found in the new host. In this regard, it should be noted that several genes have a very large Ka/Ks ratio, suggesting directional selection. The recently transferred genes might also be evolving quickly as they are not required in their new hosts, offer minimal selective advantage, and could be in the process of being lost. This is in accord with the observation that genes come and go rapidly within closely related genomes. It is also in concordance with the very high tree lengths in Figure 4 that suggest genes change very rapidly. A recent study concluded that transferred genes are adaptive to specific environments (Pal et al. 2005
To gain a better understanding of genome evolution in closely related bacteria, a group of bacteria with an abundance of completely sequenced congeneric species was selected. Thirteen complete Bacillaceae genome sequences were obtained from NCBI (http://www.ncbi.nlm.nih.gov/) to carry out the analysis. They are B. anthracis Ames, B. anthracis "Ames ancestor," B. anthracis Sterne, B. thuringiensis, B. cereus ZK, B. cereus ATCC 10987, B. cereus ATCC 14579, Geobacillus kaustophilus, B. licheniformis, B. subtilis, B. clausii, B. halodurans, and Oceanobacillus iheyensis. It has been argued that B. anthracis, B. cereus, and B. thuringiensis might be one species (Helgason et al. 2000
The evolutionary history of the Bc group has been reconstructed using the nucleotide sequences of the gmk, glpF, and pycA genes (Priest et al. 2004
The gene families present in the Bc group were used to conduct tree length and Ka/Ks ratio (
To evaluate the likelihood of the observed phyletic patterns, a simple model of gene evolution was chosen. This model assumes that individual genes are inserted or deleted at constant rates. This model does not include a consideration of increasing or decreasing numbers of genes but, rather, a constant number of gene places that may or may not be occupied at any one time. All events are assumed to be independent. Let
All observed patterns were used to calculate the overall likelihood at the last common ancestral node by multiplying individual likelihoods together. The overall likelihood for a total of n patterns will be
were assumed to be equal (µ = ) on each individual branch, but the overall rate of indels could vary among branches of the phylogeny. Hence, while the insertion and deletion rates were always equal, they could vary in magnitude on each branch. To estimate the maximum likelihood, the branch specific rates were optimized to find those rates that maximized the likelihood of observing the gene patterns (Table 1).
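The general shape of such a calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a symmetric two-state (absent/present) continuous-time Markov model in which a gene changes state at a single rate, an equal prior on the two states at the root, no correction for patterns that cannot be observed, a single rate shared by all branches, and a simple grid search. The tree, branch lengths, pattern counts, and all function names are hypothetical.

```python
import math

def transition(rate, t):
    """Transition matrix P[from][to] for states 0 (absent) and 1 (present)
    under a symmetric two-state model with equal insertion/deletion rate."""
    change = 0.5 - 0.5 * math.exp(-2.0 * rate * t)
    return [[1.0 - change, change], [change, 1.0 - change]]

def partial(node, pattern, rate):
    """Conditional likelihoods [L(absent), L(present)] at a node,
    computed by Felsenstein's pruning recursion.

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        state = pattern[node]
        return [1.0 if state == 0 else 0.0, 1.0 if state == 1 else 0.0]
    like = [1.0, 1.0]
    for child, t in node:
        p = transition(rate, t)
        sub = partial(child, pattern, rate)
        for s in (0, 1):
            like[s] *= p[s][0] * sub[0] + p[s][1] * sub[1]
    return like

def log_likelihood(tree, patterns, rate, root_prior=(0.5, 0.5)):
    """Sum of log-likelihoods over (pattern, count) pairs; multiplying the
    per-pattern likelihoods becomes a sum on the log scale."""
    total = 0.0
    for pattern, count in patterns:
        l0, l1 = partial(tree, pattern, rate)
        total += count * math.log(root_prior[0] * l0 + root_prior[1] * l1)
    return total

def grid_mle(tree, patterns, lo=0.01, hi=10.0, steps=1000):
    """Rate on an evenly spaced grid that maximizes the log-likelihood."""
    grid = (lo + i * (hi - lo) / steps for i in range(steps + 1))
    return max(grid, key=lambda r: log_likelihood(tree, patterns, r))

# Hypothetical three-taxon tree ((A:0.1,B:0.1):0.2,C:0.3) and pattern counts.
toy_tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)]
toy_patterns = [
    ({"A": 1, "B": 1, "C": 1}, 40),
    ({"A": 1, "B": 0, "C": 1}, 5),
    ({"A": 1, "B": 1, "C": 0}, 3),
    ({"A": 0, "B": 0, "C": 1}, 2),
]
best = grid_mle(toy_tree, toy_patterns)
print(f"grid MLE of the insertion/deletion rate: {best:.2f}")
```

Allowing separate rates for different parts of the phylogeny, as described above, would amount to passing a per-branch rate into `transition` and optimizing over several rates at once.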
This work was supported by an NSERC grant to G.B.G. The authors wish to thank R. Morton for his suggestions on earlier versions of this manuscript and to thank the reviewers for their helpful suggestions.
1 Corresponding author.
E-mail Golding@McMaster.CA; fax (905) 522-6066. [Supplemental material is available online at www.genome.org.] Article is online at http://www.genome.org/cgi/doi/10.1101/gr.4746406.
Acinas S.G., Marcelino L.A., Klepac-Ceraj V., Polz M.F. 2004. Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J. Bacteriol. 186: 2629–2635.
Ash C., Farrow J.A., Dorsch M., Stackebrandt E., Collins M.D. 1991. Comparative analysis of Bacillus anthracis, Bacillus cereus, and related species on the basis of reverse transcriptase sequencing of 16S rRNA. Int. J. Syst. Bacteriol. 41: 343–346.
Brunder W. and Karch H. 2000. Genome plasticity in Enterobacteriaceae. Int. J. Med. Microbiol. 290: 153–165.
Cerdeno-Tarraga A.M., Patrick S., Crossman L.C., Blakely G., Abratt V., Lennard N., Poxton I., Duerden B., Harris B., Quail M.A., et al. 2005. Extensive DNA inversions in the B. fragilis genome control variable gene expression. Science 307: 1463–1465.
Copley S.D. and Dhillon J.K. 2002. Lateral gene transfer and parallel evolution in the history of glutathione biosynthesis genes. Genome Biol. 3: 0025.1–0025.16.
Daubin V. and Ochman H. 2004. Bacterial genomes as new gene homes: The genealogy of ORFans in E. coli. Genome Res. 14: 1036–1042.
Daubin V., Lerat E., Perriere G. 2003a. The source of laterally transferred genes in bacterial genomes. Genome Biol. 4: R57.
Daubin V., Moran N.A., Ochman H. 2003b. Phylogenetics and the cohesion of bacterial genomes. Science 301: 829–832.
Dean A.M., Neuhauser C., Grenier E., Golding G.B. 2002. The pattern of amino acid replacements in
Felsenstein J. 1988. Phylogenies from molecular sequences: Inference and reliability. Annu. Rev. Genet. 22: 521–565.
Felsenstein J. 1989. PHYLIP (phylogeny inference package). Version 3.2. Cladistics 5: 164–166.
Felsenstein J. 1992. Phylogenies from restriction sites: A maximum-likelihood approach. Evolution Int. J. Org. Evolution 46: 159–173.
Felsenstein J. 2004. Inferring phylogenies. Sinauer Associates Inc., Sunderland, MA.
Galtier N. and Boursot P. 2000. A new method for locating changes in a tree reveals distinct nucleotide polymorphism vs. divergence patterns in mouse mitochondrial control region. J. Mol. Evol. 50: 224–231.
Garcia-Vallvé S., Romeu A., Palau J. 2000. Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 10: 1719–1725.
Gu X. 2001. Maximum-likelihood approach for gene family evolution under functional divergence. Mol. Biol. Evol. 18: 453–464.
Gu X. and Zhang H. 2004. Genome phylogenetic analysis based on extended gene contents. Mol. Biol. Evol. 21: 1401–1408.
Gu Z., Nicolae D., Lu H.H., Li W.H. 2002. Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 18: 609–613.
Hao W. and Golding G.B. 2004. Patterns of bacterial gene movement. Mol. Biol. Evol. 21: 1294–1307.
Helgason E., Økstad O.A., Caugant D.A., Johansen H.A., Fouet A., Mock M., Hegna I., Kolstø A.-B. 2000. Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis: One species on the basis of genetic evidence. Appl. Environ. Microbiol. 66: 2627–2630.
Huelsenbeck J.P. and Ronquist F. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755.
Huson D.H. and Steel M. 2004. Phylogenetic trees based on gene content. Bioinformatics 20: 2044–2049.
Itaya M. 1997. Physical map of the Bacillus subtilis 166 genome: Evidence for the inversion of an approximately 1900 kb continuous DNA segment, the translocation of an approximately 100 kb segment and the duplication of a 5 kb segment. Microbiology 143: 3723–3732.
Klappenbach J.A., Saxman P.R., Cole J.R., Schmidt T.M. 2001. rrndb: The Ribosomal RNA operon copy Number Database. Nucleic Acids Res. 29: 181–184.
Kunin V. and Ouzounis C.A. 2003. The balance of driving forces during genome evolution in prokaryotes. Genome Res. 13: 1589–1594.
Kurland C.G., Canback B., Berg O.G. 2003. Horizontal gene transfer: A critical view. Proc. Natl. Acad. Sci. 100: 9658–9662.
Kuwahara T., Yamashita A., Hirakawa H., Nakayama H., Toh H., Okada N., Kuhara S., Hattori M., Hayashi T., Ohnishi Y., et al. 2004. Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc. Natl. Acad. Sci. 101: 14919–14924.
Kyrpides N.C. and Ouzounis C.A. 1999. Whole-genome sequence annotation: Going wrong with confidence. Mol. Microbiol. 32: 886–887.
Lake J.A. and Rivera M.C. 2004. Deriving the genomic tree of life in the presence of horizontal gene transfer: Conditioned reconstruction. Mol. Biol. Evol. 21: 681–690.
Lawrence J.G. and Ochman H. 1997. Amelioration of bacterial genomes: Rates of change and exchange. J. Mol. Evol. 44: 383–397.
Liu G.R., Rahn A., Liu W.Q., Sanderson K.E., Johnston R.N., Liu S.L. 2002. The evolving genome of Salmonella enterica serovar Pullorum. J. Bacteriol. 184: 2626–2633.
Lynch M. and Conery J.S. 2003. The evolutionary demography of duplicate genes. J. Struct. Funct. Genomics 3: 35–44.
McLysaght A., Baldi P.F., Gaut B.S. 2003. Extensive gene gain associated with adaptive evolution of poxviruses. Proc. Natl. Acad. Sci. 100: 15655–15660.
Mirkin B.G., Fenner T.I., Galperin M.Y., Koonin E.V. 2003. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3: 2.
Neyman J. 1971. Molecular studies of evolution: A source of novel statistical problems. In Statistical decision theory and related topics (eds. S.S. Gupta and J. Yackel), pp. 1–27. Academic Press, New York.
Ochman H. and Jones I.B. 2000. Evolutionary dynamics of full genome content in Escherichia coli. EMBO J. 19: 6637–6643.
Pal C., Papp B., Lercher M.J. 2005. Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat. Genet. 37: 1372–1375.
Priest F.G., Barker M., Baillie L.W., Holmes E.C., Maiden M.C. 2004. Population structure and evolution of the Bacillus cereus group. J. Bacteriol. 186: 7959–7970.
Siew N. and Fischer D. 2003. Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins 53: 241–251.
Siew N. and Fischer D. 2004. Structural biology sheds light on the puzzle of genomic ORFans. J. Mol. Biol. 342: 369–373.
Silva F.J., Latorre A., Moya A. 2003. Why are the genomes of endosymbiotic bacteria so stable? Trends Genet. 19: 176–180.
Snel B., Bork P., Huynen M.A. 1999. Genome phylogeny based on gene content. Nat. Genet. 21: 108–110.
Snel B., Bork P., Huynen M.A. 2002. Genomes in flux: The evolution of archaeal and proteobacterial gene content. Genome Res. 12: 17–25.
Stoebel D.M. 2005. Lack of evidence for horizontal transfer of the lac operon into Escherichia coli. Mol. Biol. Evol. 22: 683–690.
Taoka M., Yamauchi Y., Shinkawa T., Kaji H., Motohashi W., Nakayama H., Takahashi N., Isobe T. 2004. Only a small subset of the horizontally transferred chromosomal genes in Escherichia coli are translated into proteins. Mol. Cell. Proteomics 3: 780–787.
Thompson J.D., Higgins D.G., Gibson T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.
Thompson J.R., Pacocha S., Pharino C., Klepac-Ceraj V., Hunt D.E., Benoit J., Sarma-Rupavtarm R., Distel D.L., Polz M.F. 2005. Genotypic diversity within a natural coastal bacterioplankton population. Science 307: 1311–1313.
Tillier E.R. and Collins R.A. 2000. Genome rearrangement by replication-directed translocation. Nat. Genet. 26: 195–197.
Ullrich S., Kube M., Schubbe S., Reinhardt R., Schuler D. 2005. A hypervariable 130-kilobase genomic region of Magnetospirillum gryphiswaldense comprises a magnetosome island which undergoes frequent rearrangements during stationary growth. J. Bacteriol. 187: 7176–7184.
Yang Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.
Zhang R. and Zhang C.T. 2003. Identification of genomic islands in the genome of Bacillus cereus by comparative analysis with Bacillus anthracis. Physiol. Genomics 16: 19–23.
Zhang P., Gu Z., Li W.H. 2003. Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol. 4: R56.
Received September 28, 2005; accepted in revised form February 21, 2006.