Longitudinal linked-read sequencing reveals ecological and evolutionary responses of a human gut microbiome during antibiotic treatment

Morteza Roodgar; Benjamin H. Good; Nandita R. Garud; Stephen Martis; Mohan Avula; Wenyu Zhou; Samuel M. Lancaster; Hayan Lee; Afshin Babveyh; Sophia Nesamoney; Katherine S. Pollard; Michael P. Snyder

doi:10.1101/gr.265058.120

Longitudinal linked-read sequencing reveals ecological and evolutionary responses of a human gut microbiome during antibiotic treatment

¹Department of Genetics, Stanford University, Stanford, California 94305, USA;
²Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, California 94305, USA;
³Department of Applied Physics, Stanford University, Stanford, California 94305, USA;
⁴Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California 90095, USA;
⁵Department of Physics, University of California, Berkeley, California 94720, USA;
⁶Gladstone Institutes, San Francisco, California 94158, USA;
⁷Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94158, USA;
⁸Chan Zuckerberg Biohub, San Francisco, California 94158, USA

↵9 These authors contributed equally to this work.

Corresponding authors: bhgood{at}stanford.edu, katherine.pollard{at}gladstone.ucsf.edu, mpsnyder{at}stanford.edu

Next Section

Abstract

Gut microbial communities can respond to antibiotic perturbations by rapidly altering their taxonomic and functional composition. However, little is known about the strain-level processes that drive this collective response. Here, we characterize the gut microbiome of a single individual at high temporal and genetic resolution through a period of health, disease, antibiotic treatment, and recovery. We used deep, linked-read metagenomic sequencing to track the longitudinal trajectories of thousands of single nucleotide variants within 36 species, which allowed us to contrast these genetic dynamics with the ecological fluctuations at the species level. We found that antibiotics can drive rapid shifts in the genetic composition of individual species, often involving incomplete genome-wide sweeps of pre-existing variants. These genetic changes were frequently observed in species without obvious changes in species abundance, emphasizing the importance of monitoring diversity below the species level. We also found that many sweeping variants quickly reverted to their baseline levels once antibiotic treatment had concluded, demonstrating that the ecological resilience of the microbiota can sometimes extend all the way down to the genetic level. Our results provide new insights into the population genetic forces that shape individual microbiomes on therapeutically relevant timescales, with potential implications for personalized health and disease.

The composition of the gut microbiome varies among human populations and individuals, and it is thought to play a key role in maintaining health and reducing susceptibility to different diseases (Gill et al. 2006; Feng et al. 2015; Sharon et al. 2016; Halfvarson et al. 2017). Understanding how this microbial ecosystem changes from week to week—through periods of health, disease, and treatment—is important for personalized health management and design of microbiome-aware therapies (Spanogiannopoulos et al. 2016).

Many studies have investigated intra-host dynamics at the species or pathway level (Jernberg et al. 2010; Dethlefsen and Relman 2011; Keeney et al. 2014; Buffie et al. 2015; Yin et al. 2015; Zaura et al. 2015; Raymond et al. 2016; Yassour et al. 2016; Lloyd-Price et al. 2017; Palleja et al. 2018; Ng et al. 2019). Among other findings, these studies have shown that oral antibiotics can strongly influence the composition of the gut microbiome over a period of days, whereas the community often regains much of its initial composition in the weeks or months after antibiotics are removed (Dethlefsen and Relman 2011; Buffie et al. 2015; Ng et al. 2019). This suggests an intriguing hypothesis, in which the long-term composition of a healthy gut community is buffered against brief environmental perturbations.

However, the mechanisms that enable this ecological resilience are still poorly understood. Does species composition recover because external strains are able to recolonize the host? Or do resident strains persist in refugia and expand again once antibiotics are removed? In the latter case, do resident populations also acquire genetic differences during this time, either due to population bottlenecks or to new selection pressures that are revealed during treatment? These questions can be addressed by quantifying fine-scale microbiome genetic variation below the species or pathway level and tracking how it changes during periods of health, disease, and treatment.

Recent advances in strain-resolved metagenomics and isolate sequencing (Scholz et al. 2016; Ward et al. 2016; Truong et al. 2017) have made it possible to detect DNA sequence variants within species and to track how they change within and between hosts. These studies have shown that gut bacteria can acquire genetic differences over time even in healthy human hosts and that these differences arise from a mixture of external replacement events (Schloissnig et al. 2013; Truong et al. 2017; Garud et al. 2019) and the evolution of resident strains (Ghalayini et al. 2018; Garud et al. 2019; Zhao et al. 2019). However, because these previous studies have included relatively few time points per host, or relatively shallow sampling of their microbiota, the population genetic processes that drive these strain-level dynamics remain poorly characterized. Understanding how the forces of mutation, recombination, selection, and genetic drift operate within hosts is critical for efforts to forecast personalized responses to drugs or other therapies.

To bridge this gap, we used deep metagenomic sequencing to follow the genetic diversity within a single host microbiome at approximately weekly intervals over a period of 5 mo, which included periods of infectious disease and the oral administration of broad-spectrum antibiotics. We used a linked-read sequencing technique to generate each of our metagenomic samples: large molecules of bacterial DNA were isolated in millions of emulsified droplets, digested into shorter fragments, and labeled with a corresponding DNA barcode to follow linked reads from the same droplet. Previous work has shown that the linkage information encoded in these barcoded “read clouds” can improve genome assembly (Bishara et al. 2018) and taxonomic assignment (Danko et al. 2019) in human gut metagenomes. Here, we took a different approach and developed new statistical methods that leverage longitudinal linked-read sequencing to detect and interpret fine-scale genetic changes that take place within the resident populations of individual bacterial species over time. This reference-based strategy simultaneously captures the ecological and evolutionary dynamics of multiple strains in many resident species, without requiring assembly of complete genomes.

Here, we sought to use this approach to characterize the population genetic forces that shape native gut microbiota through periods of health, disease, antibiotic treatment, and recovery. By analyzing the temporal dynamics of thousands of single nucleotide polymorphisms in 36 abundant species, we obtain new insights into the strain-level mechanisms that govern the ecological resiliency of this community, which have important potential implications for personalized health and disease.

Previous Section Next Section

Results

Longitudinal linked-read sequencing of a human gut microbiome during disease and treatment

Generation of linked reads requires the preparation of long DNA fragments. We therefore developed an optimized protocol for extracting high-molecular weight (HMW) DNA from human stool samples (Methods). We used this approach to perform linked-read sequencing (10x Genomics) on 19 stool samples collected from a single human subject over a period of 5 mo (Fig. 1A; Supplemental Table S1). During this time, the individual was diagnosed with human rhinovirus (HRV) and Lyme disease and received a 2-wk course of broad-spectrum oral antibiotics (doxycycline). We generated deep sequencing data for each sample (ranging from ∼8–160 Gbp), so that a typical read was present in a “read cloud” containing ∼4–30 other read pairs (Fig. 1D,E; Supplemental Fig. S1). Taxonomic profiling showed that the subject's gut microbial community contained 66 bacterial species with median relative abundance >10⁻⁵, all but one of which exceeded 0.1% frequency in at least one time point (Fig. 1B; Methods). Consistent with previous studies (Danko et al. 2019), we found that the extensive taxonomic diversity of this community was accompanied by high levels of read cloud “impurity,” in which individual read clouds were frequently observed to contain fragments from ∼5–10 different species (Fig. 1E). The presence of such impurities makes it difficult to directly infer genetic linkage from individual read clouds in complex metagenomic samples. Here, we sought to overcome this issue by employing a two-stage approach, which leverages the hybrid nature of the linked-read sequencing protocol. We first ignored long-range linkage and used short-read, reference-based methods to track species and subspecies diversity over time (Methods) (Nayfach et al. 2016; Garud et al. 2019). We then developed a statistical model for linking genomic regions with higher-than-expected read cloud sharing given the observed levels of read cloud impurity (Methods). Using this hybrid approach, we documented the ecological and evolutionary responses of the gut microbial community before, during, and after antibiotic treatment.

View larger version:

Download as PowerPoint Slide

Figure 1.

Read cloud sequencing of the gut microbiome of a single individual during disease, antibiotic treatment, and recovery. (A) Study design. Linked-read metagenomic sequencing was performed on 19 fecal samples collected from a single individual over a period of 5 mo. During this time, the individual was diagnosed with human rhinovirus (HRV) and Lyme disease and received an oral course of doxycycline. Samples are colored according to the epochs defined in Supplemental Table S1. (B) Rank relative abundance distribution at the species level, estimated from shotgun metagenomic reads (Methods). Colored lines show distributions obtained from individual time points, using the same color scheme as panel A. Solid and dashed black lines denote median and maximum relative abundances across all time points, respectively. (C) Species-level composition over time. Top panel illustrates Jensen-Shannon distance to each of four baseline samples as a function of time. Bottom panel shows relative abundance trajectories for a subset of the most abundant species; others are grouped together into the “other” category. (D) Schematic of linked-read sequencing with the 10x Genomics platform. High molecular weight metagenomic DNA is partitioned into millions of microfluidic droplets. Amplification and ligation reactions are performed within each droplet, yielding millions of short-read libraries that are tagged with droplet-specific DNA barcodes. The resulting “read clouds” are then pooled together and sequenced on an Illumina instrument. (E) Observed statistics of read clouds from the first three time points. The top panel shows total number of read pairs contributed by read clouds as a function of the number of read pairs they contain. The bottom panel shows a measure of the effective number of species that are detected in each read cloud as a function of the number of read pairs it contains (Methods). Many read clouds contain fragments from several different DNA molecules.

We first examined the ecological responses at the species level. Consistent with previous work (Schloissnig et al. 2013; Lewis et al. 2015; Luo et al. 2015; Kuleshov et al. 2016; Nayfach et al. 2016; Relman and Lipsitch 2018), we observed a substantial perturbation in species-level composition during and immediately after antibiotic treatment, followed by a return to near baseline values by the end of the study (Fig. 1C). However, we found that only a few species underwent large declines in abundance during the treatment period, and even fewer showed signs of going extinct during this interval. We used microbial DNA quantification (microbial DNA per mass of sample) (Contijoch et al. 2019) to convert our relative abundance measurements into absolute microbial densities for a subset of the study time points (Supplemental Fig. S2). Of the 50 species with baseline abundance >0.1%, we found that only 12 declined by more than 10-fold at the end of the 2-wk treatment window, and all but four of these recovered to near baseline values by the end of the study (Supplemental Fig. S3). Alistipes finegoldii and Butyrivibrio crossotus provide two prototypical examples of this effect: both experienced large declines in abundance during antibiotic treatment but eventually recovered to their initial levels over the next ∼3–5 wk (Fig. 2A). The small number of such examples, combined with the relatively steady fecal DNA content, suggests that a large fraction of the initial community was able to maintain high absolute abundance during antibiotic treatment, for example, due to reduced sensitivity to doxycycline. Consistent with this hypothesis, we observed a high baseline proportion of doxycycline-related resistance genes among our metagenomic reads (∼200 reads per million mapped), which increased ∼twofold during treatment (Supplemental Fig. S4). This finding is also consistent with previous observations of tetracycline resistance in isolates of several Bacteroides species (Rasmussen et al. 1993).

View larger version:

Download as PowerPoint Slide

Figure 2.

Varied ecological and genetic responses across 36 abundant species in the same host. (A) Relative abundances of species through time, partitioned according to the epochs defined in Figure 1A. Each time point is indicated by a point, and the time points from the same epoch are connected by a vertical line to aid in visualization. For comparison, the gray distribution shows the corresponding values across a larger human cohort (Methods). Species whose relative abundance drops by more than 10-fold between baseline and antibiotic time points are indicated with a single star. Only a minority of the most abundant species experience such reductions in relative abundance during treatment. (B) Within-species nucleotide diversity for each time point, as measured by the fraction of core genome sites with intermediate allele frequencies (0.2 < f< 0.8) (Methods). Points are plotted using the same color scheme as panel A. (C) The total number of single nucleotide (SNV) differences between a baseline time point and each of the later epochs (Methods). The height of the white area indicates the total number of polymorphic SNVs that were tested for temporal variation. Different species display a range of different behaviors, which can be partitioned into putative cases of competition between distantly related strains (left of vertical divider line) and evolution within a dominant resident strain (right). (D) Initial frequencies of alleles identified in panel C. For species with more than 10 SNV differences, the data are summarized by the median initial frequency (square symbol) and the interquartile range (line). Many alleles have nonzero frequency before the sweep occurs. (E) Fraction of SNV differences in panel C that are retained at the final time point (f > 0.7). In many species, only a minority of SNV differences gained during disease or treatment are retained.

Deep longitudinal sequencing reveals shifts in the genetic composition of 36 species in the same host

The general pattern of persistence and recovery at the species level is shared by many other classes of antibiotics (Relman and Lipsitch 2018). However, the strain-level dynamics that give rise to this long-term stability remain poorly understood. Do the species that persist through disease and treatment remain stable genetically? Or does this general pattern of robustness mask a larger flux of genetic changes occurring within individual species? Our deep sequencing approach allows us to address these questions by tracking genetic variation within species over time.

We tracked the genetic composition of each species by aligning our short sequencing reads to a panel of reference genomes and estimating the population frequency of single nucleotide variants (SNVs) that we detected at each time point (Methods). Our high sequencing coverage enabled at least ∼10-fold coverage per time point for species with relative abundance >0.3% and coverages as high as ∼500-fold in some of the most abundant species (Supplemental Fig. S5). This allowed us to simultaneously monitor SNV trajectories within 36 different species that passed our minimum coverage thresholds (Methods) and to explore how these “evolutionary” dynamics either mirrored or contrasted with the dynamics of their species abundance trajectories above (Figs. 2, 3; Supplemental Figs. S6–S8; Supplemental Data S1–S3).

View larger version:

Download as PowerPoint Slide

Figure 3.

Ecological and genetic dynamics in six example species. (A–F) A subset of the species in Figure 2 were chosen to illustrate a range of characteristic behaviors. For each of the six species, the top panel shows the relative abundance of that species over time, whereas the bottom panel shows the frequencies of single nucleotide variants (SNVs) within that species. Colored lines indicate SNVs that underwent a significant shift in frequency over time (Methods), whereas a subset of nonsignificant SNVs are shown in light gray for comparison. The colors of temporally varying SNVs are assigned by a hierarchical clustering scheme, which is also used to determine their polarization (Methods). The shaded region denotes the antibiotic treatment period depicted in Figure 1A.

This strain-level analysis revealed large differences in the genetic composition of different species. Consistent with previous work (Schloissnig et al. 2013; Kuleshov et al. 2016; Truong et al. 2017; Garud et al. 2019), we found that the initial levels of genetic diversity varied widely between species. Some common species, such as Bacteroides vulgatus and Bacteroides uniformis, harbored more than ∼10,000 SNVs at intermediate frequencies, whereas other species, for example, Bacteroides coprocola or Alistipes sp, had fewer than ∼100 detectable SNVs (Fig. 2B). Of particular interest are those SNVs that underwent large changes in frequency between the initial and later time points (e.g., from <20% to >70%, with FDR < 0.1) (see Methods); these “SNV differences” indicate a nearly complete “sweep” within the species of interest. We observed a similarly wide range in the number of SNV differences during and immediately after antibiotic treatment, from more than ∼10,000 in some species (e.g., Eubacterium eligens) to ∼10 (or even 0) in others (Fig. 2C). Of the 36 populations in Figure 2C, more than half accumulated at least one SNV difference during this period, and more than 80% accumulated SNV differences in at least one portion of the study.

Similar within-host changes have recently been observed in metagenomic and isolate-based analyses from healthy hosts (Ghalayini et al. 2018; Garud et al. 2019; Zhao et al. 2019), though at a significantly lower rate (Fisher's exact test, P < 0.001). A major challenge in these earlier metagenomic studies has been to demonstrate that the temporally variable SNVs are truly linked to their inferred genomic background, as opposed to read mapping artifacts (e.g., from another temporally fluctuating species that happens to share some part of the genome). Linked-read sequencing provides an opportunity to address this question. For each SNV difference reported in Figure 2C, we examined the patterns of barcode sharing with genes in the “core” genomes of our reference genome panel, which provides a proxy for the true genomic background (Methods). This analysis yielded positive confirmation for ∼80% of the SNV differences in Figure 2C (where both alleles share read clouds with a core gene in the target species) and negative confirmation for <1% (Supplemental Fig. S7). We conclude that the majority of these SNVs represent true genetic changes within their respective species populations.

The variable genetic responses in different species are not easily explained by their phylogenetic relatedness or their relative abundance trajectories. As an example, Figure 3 shows the full species abundance and SNV frequency trajectories for six example species, which were chosen to illustrate the range of observed behaviors. This set includes three different members of the Alistipes genus that coexisted within this particular host (Fig. 3A–C). The first two species, A. sp and A. finegoldii, experienced large reductions in relative abundance during treatment, but we observed genetic differences in only one of the populations (A. finegoldii) when they recovered to their initial relative abundance levels. In A. onderdonkii, in contrast, the relative abundance remained high at the end of the treatment phase, but we observed rapid changes in the frequencies of several SNVs within this species during the same time period (P < 0.001) (Methods). These examples show that species abundances alone are not sufficient to predict genetic response within species: relatively constant species abundance trajectories can mask larger genetic shifts within a species, and vice versa.

Quantifying genetic linkage between SNVs using barcoded read clouds

We next sought to quantify the population genetic processes that could give rise to the SNV changes observed in Figures 2 and 3. A key question is the extent of genetic linkage within species: is recombination sufficiently frequent that genetic drift and natural selection act independently on different SNVs? Or are SNV trajectories tightly correlated because they are linked together on a small number of clonal backgrounds? This question is particularly relevant for species with high levels of SNV diversity like B. vulgatus (Supplemental Fig. S10), where it can mean the difference between ∼10,000 evolutionary trajectories (if SNVs are independent) or possibly only one (if SNVs derive from a mixture of two clonal strains).

Previous analyses of gut bacteria suggest that recombination can efficiently decouple SNVs over long timescales (i.e., millions of bacterial generations) (Vos and Didelot 2009; Garud et al. 2019), but the extent of genetic linkage within hosts remains unclear. The additional information provided by linked-read sequencing now allows us to investigate this question. We developed a statistical approach for detecting linkage between pairs of SNVs (Fig. 4A), which accounts for the substantial coverage variation across different species and read clouds (Methods). Figure 4B shows how the overall levels of read cloud sharing depend on the coordinate distance between the two SNVs on the reference genome. Consistent with the fragment length estimates from our HMW DNA extraction protocol (Supplemental Fig. S11), we observed an enrichment of shared read clouds barcodes for SNVs within ∼10 kb of each other, though the overall fraction of long-range read clouds remains modest (∼10%). For the subset of SNVs pairs with significant read cloud sharing, we further quantified genetic linkage by examining the combinations of major and minor alleles that are observed in the same read clouds. In particular, we estimated the number of allelic combinations (or haplotypes) that were observed for each pair of SNVs as a function of their coordinate distance along the reference genome (Fig. 4A). According to the four-gamete test (Hudson and Kaplan 1985), three or fewer haplotypes are consistent with clonal evolution, but the presence of all four haplotypes indicates a possible recombination event between the two SNVs (Methods). Figure 4C shows that the vast majority of the SNV pairs we observed across species were consistent with clonal evolution: of the ∼4 million SNV pairs we examined that were separated by more than 2 kb, only ∼600 showed significant evidence for all four haplotypes (q < 0.05) (Methods). Most of these four-haplotype pairs were concentrated in just a few species, with high values of linkage disequilibrium between the two SNVs (Supplemental Fig. S12). This suggests that, to a first approximation, the SNV dynamics within species in this time course reflect a competition between a few clonal haplotypes, rather than independent alleles. This is consistent with previous indirect evidence from the clustering of allele frequencies within hosts (Luo et al. 2015; Smillie et al. 2018; Garud et al. 2019).

View larger version:

Download as PowerPoint Slide

Figure 4.

High levels of linkage disequilibrium (LD) in many resident populations. (A) Schematic of read cloud sharing between two SNVs separated by coordinate distance ℓ on the same reference contig. Three or fewer haplotypes are consistent with clonal evolution, whereas four haplotypes indicate a possible recombination event. (B) Observed fraction of shared read clouds as a function of ℓ for SNVs in the six example species in Figure 3. (C) Linkage disequilibrium between pairs of SNVs across a range of different species. The top panel shows the total number of linked SNV pairs (i.e., those with significantly elevated levels of read cloud sharing) for species in Figure 2 with sufficient coverage (Methods). For each species, the three bars denote SNV pairs with ℓ < 200 bp, 200 bp < ℓ < 2 kb, and ℓ > 2 kb, respectively. SNVs are included only if the minor allele has frequency f > 0.1. The bottom panel shows the observed proportion of SNV pairs in the top panel that fall in each of the LD categories illustrated in panel A. Across species, only a small fraction of SNV pairs provide evidence for recombination.

For populations with sufficiently high SNV densities (>1 per kb), the patterns of read cloud sharing can inform efforts to cluster SNVs into smaller numbers of competing haplotypes (Supplemental Fig. S10; Supplemental Data S7). However, we found that many temporal changes occurred in populations with much lower SNV densities (<1 per 10 kb) (Fig. 2). In these populations, Figure 4B suggests that SNVs will not typically share read clouds, except in rare cases where they happen to be located in nearby regions of the genome. We therefore used a heuristic approach to infer clusters of perfectly linked SNVs (a form of multi-SNV haplotype) based on similarities in their allele frequency trajectories (Methods). The inferred haplotypes are indicated by the coloring scheme in Figure 3.

Temporal dynamics of haplotypes reveal cryptic phenotypic differences and time-varying selection within species

We next investigated the role that natural selection plays in driving the genetic changes we observed within species. Although adaptive evolution is ubiquitous in microbial evolution experiments (Cvijović et al. 2018) and many pathogens (Didelot et al. 2016; Lauring 2020), its influence on natural genetic variation in the human gut microbiota is less well understood. One common model assumes that within-host dynamics are dominated by selection at higher levels of taxonomic organization (e.g., species, families, or functional guilds), whereas closely related variants that survive this environmental filter are largely interchangeable. In this “conditionally neutral” scenario, short-term changes in the genetic composition of a resident population primarily occur through stochastic demographic processes like genetic drift (Kimura 1983; Hubbell 2001) or genetic draft (Neher 2013). Empirical support for this model is currently mixed: one recent study in Bacteroides fragilis found signatures of nonneutral evolution (Zhao et al. 2019), whereas a second study in Escherichia coli came to the opposite conclusion, potentially due to a smaller effective population size in E. coli (Ghalayini et al. 2018). At present, it is not clear which of these scenarios will apply to the other species in Figures 2 and 3, or how they generalize the stronger environmental perturbations associated with antibiotic treatment. These sudden environmental shifts could uncover cryptic phenotypic differences between previously coexisting strains, driving rapid shifts in the frequencies of segregating genetic variants (Paaby and Rockman 2014).

High levels of genetic linkage make it difficult to distinguish these scenarios using traditional population genetic approaches, because selection and drift will typically act on extended haplotypes rather than individual alleles. For example, the sweeping SNV clusters we identified in Figure 3 contain many synonymous variants (Fig. 5A–F), which are most likely hitchhiking alongside causative “driver” mutations that are located in other linked regions of the genome. These driver mutations may not even be visible in Figure 3 if they happen to arise from structural variants, mobile elements, or other accessory genes that are difficult to detect in our metagenomic pipeline (Jiang et al. 2019; Zhao et al. 2019; Yaffe and Relman 2020; Zlitni et al. 2020).

View larger version:

Download as PowerPoint Slide

Figure 5.

Signatures of strain replacement and evolutionary modification. (A–F) Statistical properties of temporally varying SNVs from the six example species in Figure 3. For each species, the bars on the left show the relative proportion of SNVs with different protein-coding effects and allele prevalence across other hosts in a larger cohort (Methods). Protein-coding effects are estimated from the codon degeneracy at each site (4D = fourfold degenerate/synonymous, 1D = onefold degenerate/nonsynonymous). Allele prevalences for SNVs not observed in other hosts are indicated by light red or blue shading. Pie charts indicate the relative proportion of private marker SNVs for each species that are preserved or disrupted throughout the sampling interval (Methods). Large fractions of disrupted marker SNVs indicate a strain replacement event.

To overcome these limitations, we sought to develop a methodology for distinguishing natural selection and demographic stochasticity using the residual information encoded in the shapes of the SNV trajectories in Figure 3. We first developed a statistical test to determine whether the SNV clusters in Figure 3 were consistent with the simplest neutral null model, with a constant but unknown strength of genetic drift (Methods). This test leverages the fact that the frequency changes produced by a constant rate of genetic drift must be statistically similar over time, so that a large fluctuation in one time-window is unlikely to be followed by a small change in another. Our observed trajectories often violate this prediction, allowing us to reject the constant genetic drift model for four of the five species in Figure 3 (Supplemental Table S3).

A second possibility is that genetic drift is significantly elevated during antibiotic treatment, for example, due to a transient population bottleneck. This could be a particularly plausible hypothesis for the Alistipes finegoldii population in Figure 3A, where the genetic shift coincided with a large reduction in the relative abundance of that species. Although it is difficult to rule out similar bottlenecks at unobserved time points for the other species in Figure 3, we still observe significant departures from the constant genetic drift model for these species even when the antibiotic time points are excluded (Supplemental Table S3). Closer inspection of the frequency trajectories reveals the likely source of this signal: many of the SNV clusters continue to change in frequency but in the opposite direction, even after antibiotic treatment has concluded. This behavior, which is recapitulated across the larger set of species in Figure 2E, cannot be explained by a simple bottleneck during treatment. Instead, these data suggest the initial increases and later reversals are most likely caused by time-varying selection pressures that act on different haplotypes within these populations, reflecting the complex environmental conditions experienced by these species in vivo.

The high temporal resolution of the SNV clusters allows us to estimate the magnitudes of these temporally varying selection pressures in different time windows. For example, the frequency reversals that occurred after treatment in Figure 3 took place over ∼30–40 d, implying a fitness difference about S∼10% per day (Methods). The increases in frequency during and immediately after antibiotics were even more rapid. In Eubacterium eligens, the minority haplotype increased from 7% to 90% in just 2 d, implying a corresponding fitness difference of at least S∼250% per day. The presence of these large fitness differences is potentially not surprising, given the strong environmental perturbations that are presumably imposed by antibiotic treatment. In this case, however, we found that the large frequency shift in E. eligens did not take place during the 2-wk course of doxycycline but rather 3–5 d after treatment had concluded. (For comparison, we note that serum concentrations of doxycycline usually reach their peak value after ∼4 h, with a half-life of 12–24 h [Agwuh and MacGowan 2006]). We also found that the E. eligens population exhibited a ∼20% increase in its replication origin peak-to-trough ratio (PTR), a proxy for bacterial growth rate, during this same time interval (Supplemental Fig. S13). This indicates that the large fitness differences within this species were likely driven by a higher growth rate of the sweeping haplotype, rather than an increased death rate of the declining strain.

Statistics of sweeping SNVs reveal strain replacement, evolutionary modification, and selection on standing variation

After demonstrating that genetic changes occur within species and that these changes are likely driven by selection on linked haplotypes and not necessarily associated with changes in species abundance, we next sought to investigate the origin of these within-host sweeps. A key question is whether the temporally variable SNVs arose within the host or its local environment (evolutionary modification) or whether they reflect the invasion of pre-existing strains that diverged for many generations before colonizing the host (strain replacement). Following previous work (Garud et al. 2019), we distinguished between these two scenarios by examining three additional features of the SNV trajectories in Figure 3: (1) the protein-coding impact of these mutations; (2) their prevalence across other hosts in a large reference panel; and (3) the retention of private marker SNVs (i.e., high-frequency alleles that are unique to the present host) (Supplemental Fig. S14; Methods). Figure 5, A–F, illustrates these quantities for the six example species in Figure 3, which were chosen to cover the range of different behaviors that we observed.

The Eubacterium eligens population provides a prototypical example of a strain replacement event. The sweeping haplotype in this species contained more than 10,000 SNVs that were widely distributed across the genome (Supplemental Data S1), consistent with the typical genetic differences between E. eligens strains in different hosts (Schloissnig et al. 2013; Truong et al. 2017; Garud et al. 2019). Few private marker SNVs were retained from the initial time point (Fig. 5D), which is again consistent with replacement by a distantly related strain (Supplemental Fig. S14). Similar examples of strain replacement have been observed in previous studies (Yassour et al. 2016; Lloyd-Price et al. 2017; Truong et al. 2017; Garud et al. 2019; Yaffe and Relman 2020), but our densely sampled time course now provides additional information about the dynamics of this process. In particular, the SNV trajectories in Figure 3D show that the distantly related strain was already present at substantial frequencies (∼5%) long before its large fitness benefit started to take effect. Figure 2D shows that this is also the case for the four other putative replacement events in Figure 2. This indicates that the replacement events we observed here were caused by the sudden increase of previously colonizing strains, rather than the contemporary invasion of new strains from outside the host.

At the opposite extreme, the Phascolarctobacterium population in Figure 3E provides a prototypical illustration of an evolutionary modification event. In this case, a cluster of just six SNVs (including five amino acid changes, all in noncontiguous genes in the reference genome that are unlinked in our read clouds) (Supplemental Data S1, S4) nearly swept to fixation during antibiotic treatment (f > 99.8%, S > 30% per day), only to decline in frequency later in the study. Unlike the replacement event above, this sweep shared all 34 of the private marker SNVs from the dominant strain at the initial time point (Figure 5E), suggesting that the two strains recently descended from a common ancestor. However, similar to this previous example, we again observe evidence that the minority haplotype was already segregating at substantial frequencies (∼1%–10%) before treatment, a finding which is recapitulated for several other nonreplacement examples in Figure 2D. This suggests that frequency-dependent selection may have initially driven these mutant lineages to intermediate frequencies—and maintained them there—before antibiotics or other environmental changes (or subsequent mutations) caused them to sweep through the rest of the population.

In addition to these extreme cases, we also observed a third category of events that seem to bridge the divide between strain replacement and evolutionary modification. For example, in the Alistipes finegoldii population in Figures 3A and 5A, a cluster of ∼80 SNVs swept to high frequency when the species recovered from antibiotic treatment, potentially consistent with a population bottleneck. Whereas the high rates of private marker SNVs sharing (49/51) suggest that the sweeping haplotype is a modification of the dominant strain from the initial time point, the large fraction of synonymous mutations (d_N/d_S = 0.16), many of which are shared across other hosts, is more consistent with a strain replacement event. Moreover, in contrast to the two examples above, we found that the A. finegoldii SNVs fell into a smaller number of contiguous genes in the reference genome and were often linked together in the same read clouds (Supplemental Data S1, S4). These same SNVs are also frequently co-inherited in “haplotype blocks” among the other hosts in our larger reference panel (Supplemental Fig. S15). Taken together, these independent lines of evidence suggest that the A. finegoldii SNVs in Figure 3A were most likely transferred onto their current genetic background through recombination. Similar to the E. eligens and Phascolarctobacterium examples above, the sweeping haplotype in A. finegoldii was already segregating as a minor variant (f∼20%) before antibiotic treatment, suggesting that the original recombination event (and its initial rise to observable frequencies) predated the current sampling period. In fact, the same cluster of SNVs can be observed in a previously sequenced sample from the same subject that was collected roughly 2 yr earlier (Supplemental Fig. S6; Kuleshov et al. 2016), suggesting that the recombinant haplotype is at least several years old.

The Bacteroides coprocola population provides another example of this intermediate behavior (Figs. 3F, 5F). In this case, a cluster of 37 SNVs (including reversions of 11 of the 161 private marker SNVs) was already in the process of sweeping through the population before antibiotic treatment began. However, the sweeping mutations in this example are scattered across many noncontiguous genes in the B. coprocola reference genome and are seldom observed in other hosts (Supplemental Data S1), so recombination no longer provides a parsimonious explanation. The fraction of synonymous mutations (d_N/d_S = 0.7) also lies somewhere between the typical between-host values (d_N/d_S∼0.1) and within-host hitchhiking (d_N/d_S ≥ 1). This suggests that the lineages may have coexisted with each other for a much longer period of time.

Previous Section Next Section

Discussion

The response of gut microbial communities to antibiotics plays a crucial role in their susceptibility to pathogens (Buffie et al. 2015; Cameron and Sperandio 2015; Sassone-Corsi and Raffatellu 2015), the spread of antibiotic resistance genes (Hu et al. 2013; Kintses et al. 2019), and their long-term stability (Dethlefsen and Relman 2011; Kamada et al. 2012; Faith et al. 2013). Numerous studies have documented the resilience of these communities at the taxonomic or pathway level (Jernberg et al. 2010; Dethlefsen and Relman 2011; Keeney et al. 2014; Buffie et al. 2015; Yin et al. 2015; Zaura et al. 2015; Raymond et al. 2016; Yassour et al. 2016; Palleja et al. 2018; Ng et al. 2019). Yet, the strain-level dynamics that give rise to this ecological robustness remain poorly characterized.

In this study, we sought to characterize these within-species dynamics by following the gut microbiome of a single individual through a period of health, disease, and the oral administration of doxycycline. We used linked read metagenomic sequencing to track the dynamics of single nucleotide variants within 36 different species and to compare these dynamics with the broader ecological shifts at the species level. In contrast to previous applications of linked-read sequencing, which have mostly focused on genome assembly (Bishara et al. 2018) or taxonomic assignment (Danko et al. 2019), we developed population genetic approaches to quantify the fine-scale genetic changes that occur in these populations over time.

Consistent with our expectations, we found that antibiotic perturbations can lead to widespread shifts in the genetic composition of individual species, with a higher overall rate than observed in healthy hosts (Ghalayini et al. 2018; Garud et al. 2019; Zhao et al. 2019). However, these within-species changes were rarely consistent with the traditional picture of extinction and subsequent recolonization after treatment. Instead, we found that the genetic responses varied widely across species, with some resident populations acquiring thousands of consensus sequence differences and others acquiring only a handful. These genetic changes were frequently observed in species without large changes in relative abundance in the sampled time points, and conversely, large abundance fluctuations were not always accompanied by widespread genetic changes. Furthermore, we found that some of the largest fluctuations at both the species and SNV levels occurred in the weeks after treatment had concluded. Together, these findings suggest that the response to doxycycline is not driven by discrete recolonization events but rather by the subtler processes of strain-level competition and evolution within the host.

At this population genetic level, our observations revealed qualitative departures from the simplest models of neutral evolution or the spread of antibiotic resistance phenotypes via classic selective sweeps. Instead, the observed genetic responses were much more dynamic: we often observed partial genome-wide sweeps containing multiple linked genetic variants, many of which were previously segregating at substantial frequencies (>1%) before the onset of treatment. While they rapidly increased in frequency over daily or weekly timescales, few of these variants ever fixed in their respective populations. Instead, we observed frequent reversion of sweeps at the single base pair level, consistent with temporally varying selection pressures and strong pleiotropic tradeoffs. However, these reversions rarely ended in extinction and more commonly stabilized close to their initial pretreatment frequency. These dynamics suggest that the sweeping haplotypes may have been stably maintained in their respective populations over time, for example, due to metabolic or spatial niches. This provides a potential explanation for the “oligo-colonization” structure observed in a variety of within-host microbial populations (Truong et al. 2017; Yassour et al. 2018; Garud et al. 2019). Our data show that similar dynamics can occur for mixtures of distantly related strains, as well as for haplotypes that likely evolved within the host. This suggests that ongoing ecological diversification could play an important role in shaping the genetic structure of resident populations, echoing a previous finding in Bacteroides fragilis (Zhao et al. 2019).

There are several important limitations to our study. First, because we have focused primarily on single nucleotide variants in well-behaved regions of reference genomes, we are likely missing many of the true targets of selection, particularly in the case of antibiotic resistance where mobile elements (Partridge and Tsafnat 2018; Zlitni et al. 2020) and other structural rearrangements (Jiang et al. 2019) are known to play an important role. This makes it difficult to know what fraction of genetic changes are a direct response to antibiotics, as opposed to indirect responses produced by fluctuations in the abundances of other species. It is even possible that nearly all of the mutations that we observed are simply passenger mutations that are hitchhiking alongside the true causative variants. The situation could potentially be improved by combining our read mapping approach with de novo genome assembly, similar to previous studies (Quince et al. 2017; Crits-Christoph et al. 2020; Yaffe and Relman 2020), particularly if linked reads are used during the assembly step (Bishara et al. 2018). However, given the high levels of genetic linkage we inferred, it would be difficult to pinpoint individual selection pressures even with an exhaustive list of mutations, because one can only observe the net effects of selection across entire haplotypes. Our current reference-based approach is effectively using this limitation to our advantage, by relying on the dynamics of linked passengers to provide information about the net selection pressures on their corresponding haplotypes.

Our analysis also revealed potential limitations of linked-read sequencing for studying strain-level dynamics in the gut microbiota. Although the longer effective read lengths allowed us to resolve genetic linkage in species or genomic regions with high levels of intra-host diversity, we also found that temporal dynamics frequently occurred among closely related haplotypes (≪1 SNV per 10 kb) that are difficult to capture with typical ∼10-kb fragments. This length scale also poses challenges for linking shared or accessory regions with their corresponding species “backbone,” because transferred DNA segments and other structural variants are expected to have similar length scales. This suggests that alternative approaches like chromatin conformation capture (Hi-C) (Kent et al. 2020) or linked-read sequencing of single cells (Lan et al. 2017) might offer advantages over traditional long-read sequencing for resolving the dynamics of longer range forms of genetic linkage.

In addition to these methodological constraints, a second key limitation is our focus on a single host microbiome. Although the concentrated resources allowed us to observe a variety of different responses across individual species in the same community, further work will be required to establish the prevalence of these different patterns across larger cohorts and among different classes of antibiotics. Our high-resolution time course provides a valuable set of templates that can inform future classification efforts in larger but lower resolution studies.

In summary, by tracking a host microbiome through periods of disease, antibiotic treatment, and recovery, we uncovered new evidence that the ecological resilience of microbial communities might extend all the way down to the genetic level. Understanding how this resilience arises from the complex interplay between host genetic, epigenetic, and lifestyle factors, as well its implications for broader evolution of the microbiome, remains an exciting avenue for future work.

Previous Section Next Section

Methods

Sample collection and sequencing

Stool samples were collected from a 62-yr-old male over a 5-mo period and stored on dry ice immediately after collection (Supplemental Table S1; Supplemental Materials, Section 1). Approximately ∼200 mg of stool were aliquoted from each sample for HMW DNA extraction. HMW DNA was extracted using QIAamp DNA Stool mini kit (Qiagen) and quantified using a Qubit assay (Thermo Fisher Scientific) (Supplemental Materials, Section 2; Supplemental Data S5, S6; Supplemental Table S2). Barcoded sequencing libraries were prepared using the 10x Genomics Chromium platform (10x Genomics) and were sequenced using an Illumina HiSeq 4000 machine (Illumina) with a read length of 2 × 151 bp and a depth of 30–500 million reads per sample. After sequencing, barcode strings were extracted from each read, and reads were subsequently trimmed to remove low-quality sequence in the first 20–25 bp (Supplemental Materials, Section 3). Reads were assigned to read clouds using a lightweight error correction algorithm, which merges barcode strings that are a single edit away from the inferred read cloud (Supplemental Materials, Section 4).

Estimating species abundances

Species abundances were estimated using the MIDAS software package (Nayfach et al. 2016), which maps short sequencing reads against a database of universal single-copy marker genes from ∼6000 bacterial species (Supplemental Materials, Section 5.1). The relative abundance of each species is estimated from the relative coverage of its associated marker genes. The reference distributions in Figure 2 were obtained by applying this same approach to a panel of ∼900 healthy human fecal metagenomes collated in our previous study (Garud et al. 2019). Absolute abundances were estimated for a subset of the study time points using microbial DNA quantification (total DNA per mass of sample) (Contijoch et al. 2019). The absolute abundance of each species was estimated by scaling its relative abundance by the total concentration of extracted DNA (Supplemental Fig. S2), which serves as a proxy for the total microbiome density in that sample (Reyes et al. 2013).

Empirical estimates of read cloud impurity

The read cloud impurity estimates in Figure 1E were obtained using a custom version of the MIDAS software package (Nayfach et al. 2016), which was extended to track the read cloud labels associated with each short sequencing read (Supplemental Materials, Section 5). Short sequencing reads were mapped against a curated database of gene families (or “pan-genome”) constructed from the subset of species that were detected above a minimum abundance threshold in at least one time point (Supplemental Materials, Sections 5.1, 5.2). For each read cloud μ, we calculated the total number of associated short reads (r_μ,s) that mapped to each species s in the pan-genome database. The effective number of species (S_μ) was approximated by the root mean squared estimator, $\text{[math]}$ (1) where θ(z) is the Heaviside step function (Supplemental Materials, Section 7.1). The curves in Figure 1E were obtained by binning read clouds by their total coverage and plotting the median value of S_μ for each bin.

Identifying SNVs within species

Single nucleotide variants were identified using a custom version of the pipeline developed in our previous study (Garud et al. 2019), which is based on the MIDAS software package (Nayfach et al. 2016). Short sequencing reads were mapped against a curated panel of reference genomes constructed from the subset of species that were detected above a minimum abundance threshold in at least one time point (Supplemental Materials, Sections 5.1, 5.3). After applying a sequence of coverage filters, single nucleotide variants were identified from the read pileups at all remaining protein coding sites (Supplemental Materials, Sections 5.2, 5.3). The reference allele for each SNV was chosen to coincide with the consensus allele observed within a larger human cohort that we analyzed in our previous study (Garud et al. 2019). This cohort was also used to estimate the prevalence of each allele within the broader human population, which we defined as the fraction of metagenomic samples in which the target allele was present in a majority of the reads (Supplemental Materials, Section 5.3.1). Following repolarization, SNVs were retained for downstream analysis if the alternate allele was present with at least 10% frequency in at least one time point. Note that this definition also includes sites that are fixed within the focal host but where the host-wide consensus is a minority allele within the larger human cohort; these sites are important for the private marker SNV analysis described below. The detected SNVs were used to estimate the overall levels of genetic diversity in Figure 2A, defined as the fraction of sites in the “core genome” (Supplemental Materials, Section 5.2) in which the alternate allele was present at an intermediate frequency (0.2 < f < 0.8) (Supplemental Materials, Section 5.3.1).

Allele frequency trajectories were estimated for each SNV based on the number of unique read clouds that supported each allele. We developed an extension of the MIDAS software package (Nayfach et al. 2016) to track the read cloud labels associated with a subset of the SNVs identified above, leveraging the additional degeneracy within a read cloud to automatically correct for some sequencing and PCR errors (Supplemental Materials, Section 5.4). After applying a second round of coverage filters, we estimated the allele frequency at site i and time point t using the plug-in estimator $\text{[math]}$ , where $\text{[math]}$ is the number of unique read clouds supporting the alternate allele, and $\text{[math]}$ is the total number of unique read clouds that mapped to that site (Supplemental Materials, Section 5.5).

Identifying SNV differences over time

We used the allele frequency trajectories above to identify subsets of SNVs that experienced large changes in frequency between pairs of sequenced time points (Supplemental Materials, Section 5.5.1). We refer to these as “SNV differences,” because they indicate a full or partial “sweep” of the allele through the resident population of interest. For each ordered pair of time points, we first identified all SNVs whose minor allele frequency transitioned from <20% frequency in the initial time point to >70% frequency in the latter time point. We then compared the observed number of SNVs that satisfied this criterion with the expected number under a null model of binomial sampling noise, which yields an associated P-value and false discovery rate (FDR) for the ensemble of putative SNV differences between that pair of time points. All SNVs in the ensemble were declared to be SNV differences if the P-values and false discovery rates exceeded a desired significance threshold (P < 0.05 and FDR < 0.1). Additional details about the detection algorithm are provided in Supplemental Materials, Section 5.5.1.

Linking SNV differences with their inferred species backbone

We verified that SNV differences were linked to their inferred species backbone by examining patterns of read cloud sharing with the other reference genomes in our curated panel above (Supplemental Fig. S9). For each identified SNV difference in Figure 2C, we created a list of the core genes in each species that shared read clouds with either of the two alleles, and we calculated the fraction of frequently shared genes that originated from the correct species (Supplemental Materials, Section 7.5). We considered there to be a positive confirmation if, for both alleles, more than two thirds of the frequently shared genes originated from the correct species. Conversely, we considered there to be a negative confirmation if either of the alleles had more than a third of their frequently shared genes originating from a different species.

Quantifying genetic linkage between pairs of SNVs

We quantified genetic linkage between pairs of SNVs based on the distribution of shared read clouds among their four combinations of alleles (or “haplotypes”). This task is complicated by the high levels of read cloud impurity in Figure 1E: because a typical read cloud contains several fragments, individual examples of read cloud sharing do not provide conclusive evidence for genetic linkage on their own. We therefore adopted a statistical approach, which aims to quantify elevated rates of read cloud sharing above those expected from read cloud impurities alone (Supplemental Materials, Sections 7.3, 7.4). We first compiled a list of SNV pairs within each species with at least 12 shared read clouds across all study time points. We then compared the observed number of shared read clouds for each pair with a null model of read cloud impurity, in which the probability that SNVs i and j share read cloud μ by chance is given by $\text{[math]}$ (2) where p_i and p_j are related to the relative abundances of the SNVs at that time point, and $\text{[math]}$ is a function that depends on the total number of reads that were sequenced from that read cloud (Supplemental Materials, Section 7.3). We estimated these parameters empirically using the observed rates of read cloud sharing between species, under the assumption that the vast majority of this sharing was caused by read cloud impurities (Supplemental Fig. S16). We then used this model to obtain an associated P-value for each pair of SNVs above, which quantifies how their observed levels of barcode sharing differ from those expected under read cloud impurities alone (Supplemental Materials, Section 7.4.1). The top panel in Figure 4C shows the total number of SNV pairs in each species whose Bonferroni-corrected P-values were <0.1. For each of these significantly linked SNV pairs, we then repeated this calculation at the allelic level to quantify read cloud sharing among the four possible combinations of alleles for each SNV pair (Supplemental Materials, Section 7.4.2). Deviations from the read cloud impurity null model were used to classify SNVs into the three LD categories in Figure 4. Further details on the statistical model and genetic linkage classifications are provided in Supplemental Materials, Sections 7.3 and 7.4.

Inferring clusters of linked SNVs from correlated allele frequency trajectories

The SNV clusters in Figure 3 were inferred using a heuristic approach that leverages correlations in the underlying allele frequency trajectories (Supplemental Materials, Section 8). In contrast to previous strain detection methods that aim to resolve the full set of haplotypes within a population (O'Brien et al. 2014; Luo et al. 2015; Quince et al. 2017; Smillie et al. 2018), we employed a lightweight approach that only identifies clusters of SNVs that are likely to be perfectly linked to each other, while also accounting for uncertainty in their relative polarization. For each pair of SNVs i and j, we defined a pair of dissimilarity values, $\text{[math]}$ (3) where $\text{[math]}$ denotes the two possible polarizations of SNV j. We performed UPGMA clustering on the dissimilarity matrix $\text{[math]}$ , and we used a maximum distance criterion with an empirically derived threshold (d^* ≈ 3.5) (Supplemental Fig. S18) to identify flat SNV clusters from the resulting UPGMA dendrogram. The relative values of $\text{[math]}$ and $\text{[math]}$ were used to determine the relative polarizations of the SNVs in each cluster, and an average frequency trajectory was estimated for the cluster by summing the ancestral and derived read counts for each of these repolarized SNVs. Further details of the clustering algorithm, as well as comparisons to other strain detection methods, are described in Supplemental Materials, Section 8, and Supplemental Figure S17.

Inferring natural selection and genetic drift from haplotype frequency trajectories

We sought to quantify the relative contributions of natural selection and genetic drift on the SNV differences in Figure 3 by comparing the observed trajectories to a population genetic model for the underlying haplotype frequencies, $\text{[math]}$ (4) where η(τ) is a Brownian noise term (Gardiner 1985), and s_e (τ) and N_e (τ) are effective parameters that represent the aggregate effects of natural selection and genetic drift, respectively (Supplemental Materials, Section 9). The observed frequencies $\text{[math]}$ are generated from this underlying trajectory via an additional sampling step, $\text{[math]}$ (5) which models the finite sampling noise that occurs during sequencing. Equations (4) and (5) constitute a standard hidden Markov model (HMM) for modeling genetic time series (Bollback et al. 2008; Feder et al. 2014; Steinrücken et al. 2014; Schraiber et al. 2016). However, this model differs from previous approaches in that we explicitly allow for time-varying selection coefficients, which are necessary to account for potential environmental heterogeneity (e.g., due to antibiotics) or linkage with other selected mutations.

In the absence of genetic drift [N_e (τ) · s_e (τ) · f(τ)(1 − f(τ)) ≫ 1], these time-varying selection pressures can be inferred from the plug-in estimator, $\text{[math]}$ (6) which by definition perfectly reproduces the observed frequency trajectory. We therefore asked whether this same trajectory could have plausibly emerged from genetic drift and/or sequencing noise alone.

We first developed a statistical test to determine whether the observed mutation trajectory was consistent with the simplest neutral null model, with a constant but unknown strength of genetic drift (Supplemental Materials, Section 9). Briefly, we first used dynamic programming to calculate the likelihood of the observed trajectory, Λ(N_e), across a grid of N_e values to infer the maximum likelihood estimator, $\text{[math]}$ . We then used the maximum likelihood $\text{[math]}$ as a test statistic and performed parametric bootstrapping using simulated trajectories to calculate an associated P-value for the observed trajectory under the null hypothesis. This test was performed for each of the inferred SNV clusters in Figure 3, using the average frequency trajectory inferred via the clustering algorithm above (Supplemental Table S3). We also performed a generalization of this test with a variable N_e (τ), in which we inferred separate N_e values for the segments of the trajectory that occurred before, during, and after antibiotics. This allowed us to simulate the effects of a simple bottleneck during treatment. Additional details and mathematical definitions are provided in Supplemental Materials, Section 9.

Quantifying retention of private marker SNVs over time

Following previous work (Nayfach et al. 2016; Asnicar et al. 2017; Costea et al. 2017; Korpela et al. 2018; Garud et al. 2019), we sought to distinguish instances of strain replacement and evolutionary modification by examining the retention of so-called “private marker SNVs” before, during, and after the sweep (Supplemental Fig. S14; Supplemental Materials, Section 5.5.2). We used a generalization of the approach we developed in our previous study (Garud et al. 2019) for analyzing private marker SNV sharing between pairs of sequenced time points. We considered a SNV to be “private” if the prevalence of its alternate or reference allele (as defined in Supplemental Materials, Section 5.3.3) was equal to zero. We then defined the set of “disrupted private marker SNVs” to be the subset of temporally variable SNVs whose reference alleles (as shown in Fig. 3) are private; the definition of a temporally variable SNV implies that these private alleles must lose their majority status at some point during the time course. Conversely, we defined the set of “preserved private marker SNVs” to be the subset of private SNVs in which the private allele remained at high frequency in a large fraction of the time points (specifically, >80% frequency in at least 80% of the time points, and no lower than 50% frequency in any one time point). These definitions were used to construct the pie charts in Figure 5. Additional details and motivation for this approach are provided in Supplemental Materials, Section 5.5.2.

Software availability

Source code for the sequencing pipeline, downstream analyses, and figure generation are available at GitHub (https://github.com/bgoodlab/highres_microbiome_timecourse) and as Supplemental Code S1.

Previous Section Next Section

Data access

The raw sequencing data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA680579.

Previous Section Next Section

Competing interest statement

The authors declare no competing financial interests. K.S.P. is a consultant for Phylagen.

Previous Section Next Section

Acknowledgments

We thank Eitan Yaffe for comments on the manuscript. This work was funded in part by the U.S. National Institutes of Health grants U54DK10255603, R01AT01023202, and 2RM1HG00773506. Sequencing was performed at the Stanford Center for Genomics and Personalized Medicine supported by U.S. National Institutes of Health grant S10OD020141. N.R.G. and K.S.P. acknowledge support from the U.S. National Science Foundation (DMS-1563159), the Chan Zuckerberg Biohub, and the Gladstone Institutes. B.H.G. acknowledges support from the Miller Institute for Basic Research in Science and a Stanford Terman Fellowship.

Author contributions: M.R. and M.P.S. conceived the study; B.H.G. designed the analysis; M.R. and M.A. developed the HMW DNA extraction protocol and performed the experiments; S.M.L., H.L., A.B., M.R., S.N., and W.Z. performed sequencing QC and preliminary bioinformatic analyses; B.H.G., N.R.G., and S.M. developed the metagenomic pipeline and analyzed SNV data; B.H.G., S.M., and K.S.P. developed theory and statistical methods; K.S.P. and M.P.S. supervised the study; M.R. and B.H.G. wrote the paper; M.R., B.H.G., N.R.G., K.S.P., and M.P.S. edited the paper. Correspondence and requests for materials should be addressed to B.H.G. (bhgood{at}stanford.edu), K.S.P. (katherine.pollard{at}gladstone.ucsf.edu), or M.P.S. (mpsnyder{at}stanford.edu).

Previous Section Next Section

Footnotes

[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.265058.120.
Freely available online through the Genome Research Open Access option.

Received April 24, 2020.
Accepted June 25, 2021.

© 2021 Roodgar et al.; Published by Cold Spring Harbor Laboratory Press

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

Previous Section

References

↵

Agwuh KN, MacGowan A. 2006. Pharmacokinetics and pharmacodynamics of the tetracyclines including glycylcyclines. J Antimicrob Chemother 58: 256–265. doi:10.1093/jac/dkl224

CrossRef Medline Google Scholar
↵

Asnicar F, Manara S, Zolfo M, Truong DT, Scholz M, Armanini F, Ferretti P, Gorfer V, Pedrotti A, Tett A, et al. 2017. Studying vertical microbiome transmission from mothers to infants by strain-level metagenomic profiling. mSystems 2: e00164-16. doi:10.1128/mSystems.00164-16

Abstract/FREE Full Text
↵

Bishara A, Moss EL, Kolmogorov M, Parada AE, Weng Z, Sidow A, Dekas AE, Batzoglou S, Bhatt AS. 2018. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat Biotechnol 36: 1067–1075. doi:10.1038/nbt.4266

CrossRef Google Scholar
↵

Bollback JP, York TL, Nielsen R. 2008. Estimation of 2N_es from temporal allele frequency data. Genetics 179: 497–502. doi:10.1534/genetics.107.085019

Abstract/FREE Full Text
↵

Buffie CG, Bucci V, Stein RR, McKenney PT, Ling L, Gobourne A, No D, Liu H, Kinnebrew M, Viale A, et al. 2015. Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517: 205–208. doi:10.1038/nature13828

CrossRef Medline Google Scholar
↵

Cameron EA, Sperandio V. 2015. Frenemies: signaling and nutritional integration in pathogen-microbiota-host interactions. Cell Host Microbe 18: 275–284. doi:10.1016/j.chom.2015.08.007

CrossRef Medline Google Scholar
↵

Contijoch EJ, Britton GJ, Yang C, Mogno I, Li Z, Ng R, Llewellyn SR, Hira S, Johnson C, Rabinowitz KM, et al. 2019. Gut microbiota density influences host physiology and is shaped by host and microbial factors. eLife 8: e40553. doi:10.7554/eLife.40553

CrossRef Google Scholar
↵

Costea PI, Munch R, Coelho LP, Paoli L, Sunagawa S, Bork P. 2017. metaSNV: a tool for metagenomic strain level analysis. PLoS One 12: e0182392. doi:10.1371/journal.pone.0182392

CrossRef Google Scholar
↵

Crits-Christoph A, Olm MR, Diamond S, Bouma-Gregson K, Banfield JF. 2020. Soil bacterial populations are shaped by recombination and gene-specific selection across a grassland meadow. ISME J 14: 1834–1846. doi:10.1038/s41396-020-0655-x

CrossRef Google Scholar
↵

Cvijović I, Ba ANN, Desai MM. 2018. Experimental studies of evolutionary dynamics in microbes. Trends Genet 34: 693–703. doi:10.1016/j.tig.2018.06.004

CrossRef Google Scholar
↵

Danko DC, Meleshko D, Bezdan D, Mason C, Hajirasouliha I. 2019. Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics. Genome Res 29: 116–124. doi:10.1101/gr.235499.118

Abstract/FREE Full Text
↵

Dethlefsen L, Relman DA. 2011. Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc Natl Acad Sci 108: 4554–4561. doi:10.1073/pnas.1000087107

Abstract/FREE Full Text
↵

Didelot X, Walker AS, Peto TE, Crook DW, Wilson DJ. 2016. Within-host evolution of bacterial pathogens. Nat Rev Microbiol 14: 150–162. doi:10.1038/nrmicro.2015.13

CrossRef Medline Google Scholar
↵

Faith JJ, Guruge JL, Charbonneau M, Subramanian S, Seedorf H, Goodman AL, Clemente JC, Knight R, Heath AC, Leibel RL, et al. 2013. The long-term stability of the human gut microbiota. Science 341: 1237439. doi:10.1126/science.1237439

Abstract/FREE Full Text
↵

Feder AF, Kryazhimskiy S, Plotkin JB. 2014. Identifying signatures of selection in genetic time series. Genetics 196: 509–522. doi:10.1534/genetics.113.158220

Abstract/FREE Full Text
↵

Feng Q, Liang S, Jia H, Stadlmayr A, Tang L, Lan Z, Zhang D, Xia H, Xu X, Jie Z, et al. 2015. Gut microbiome development along the colorectal adenoma-carcinoma sequence. Nat Commun 6: 6528. doi:10.1038/ncomms7528

CrossRef Medline Google Scholar
↵

Gardiner CW. 1985. Handbook of stochastic methods. Springer, Berlin.

Google Scholar
↵

Garud NR, Good BH, Hallatschek O, Pollard KS. 2019. Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLoS Biol 17: e3000102. doi:10.1371/journal.pbio.3000102

CrossRef Google Scholar
↵

Ghalayini M, Launay A, Bridier-Nahmias A, Clermont O, Denamur E, Lescat M, Tenaillon O. 2018. Evolution of a dominant natural isolate of Escherichia coli in the human gut over the course of a year suggests a neutral evolution with reduced effective population size. Appl Environ Microbiol 84: e02377-17. doi:10.1128/AEM.02377-17

Abstract/FREE Full Text
↵

Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312: 1355–1359. doi:10.1126/science.1124234

Abstract/FREE Full Text
↵

Halfvarson J, Brislawn CJ, Lamendella R, Vázquez-Baeza Y, Walters WA, Bramer LM, D'Amato M, Bonfiglio F, McDonald D, Gonzalez A, et al. 2017. Dynamics of the human gut microbiome in inflammatory bowel disease. Nat Microbiol 2: 17004. doi:10.1038/nmicrobiol.2017.4

CrossRef Google Scholar
↵

Hu Y, Yang X, Qin J, Lu N, Cheng G, Wu N, Pan Y, Li J, Zhu L, Wang X, et al. 2013. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of human gut microbiota. Nat Commun 4: 2151. doi:10.1038/ncomms3151

CrossRef Medline Google Scholar
↵

Hubbell SP. 2001. The unified neutral theory of biodiversity and biogeography (MPB-32). Princeton University Press, Princeton, NJ.

Google Scholar
↵

Hudson RR, Kaplan NL. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164. doi:10.1093/genetics/111.1.147

Abstract/FREE Full Text
↵

Jernberg C, Löfmark S, Edlund C, Jansson JK. 2010. Long-term impacts of antibiotic exposure on the human intestinal microbiota. Microbiology 156: 3216–3223. doi:10.1099/mic.0.040618-0

CrossRef Medline Google Scholar
↵

Jiang X, Hall AB, Arthur TD, Plichta DR, Covington CT, Poyet M, Crothers J, Moses PL, Tolonen AC, Vlamakis H, et al. 2019. Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut. Science 363: 181–187. doi:10.1126/science.aau5238

Abstract/FREE Full Text
↵

Kamada N, Kim Y-G, Sham HP, Vallance BA, Puente JL, Martens EC, Núñez G. 2012. Regulated virulence controls the ability of a pathogen to compete with the gut microbiota. Science 336: 1325–1329. doi:10.1126/science.1222195

Abstract/FREE Full Text
↵

Keeney KM, Yurist-Doutsch S, Arrieta M-C, Finlay BB. 2014. Effects of antibiotics on human microbiota and subsequent disease. Annu Rev Microbiol 68: 217–235. doi:10.1146/annurev-micro-091313-103456

CrossRef Medline Google Scholar
↵

Kent AG, Vill AC, Shi Q, Satlin MJ, Brito IL. 2020. Widespread transfer of mobile antibiotic resistance genes within individual gut microbiomes revealed through bacterial Hi-C. Nat Commun 11: 4379. doi:10.1038/s41467-020-18164-7

CrossRef Google Scholar
↵

Kimura M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

Google Scholar
↵

Kintses B, Méhi O, Ari E, Számel M, Györkei Á, Jangir PK, Nagy I, Pál F, Fekete G, Tengölics R, et al. 2019. Phylogenetic barriers to horizontal transfer of antimicrobial peptide resistance genes in the human gut microbiota. Nat Microbiol 4: 447–458. doi:10.1038/s41564-018-0313-5

CrossRef Google Scholar
↵

Korpela K, Costea P, Coelho LP, Kandels-Lewis S, Willemsen G, Boomsma DI, Segata N, Bork P. 2018. Selective maternal seeding and environment shape the human gut microbiome. Genome Res 28: 561–568. doi:10.1101/gr.233940.117

Abstract/FREE Full Text
↵

Kuleshov V, Snyder MP, Batzoglou S. 2016. Genome assembly from synthetic long read clouds. Bioinformatics 32: i216–i224. doi:10.1093/bioinformatics/btw267

CrossRef Medline Google Scholar
↵

Lan F, Demaree B, Ahmed N, Abate AR. 2017. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding. Nat Biotechnol 35: 640–646. doi:10.1038/nbt.3880

CrossRef Google Scholar
↵

Lauring AS. 2020. Within-host viral diversity: a window into viral evolution. Annu Rev Virol 7: 63–81. doi:10.1146/annurev-virology-010320-061642

CrossRef Google Scholar
↵

Lewis JD, Chen EZ, Baldassano RN, Otley AR, Griffiths AM, Lee D, Bittinger K, Bailey A, Friedman ES, Hoffmann C, et al. 2015. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric Crohn's disease. Cell Host Microbe 18: 489–500. doi:10.1016/j.chom.2015.09.008

CrossRef Medline Google Scholar
↵

Lloyd-Price J, Mahurkar A, Rahnavard G, Crabtree J, Orvis J, Hall AB, Brady A, Creasy HH, McCracken C, Giglio MG, et al. 2017. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550: 61–66. doi:10.1038/nature23889

CrossRef Medline Google Scholar
↵

Luo C, Knight R, Siljander H, Knip M, Xavier RJ, Gevers D. 2015. Constrains identifies microbial strains in metagenomic datasets. Nat Biotechnol 33: 1045–1052. doi:10.1038/nbt.3319

CrossRef Medline Google Scholar
↵

Nayfach S, Rodriguez-Mueller B, Garud N, Pollard KS. 2016. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography. Genome Res 26: 1612–1625. doi:10.1101/gr.201863.115

Abstract/FREE Full Text
↵

Neher RA. 2013. Genetic draft, selective interference, and population genetics of rapid adaptation. Annu Rev Ecol Evol Syst 44: 195–215. doi:10.1146/annurev-ecolsys-110512-135920

CrossRef Google Scholar
↵

Ng KM, Aranda-Díaz A, Tropini C, Frankel MR, Van Treuren W, O'Laughlin CT, Merrill BD, Yu FB, Pruss KM, Oliveira RA, et al. 2019. Recovery of the gut microbiota after antibiotics depends on host diet, community context, and environmental reservoirs. Cell Host Microbe 26: 650–665.e4. doi:10.1016/j.chom.2019.10.011

CrossRef Google Scholar
↵

O'Brien JD, Didelot X, Iqbal Z, Amenga-Etego L, Ahiska B, Falush D. 2014. A Bayesian approach to inferring the phylogenetic structure of communities from metagenomic data. Genetics 197: 925–937. doi:10.1534/genetics.114.161299

Abstract/FREE Full Text
↵

Paaby AB, Rockman MV. 2014. Cryptic genetic variation: evolution's hidden substrate. Nat Rev Genet 15: 247–258. doi:10.1038/nrg3688

CrossRef Medline Google Scholar
↵

Palleja A, Mikkelsen KH, Forslund SK, Kashani A, Allin KH, Nielsen T, Hansen TH, Liang SS, Feng Q, Zhang CC, et al. 2018. Recovery of gut microbiota of healthy adults following antibiotic exposure. Nat Microbiol 3: 1255–1265. doi:10.1038/s41564-018-0257-9

CrossRef Google Scholar
↵

Partridge SR, Tsafnat G. 2018. Automated annotation of mobile antibiotic resistance in Gram-negative bacteria: the Multiple Antibiotic Resistance Annotator (MARA) and database. J Antimicrob Chemother 73: 883–890. doi:10.1093/jac/dkx513

CrossRef Google Scholar
↵

Quince C, Delmont TO, Raguideau S, Alneberg J, Darling AE, Collins G, Eren AM. 2017. DESMAN: a new tool for de novo extraction of strains from metagenomes. Genome Biol 18: 181. doi:10.1186/s13059-017-1309-9

CrossRef Google Scholar
↵

Rasmussen B, Bush K, Tally F. 1993. Antimicrobial resistance in Bacteroides. Clin Infect Dis 16: S390–S400. doi:10.1093/clinids/16.Supplement_4.S390

CrossRef Medline Google Scholar
↵

Raymond F, Ouameur AA, Déraspe M, Iqbal N, Gingras H, Dridi B, Leprohon P, Plante PL, Giroux R, Bérubé E, et al. 2016. The initial state of the human gut microbiome determines its reshaping by antibiotics. ISME J 10: 707–720. doi:10.1038/ismej.2015.148

CrossRef Medline Google Scholar
↵

Relman DA, Lipsitch M. 2018. Microbiome as a tool and a target in the effort to address antimicrobial resistance. Proc Natl Acad Sci 115: 12902–12910. doi:10.1073/pnas.1717163115

Abstract/FREE Full Text
↵

Reyes A, Wu M, McNulty NP, Rohwer FL, Gordon JI. 2013. Gnotobiotic mouse model of phage–bacterial host dynamics in the human gut. Proc Natl Acad Sci 110: 20236–20241. doi:10.1073/pnas.1319470110

Abstract/FREE Full Text
↵

Sassone-Corsi M, Raffatellu M. 2015. No vacancy: how beneficial microbes cooperate with immunity to provide colonization resistance to pathogens. J Immunol 194: 4081–4087. doi:10.4049/jimmunol.1403169

Abstract/FREE Full Text
↵

Schloissnig S, Arumugam M, Sunagawa S, Mitreva M, Tap J, Zhu A, Waller A, Mende DR, Kultima JR, Martin J, et al. 2013. Genomic variation landscape of the human gut microbiome. Nature 493: 45–50. doi:10.1038/nature11711

CrossRef Medline Google Scholar
↵

Scholz M, Ward DV, Pasolli E, Tolio T, Zolfo M, Asnicar F, Truong DT, Tett A, Morrow AL, Segata N. 2016. Strain-level microbial epidemiology and population genomics from shotgun metagenomics. Nat Methods 13: 435–438. doi:10.1038/nmeth.3802

CrossRef Medline Google Scholar
↵

Schraiber JG, Evans SN, Slatkin M. 2016. Bayesian inference of natural selection from allele frequency time series. Genetics 203: 493–511. doi:10.1534/genetics.116.187278

Abstract/FREE Full Text
↵

Sharon G, Sampson TR, Geschwind DH, Mazmanian SK. 2016. The central nervous system and the gut microbiome. Cell 167: 915–932. doi:10.1016/j.cell.2016.10.027

CrossRef Medline Google Scholar
↵

Smillie CS, Sauk J, Gevers D, Friedman J, Sung J, Youngster I, Hohmann EL, Staley C, Khoruts A, Sadowsky MJ, et al. 2018. Strain tracking reveals the determinants of bacterial engraftment in the human gut following fecal microbiota transplantation. Cell Host Microbe 23: 229–240.e5. doi:10.1016/j.chom.2018.01.003

CrossRef Medline Google Scholar
↵

Spanogiannopoulos P, Bess EN, Carmody RN, Turnbaugh PJ. 2016. The microbial pharmacists within us: a metagenomic view of xenobiotic metabolism. Nat Rev Microbiol 14: 273–287. doi:10.1038/nrmicro.2016.17

CrossRef Medline Google Scholar
↵

Steinrücken M, Bhaskar A, Song YS. 2014. A novel spectral method for inferring general diploid selection from time series genetic data. Ann Appl Stat 8: 2203. doi:10.1214/14-AOAS764

CrossRef Medline Google Scholar
↵

Truong DT, Tett A, Pasolli E, Huttenhower C, Segata N. 2017. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res 27: 626–638. doi:10.1101/gr.216242.116

Abstract/FREE Full Text
↵

Vos M, Didelot X. 2009. A comparison of homologous recombination rates in bacteria and archaea. ISME J 3: 199–208. doi:10.1038/ismej.2008.93

CrossRef Medline Google Scholar
↵

Ward DV, Scholz M, Zolfo M, Taft DH, Schibler KR, Tett A, Segata N, Morrow AL. 2016. Metagenomic sequencing with strain-level resolution implicates uropathogenic E. coli in necrotizing enterocolitis and mortality in preterm infants. Cell Rep 14: 2912–2924. doi:10.1016/j.celrep.2016.03.015

CrossRef Medline Google Scholar
↵

Yaffe E, Relman DA. 2020. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat Microbiol 5: 343–353. doi:10.1038/s41564-019-0625-0

CrossRef Google Scholar
↵

Yassour M, Vatanen T, Siljander H, Hämäläinen AM, Härkönen T, Ryhänen SJ, Franzosa EA, Vlamakis H, Huttenhower C, Gevers D, et al. 2016. Natural history of the infant gut microbiome and impact of antibiotic treatment on bacterial strain diversity and stability. Sci Transl Med 8: 343ra81. doi:10.1126/scitranslmed.aad0917

Abstract/FREE Full Text
↵

Yassour M, Jason E, Hogstrom LJ, Arthur TD, Tripathi S, Siljander H, Selvenius J, Oikarinen S, Hyöty H, Virtanen SM, et al. 2018. Strain-level analysis of mother-to-child bacterial transmission during the first few months of life. Cell Host Microbe 24: 146–154.e4. doi:10.1016/j.chom.2018.06.007

CrossRef Google Scholar
↵

Yin J, Zhang X-X, Wu B, Xian Q. 2015. Metagenomic insights into tetracycline effects on microbial community and antibiotic resistance of mouse gut. Ecotoxicology 24: 2125–2132. doi:10.1007/s10646-015-1540-7

CrossRef Google Scholar
↵

Zaura E, Brandt BW, de Mattos MJ T, Buijs MJ, Caspers MP, Rashid MU, Weintraub A, Nord CE, Savell A, Hu Y, et al. 2015. Same exposure but two radically different responses to antibiotics: resilience of the salivary microbiome versus long-term microbial shifts in feces. mBio 6: e01693-15. doi:10.1128/mBio.01693-15

Abstract/FREE Full Text
↵

Zhao S, Lieberman TD, Poyet M, Kauffman KM, Gibbons SM, Groussin M, Xavier RJ, Alm EJ. 2019. Adaptive evolution within gut microbiomes of healthy people. Cell Host Microbe 25: 656–667.e8. doi:10.1016/j.chom.2019.03.007

CrossRef Google Scholar
↵

Zlitni S, Bishara A, Moss EL, Tkachenko E, Kang JB, Culver RN, Andermann TM, Weng Z, Wood C, Handy C, et al. 2020. Strain-resolved microbiome sequencing reveals mobile elements that drive bacterial competition on a clinical timescale. Genome Med 12: 50. doi:10.1186/s13073-020-00747-0

CrossRef Google Scholar

[1] ↵

Agwuh KN, MacGowan A. 2006. Pharmacokinetics and pharmacodynamics of the tetracyclines including glycylcyclines. J Antimicrob Chemother 58: 256–265. doi:10.1093/jac/dkl224

CrossRef Medline Google Scholar

[2] ↵

Asnicar F, Manara S, Zolfo M, Truong DT, Scholz M, Armanini F, Ferretti P, Gorfer V, Pedrotti A, Tett A, et al. 2017. Studying vertical microbiome transmission from mothers to infants by strain-level metagenomic profiling. mSystems 2: e00164-16. doi:10.1128/mSystems.00164-16

Abstract/FREE Full Text

[3] ↵

Bishara A, Moss EL, Kolmogorov M, Parada AE, Weng Z, Sidow A, Dekas AE, Batzoglou S, Bhatt AS. 2018. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat Biotechnol 36: 1067–1075. doi:10.1038/nbt.4266

CrossRef Google Scholar

[4] ↵

Bollback JP, York TL, Nielsen R. 2008. Estimation of 2N_es from temporal allele frequency data. Genetics 179: 497–502. doi:10.1534/genetics.107.085019

Abstract/FREE Full Text

[5] ↵

Buffie CG, Bucci V, Stein RR, McKenney PT, Ling L, Gobourne A, No D, Liu H, Kinnebrew M, Viale A, et al. 2015. Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517: 205–208. doi:10.1038/nature13828

CrossRef Medline Google Scholar

[6] ↵

Cameron EA, Sperandio V. 2015. Frenemies: signaling and nutritional integration in pathogen-microbiota-host interactions. Cell Host Microbe 18: 275–284. doi:10.1016/j.chom.2015.08.007

CrossRef Medline Google Scholar

[7] ↵

Contijoch EJ, Britton GJ, Yang C, Mogno I, Li Z, Ng R, Llewellyn SR, Hira S, Johnson C, Rabinowitz KM, et al. 2019. Gut microbiota density influences host physiology and is shaped by host and microbial factors. eLife 8: e40553. doi:10.7554/eLife.40553

CrossRef Google Scholar

[8] ↵

Costea PI, Munch R, Coelho LP, Paoli L, Sunagawa S, Bork P. 2017. metaSNV: a tool for metagenomic strain level analysis. PLoS One 12: e0182392. doi:10.1371/journal.pone.0182392

CrossRef Google Scholar

[9] ↵

Crits-Christoph A, Olm MR, Diamond S, Bouma-Gregson K, Banfield JF. 2020. Soil bacterial populations are shaped by recombination and gene-specific selection across a grassland meadow. ISME J 14: 1834–1846. doi:10.1038/s41396-020-0655-x

CrossRef Google Scholar

[10] ↵

Cvijović I, Ba ANN, Desai MM. 2018. Experimental studies of evolutionary dynamics in microbes. Trends Genet 34: 693–703. doi:10.1016/j.tig.2018.06.004

CrossRef Google Scholar

[11] ↵

Danko DC, Meleshko D, Bezdan D, Mason C, Hajirasouliha I. 2019. Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics. Genome Res 29: 116–124. doi:10.1101/gr.235499.118

Abstract/FREE Full Text

[12] ↵

Dethlefsen L, Relman DA. 2011. Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc Natl Acad Sci 108: 4554–4561. doi:10.1073/pnas.1000087107

Abstract/FREE Full Text

[13] ↵

Didelot X, Walker AS, Peto TE, Crook DW, Wilson DJ. 2016. Within-host evolution of bacterial pathogens. Nat Rev Microbiol 14: 150–162. doi:10.1038/nrmicro.2015.13

CrossRef Medline Google Scholar

[14] ↵

Faith JJ, Guruge JL, Charbonneau M, Subramanian S, Seedorf H, Goodman AL, Clemente JC, Knight R, Heath AC, Leibel RL, et al. 2013. The long-term stability of the human gut microbiota. Science 341: 1237439. doi:10.1126/science.1237439

Abstract/FREE Full Text

[15] ↵

Feder AF, Kryazhimskiy S, Plotkin JB. 2014. Identifying signatures of selection in genetic time series. Genetics 196: 509–522. doi:10.1534/genetics.113.158220

Abstract/FREE Full Text

[16] ↵

Feng Q, Liang S, Jia H, Stadlmayr A, Tang L, Lan Z, Zhang D, Xia H, Xu X, Jie Z, et al. 2015. Gut microbiome development along the colorectal adenoma-carcinoma sequence. Nat Commun 6: 6528. doi:10.1038/ncomms7528

CrossRef Medline Google Scholar

[17] ↵

Gardiner CW. 1985. Handbook of stochastic methods. Springer, Berlin.

Google Scholar

[18] ↵

Garud NR, Good BH, Hallatschek O, Pollard KS. 2019. Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLoS Biol 17: e3000102. doi:10.1371/journal.pbio.3000102

CrossRef Google Scholar

[19] ↵

Ghalayini M, Launay A, Bridier-Nahmias A, Clermont O, Denamur E, Lescat M, Tenaillon O. 2018. Evolution of a dominant natural isolate of Escherichia coli in the human gut over the course of a year suggests a neutral evolution with reduced effective population size. Appl Environ Microbiol 84: e02377-17. doi:10.1128/AEM.02377-17

Abstract/FREE Full Text

[20] ↵

Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312: 1355–1359. doi:10.1126/science.1124234

Abstract/FREE Full Text

[21] ↵

Halfvarson J, Brislawn CJ, Lamendella R, Vázquez-Baeza Y, Walters WA, Bramer LM, D'Amato M, Bonfiglio F, McDonald D, Gonzalez A, et al. 2017. Dynamics of the human gut microbiome in inflammatory bowel disease. Nat Microbiol 2: 17004. doi:10.1038/nmicrobiol.2017.4

CrossRef Google Scholar

[22] ↵

Hu Y, Yang X, Qin J, Lu N, Cheng G, Wu N, Pan Y, Li J, Zhu L, Wang X, et al. 2013. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of human gut microbiota. Nat Commun 4: 2151. doi:10.1038/ncomms3151

CrossRef Medline Google Scholar

[23] ↵

Hubbell SP. 2001. The unified neutral theory of biodiversity and biogeography (MPB-32). Princeton University Press, Princeton, NJ.

Google Scholar

[24] ↵

Hudson RR, Kaplan NL. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164. doi:10.1093/genetics/111.1.147

Abstract/FREE Full Text

[25] ↵

Jernberg C, Löfmark S, Edlund C, Jansson JK. 2010. Long-term impacts of antibiotic exposure on the human intestinal microbiota. Microbiology 156: 3216–3223. doi:10.1099/mic.0.040618-0

CrossRef Medline Google Scholar

[26] ↵

Jiang X, Hall AB, Arthur TD, Plichta DR, Covington CT, Poyet M, Crothers J, Moses PL, Tolonen AC, Vlamakis H, et al. 2019. Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut. Science 363: 181–187. doi:10.1126/science.aau5238

Abstract/FREE Full Text

[27] ↵

Kamada N, Kim Y-G, Sham HP, Vallance BA, Puente JL, Martens EC, Núñez G. 2012. Regulated virulence controls the ability of a pathogen to compete with the gut microbiota. Science 336: 1325–1329. doi:10.1126/science.1222195

Abstract/FREE Full Text

[28] ↵

Keeney KM, Yurist-Doutsch S, Arrieta M-C, Finlay BB. 2014. Effects of antibiotics on human microbiota and subsequent disease. Annu Rev Microbiol 68: 217–235. doi:10.1146/annurev-micro-091313-103456

CrossRef Medline Google Scholar

[29] ↵

Kent AG, Vill AC, Shi Q, Satlin MJ, Brito IL. 2020. Widespread transfer of mobile antibiotic resistance genes within individual gut microbiomes revealed through bacterial Hi-C. Nat Commun 11: 4379. doi:10.1038/s41467-020-18164-7

CrossRef Google Scholar

[30] ↵

Kimura M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

Google Scholar

[31] ↵

Kintses B, Méhi O, Ari E, Számel M, Györkei Á, Jangir PK, Nagy I, Pál F, Fekete G, Tengölics R, et al. 2019. Phylogenetic barriers to horizontal transfer of antimicrobial peptide resistance genes in the human gut microbiota. Nat Microbiol 4: 447–458. doi:10.1038/s41564-018-0313-5

CrossRef Google Scholar

[32] ↵

Korpela K, Costea P, Coelho LP, Kandels-Lewis S, Willemsen G, Boomsma DI, Segata N, Bork P. 2018. Selective maternal seeding and environment shape the human gut microbiome. Genome Res 28: 561–568. doi:10.1101/gr.233940.117

Abstract/FREE Full Text

[33] ↵

Kuleshov V, Snyder MP, Batzoglou S. 2016. Genome assembly from synthetic long read clouds. Bioinformatics 32: i216–i224. doi:10.1093/bioinformatics/btw267

CrossRef Medline Google Scholar

[34] ↵

Lan F, Demaree B, Ahmed N, Abate AR. 2017. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding. Nat Biotechnol 35: 640–646. doi:10.1038/nbt.3880

CrossRef Google Scholar

[35] ↵

Lauring AS. 2020. Within-host viral diversity: a window into viral evolution. Annu Rev Virol 7: 63–81. doi:10.1146/annurev-virology-010320-061642

CrossRef Google Scholar

[36] ↵

Lewis JD, Chen EZ, Baldassano RN, Otley AR, Griffiths AM, Lee D, Bittinger K, Bailey A, Friedman ES, Hoffmann C, et al. 2015. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric Crohn's disease. Cell Host Microbe 18: 489–500. doi:10.1016/j.chom.2015.09.008

CrossRef Medline Google Scholar

[37] ↵

Lloyd-Price J, Mahurkar A, Rahnavard G, Crabtree J, Orvis J, Hall AB, Brady A, Creasy HH, McCracken C, Giglio MG, et al. 2017. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550: 61–66. doi:10.1038/nature23889

CrossRef Medline Google Scholar

[38] ↵

Luo C, Knight R, Siljander H, Knip M, Xavier RJ, Gevers D. 2015. Constrains identifies microbial strains in metagenomic datasets. Nat Biotechnol 33: 1045–1052. doi:10.1038/nbt.3319

CrossRef Medline Google Scholar

[39] ↵

Nayfach S, Rodriguez-Mueller B, Garud N, Pollard KS. 2016. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography. Genome Res 26: 1612–1625. doi:10.1101/gr.201863.115

Abstract/FREE Full Text

[40] ↵

Neher RA. 2013. Genetic draft, selective interference, and population genetics of rapid adaptation. Annu Rev Ecol Evol Syst 44: 195–215. doi:10.1146/annurev-ecolsys-110512-135920

CrossRef Google Scholar

[41] ↵

Ng KM, Aranda-Díaz A, Tropini C, Frankel MR, Van Treuren W, O'Laughlin CT, Merrill BD, Yu FB, Pruss KM, Oliveira RA, et al. 2019. Recovery of the gut microbiota after antibiotics depends on host diet, community context, and environmental reservoirs. Cell Host Microbe 26: 650–665.e4. doi:10.1016/j.chom.2019.10.011

CrossRef Google Scholar

[42] ↵

O'Brien JD, Didelot X, Iqbal Z, Amenga-Etego L, Ahiska B, Falush D. 2014. A Bayesian approach to inferring the phylogenetic structure of communities from metagenomic data. Genetics 197: 925–937. doi:10.1534/genetics.114.161299

Abstract/FREE Full Text

[43] ↵

Paaby AB, Rockman MV. 2014. Cryptic genetic variation: evolution's hidden substrate. Nat Rev Genet 15: 247–258. doi:10.1038/nrg3688

CrossRef Medline Google Scholar

[44] ↵

Palleja A, Mikkelsen KH, Forslund SK, Kashani A, Allin KH, Nielsen T, Hansen TH, Liang SS, Feng Q, Zhang CC, et al. 2018. Recovery of gut microbiota of healthy adults following antibiotic exposure. Nat Microbiol 3: 1255–1265. doi:10.1038/s41564-018-0257-9

CrossRef Google Scholar

[45] ↵

Partridge SR, Tsafnat G. 2018. Automated annotation of mobile antibiotic resistance in Gram-negative bacteria: the Multiple Antibiotic Resistance Annotator (MARA) and database. J Antimicrob Chemother 73: 883–890. doi:10.1093/jac/dkx513

CrossRef Google Scholar

[46] ↵

Quince C, Delmont TO, Raguideau S, Alneberg J, Darling AE, Collins G, Eren AM. 2017. DESMAN: a new tool for de novo extraction of strains from metagenomes. Genome Biol 18: 181. doi:10.1186/s13059-017-1309-9

CrossRef Google Scholar

[47] ↵

Rasmussen B, Bush K, Tally F. 1993. Antimicrobial resistance in Bacteroides. Clin Infect Dis 16: S390–S400. doi:10.1093/clinids/16.Supplement_4.S390

CrossRef Medline Google Scholar

[48] ↵

Raymond F, Ouameur AA, Déraspe M, Iqbal N, Gingras H, Dridi B, Leprohon P, Plante PL, Giroux R, Bérubé E, et al. 2016. The initial state of the human gut microbiome determines its reshaping by antibiotics. ISME J 10: 707–720. doi:10.1038/ismej.2015.148

CrossRef Medline Google Scholar

[49] ↵

Relman DA, Lipsitch M. 2018. Microbiome as a tool and a target in the effort to address antimicrobial resistance. Proc Natl Acad Sci 115: 12902–12910. doi:10.1073/pnas.1717163115

Abstract/FREE Full Text

[50] ↵

Reyes A, Wu M, McNulty NP, Rohwer FL, Gordon JI. 2013. Gnotobiotic mouse model of phage–bacterial host dynamics in the human gut. Proc Natl Acad Sci 110: 20236–20241. doi:10.1073/pnas.1319470110

Abstract/FREE Full Text

[51] ↵

Sassone-Corsi M, Raffatellu M. 2015. No vacancy: how beneficial microbes cooperate with immunity to provide colonization resistance to pathogens. J Immunol 194: 4081–4087. doi:10.4049/jimmunol.1403169

Abstract/FREE Full Text

[52] ↵

Schloissnig S, Arumugam M, Sunagawa S, Mitreva M, Tap J, Zhu A, Waller A, Mende DR, Kultima JR, Martin J, et al. 2013. Genomic variation landscape of the human gut microbiome. Nature 493: 45–50. doi:10.1038/nature11711

CrossRef Medline Google Scholar

[53] ↵

Scholz M, Ward DV, Pasolli E, Tolio T, Zolfo M, Asnicar F, Truong DT, Tett A, Morrow AL, Segata N. 2016. Strain-level microbial epidemiology and population genomics from shotgun metagenomics. Nat Methods 13: 435–438. doi:10.1038/nmeth.3802

CrossRef Medline Google Scholar

[54] ↵

Schraiber JG, Evans SN, Slatkin M. 2016. Bayesian inference of natural selection from allele frequency time series. Genetics 203: 493–511. doi:10.1534/genetics.116.187278

Abstract/FREE Full Text

[55] ↵

Sharon G, Sampson TR, Geschwind DH, Mazmanian SK. 2016. The central nervous system and the gut microbiome. Cell 167: 915–932. doi:10.1016/j.cell.2016.10.027

CrossRef Medline Google Scholar

[56] ↵

Smillie CS, Sauk J, Gevers D, Friedman J, Sung J, Youngster I, Hohmann EL, Staley C, Khoruts A, Sadowsky MJ, et al. 2018. Strain tracking reveals the determinants of bacterial engraftment in the human gut following fecal microbiota transplantation. Cell Host Microbe 23: 229–240.e5. doi:10.1016/j.chom.2018.01.003

CrossRef Medline Google Scholar

[57] ↵

Spanogiannopoulos P, Bess EN, Carmody RN, Turnbaugh PJ. 2016. The microbial pharmacists within us: a metagenomic view of xenobiotic metabolism. Nat Rev Microbiol 14: 273–287. doi:10.1038/nrmicro.2016.17

CrossRef Medline Google Scholar

[58] ↵

Steinrücken M, Bhaskar A, Song YS. 2014. A novel spectral method for inferring general diploid selection from time series genetic data. Ann Appl Stat 8: 2203. doi:10.1214/14-AOAS764

CrossRef Medline Google Scholar

[59] ↵

Truong DT, Tett A, Pasolli E, Huttenhower C, Segata N. 2017. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res 27: 626–638. doi:10.1101/gr.216242.116

Abstract/FREE Full Text

[60] ↵

Vos M, Didelot X. 2009. A comparison of homologous recombination rates in bacteria and archaea. ISME J 3: 199–208. doi:10.1038/ismej.2008.93

CrossRef Medline Google Scholar

[61] ↵

Ward DV, Scholz M, Zolfo M, Taft DH, Schibler KR, Tett A, Segata N, Morrow AL. 2016. Metagenomic sequencing with strain-level resolution implicates uropathogenic E. coli in necrotizing enterocolitis and mortality in preterm infants. Cell Rep 14: 2912–2924. doi:10.1016/j.celrep.2016.03.015

CrossRef Medline Google Scholar

[62] ↵

Yaffe E, Relman DA. 2020. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat Microbiol 5: 343–353. doi:10.1038/s41564-019-0625-0

CrossRef Google Scholar

[63] ↵

Yassour M, Vatanen T, Siljander H, Hämäläinen AM, Härkönen T, Ryhänen SJ, Franzosa EA, Vlamakis H, Huttenhower C, Gevers D, et al. 2016. Natural history of the infant gut microbiome and impact of antibiotic treatment on bacterial strain diversity and stability. Sci Transl Med 8: 343ra81. doi:10.1126/scitranslmed.aad0917

Abstract/FREE Full Text

[64] ↵

Yassour M, Jason E, Hogstrom LJ, Arthur TD, Tripathi S, Siljander H, Selvenius J, Oikarinen S, Hyöty H, Virtanen SM, et al. 2018. Strain-level analysis of mother-to-child bacterial transmission during the first few months of life. Cell Host Microbe 24: 146–154.e4. doi:10.1016/j.chom.2018.06.007

CrossRef Google Scholar

[65] ↵

Yin J, Zhang X-X, Wu B, Xian Q. 2015. Metagenomic insights into tetracycline effects on microbial community and antibiotic resistance of mouse gut. Ecotoxicology 24: 2125–2132. doi:10.1007/s10646-015-1540-7

CrossRef Google Scholar

[66] ↵

Zaura E, Brandt BW, de Mattos MJ T, Buijs MJ, Caspers MP, Rashid MU, Weintraub A, Nord CE, Savell A, Hu Y, et al. 2015. Same exposure but two radically different responses to antibiotics: resilience of the salivary microbiome versus long-term microbial shifts in feces. mBio 6: e01693-15. doi:10.1128/mBio.01693-15

Abstract/FREE Full Text

[67] ↵

Zhao S, Lieberman TD, Poyet M, Kauffman KM, Gibbons SM, Groussin M, Xavier RJ, Alm EJ. 2019. Adaptive evolution within gut microbiomes of healthy people. Cell Host Microbe 25: 656–667.e8. doi:10.1016/j.chom.2019.03.007

CrossRef Google Scholar

[68] ↵

Zlitni S, Bishara A, Moss EL, Tkachenko E, Kang JB, Culver RN, Andermann TM, Weng Z, Wood C, Handy C, et al. 2020. Strain-resolved microbiome sequencing reveals mobile elements that drive bacterial competition on a clinical timescale. Genome Med 12: 50. doi:10.1186/s13073-020-00747-0

CrossRef Google Scholar

Longitudinal linked-read sequencing reveals ecological and evolutionary responses of a human gut microbiome during antibiotic treatment

Abstract

Results

Longitudinal linked-read sequencing of a human gut microbiome during disease and treatment

Deep longitudinal sequencing reveals shifts in the genetic composition of 36 species in the same host

Quantifying genetic linkage between SNVs using barcoded read clouds

Temporal dynamics of haplotypes reveal cryptic phenotypic differences and time-varying selection within species

Statistics of sweeping SNVs reveal strain replacement, evolutionary modification, and selection on standing variation

Discussion

Methods

Sample collection and sequencing

Estimating species abundances

Empirical estimates of read cloud impurity

Identifying SNVs within species

Identifying SNV differences over time

Linking SNV differences with their inferred species backbone

Quantifying genetic linkage between pairs of SNVs

Inferring clusters of linked SNVs from correlated allele frequency trajectories

Inferring natural selection and genetic drift from haplotype frequency trajectories

Quantifying retention of private marker SNVs over time

Software availability

Data access

Competing interest statement

Acknowledgments

Footnotes

References

Articles citing this article

This Article

Article Category

Services

Citing Articles

Google Scholar

PubMed/NCBI

ORCID

Share

Preprint Server

Navigate This Article

Current Issue

In This Issue