Description of the umbrella project of this study

This study and its samples are part of the ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies) project (, a multi-PI DOE SFA (Department of Energy Science Focus Area). The ENIGMA field site has been studied previously39, and a culture collection of isolates from this exact field site was made available to this study; it is referred to as “the culture collection” in the main text (Supplementary File).

Sample collection and incubation conditions

Two 4 cm diameter soil cores were collected horizontally from Oak Ridge, TN (GPS 35.941133, −84.336504) on 24 January 2017 from a silt loam area. A vertical trench was dug, and the first core was taken at 30 cm depth and the second at 76 cm depth. Both cores were shipped cooled and stored in the dark at 4 °C until processing. At the time of the experiment (1–3 months after collection), a ~1 g piece of soil was sampled under sterile conditions from the distal part of the core for each replicate and placed into a 14 ml polystyrene tube with a dual-position snap cap, which was kept in the upper position to allow gas exchange throughout the incubation. Each replicate was incubated with 2 ml of 50 µM L-homopropargylglycine (HPG, Click Chemistry Tools, Scottsdale, AZ, USA) in sterile water (diethyl pyrocarbonate (DEPC)-treated, filter-sterilized water, pH 7) at 15 °C in the dark with a 12 ml headspace. No mixing was applied (i.e., no soil slurry was made). This temperature was chosen because it is the average surface temperature at the field site. The incubation was done aerobically because field data indicate that the soil is aerobic above 1 m depth in this area (T. Hazen, personal communication). Two milliliters were enough to fully submerge the 1 g of soil used for each replicate; this level of hydration ensured that all soil pores were flooded and that HPG was not diffusion-limited. Adding 2 ml of liquid diluted the soil solutes, a situation that may resemble a heavy rain event capable of flooding the soil. Control samples were incubated under the same conditions with water but without HPG (water –HPG control). The full incubation design is shown in Fig. 1a.
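As a quick check of the labeling dose, the arithmetic below converts the HPG addition into nanomoles per gram of soil, using only the volumes and concentrations stated above (1 µM equals 1 nmol ml−1). The helper name `hpg_nmol_per_g` is hypothetical, and the calculation ignores any HPG that is sorbed to soil particles rather than available to cells.

```python
def hpg_nmol_per_g(volume_ml: float, conc_uM: float, soil_g: float) -> float:
    """Hypothetical helper: nmol of HPG supplied per gram of soil.

    1 µM is 1 nmol per ml, so the amount added (nmol) is
    volume (ml) x concentration (µM); dividing by soil mass gives nmol per g.
    """
    if soil_g <= 0:
        raise ValueError("soil mass must be positive")
    return volume_ml * conc_uM / soil_g


# Values from the incubation design: 2 ml of 50 µM HPG on ~1 g of soil.
dose = hpg_nmol_per_g(volume_ml=2.0, conc_uM=50.0, soil_g=1.0)
print(f"HPG supplied: {dose:.0f} nmol per g soil")
```

With these values each replicate receives about 100 nmol of HPG per gram of soil; a smaller soil piece with the same 2 ml addition would receive proportionally more per gram.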
At the end of the incubation period (0.5–168 h), 5 ml of 0.02% Tween® 20 (Sigma-Aldrich, St. Louis, MO, USA) in phosphate-buffered saline (1X PBS) was added to each tube (already containing 2 ml of HPG solution and 1 g of soil), and the tubes were vortexed at maximum speed for 5 min (Vortex-Genie 2, Scientific Industries, Inc., Bohemia, NY, USA) to detach cells from the soil particles. The tubes were then centrifuged at 500 × g for 55 min (centrifuge 5810R, Eppendorf, Hamburg, Germany), and the supernatant containing the detached cells was divided into 700 µl aliquots, which were immediately frozen at −20 °C in 10% glycerol (Sigma-Aldrich, St. Louis, MO, USA) in PBS until further processing. The aliquot volume was chosen based on preliminary data indicating that it was optimal for sorting the target number of cells downstream.

Determining background fluorescent labeling in BONCAT

We performed killed-control experiments to confirm that HPG incorporation and fluorescent labeling require active cells, by fixing duplicate soil samples from the 76 cm core with 3% paraformaldehyde (PFA, Sigma-Aldrich, St. Louis, MO, USA) for 1 h at room temperature (RT). The aim of this experiment was to confirm that cells needed to actively incorporate HPG to be labeled and that simple diffusion of HPG into cells would not create an artifactual signal25,26. PFA-treated samples were used only as controls during method development; all sequencing results were generated from unfixed samples.

PFA fixation was performed either before or immediately after incubation with HPG: one set of samples was fixed with 3% PFA for 1 h at RT before incubation, while PFA was added to another set at the end of the incubation. The incubation conditions are detailed in Supplementary Table 2. These killed controls were compared with live controls incubated without HPG to measure non-specific fluorescent labeling of cells. Both the killed controls and the no-HPG controls went through the click chemistry reaction (see below), and their fluorescence in the BONCAT dye channel was measured to determine the background fluorescence of the samples. Incubation times were 2 h and 48 h. These samples were handled as described above: cells were detached from the soil, and frozen stocks in 10% glycerol were kept at −20 °C until HPG incorporation was evaluated (see below).

Soil properties, mineral and organic composition of the soils

Bulk X-ray powder diffraction was used to analyze the mineralogical composition of the soil cores. Powdered samples were loaded on an autosampler in a Rigaku SmartLab X-ray diffractometer (Rigaku, The Woodlands, TX, USA), using a Bragg-Brentano geometry in a theta-theta configuration. Data were collected from 4° to 70° 2θ using Cu Kα radiation. After manual identification of the phases present, a Rietveld refinement was performed with the software MAUD40 to obtain their weight fractions.
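The scanned 2θ range can be translated into the lattice d-spacings it covers with Bragg's law, nλ = 2d sin θ. The sketch below assumes the Cu Kα1 wavelength of 1.5406 Å (the text specifies only Cu Kα) and first-order reflections; it only illustrates the geometry and is not part of the MAUD Rietveld workflow.

```python
import math

# Assumed wavelength: Cu Kα1, in ångström (the text states only "Cu Kα").
CU_KALPHA1_ANGSTROM = 1.5406


def d_spacing(two_theta_deg: float,
              wavelength: float = CU_KALPHA1_ANGSTROM,
              order: int = 1) -> float:
    """Solve Bragg's law n·λ = 2·d·sin(θ) for d (same unit as wavelength)."""
    theta = math.radians(two_theta_deg / 2.0)  # instrument reports 2θ
    return order * wavelength / (2.0 * math.sin(theta))


if __name__ == "__main__":
    # Ends of the 4-70° 2θ scan range used for the soil cores.
    for two_theta in (4.0, 70.0):
        print(f"2θ = {two_theta:4.1f}° -> d = {d_spacing(two_theta):.2f} Å")
```

Under these assumptions the 4–70° window spans d-spacings from about 22.1 Å down to about 1.34 Å; low angles correspond to large spacings.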

The soil chemistry analyses were performed by the UC Davis Analytical Lab ( Total carbon and total nitrogen were measured by the combustion method, as described in AOAC Official Method 272.43. Total organic carbon (TOC) was measured the same way after removal of carbonates by acid fumigation41. Soil nitrate and extractable ammonium were determined by flow injection analysis42,43. Extractable phosphate was measured by the Olsen-P method44, which quantifies bioavailable inorganic phosphate (orthophosphate); it was below the detection limit of 1 ppm in our samples.

Click reaction – BONCAT stain

For each sample, 700 µl of frozen cells were thawed at 4 °C for ~1 h. In the meantime, the click-reaction mixture was prepared by combining a dye premix with the reaction buffer. The premix consisted of 5 µl copper sulfate (CuSO4, 100 µM final concentration), 10 µl tris-hydroxypropyltriazolylmethylamine (THPTA, 500 µM final concentration) and 3.3 µl FAM picolyl azide dye (5 µM final concentration). The premix was incubated for 3 min in the dark before being mixed with the reaction buffer, which consisted of 50 µl sodium ascorbate (freshly prepared in 1X PBS; 5 mM final concentration), 50 µl aminoguanidine HCl (freshly prepared in 1X PBS; 5 mM final concentration) and 880 µl 1X PBS. All reagents were purchased from Click Chemistry Tools (Scottsdale, AZ, USA). Once thawed, the cells were captured on a 0.2 µm GTTP Isopore™ 25 mm diameter filter (MilliporeSigma, Burlington, MA, USA) and rinsed with 7 ml 1X PBS. The filter was then placed on a glass slide, 80 µl of the click-reaction mixture was quickly added, and the filter was covered with a coverslip to limit oxygen exposure during the click reaction. The slides were incubated in the dark for 30 min, and each filter was then washed in three successive baths of 20 ml 1X PBS for 5 min each. The filters were finally transferred, cells facing inwards, to 5 ml tubes (BD Falcon 5 ml round-bottom tube with snap cap, Corning™, Corning, NY, USA) containing 2 ml of 0.02% Tween® 20 in PBS and vortexed at maximum speed for 5 min to detach the cells. The tubes were incubated for 20 min at 25 °C and then stored at 4 °C. Before being loaded onto the cell sorter (BD Influx™, BD Biosciences, San Jose, CA, USA), the samples were filtered through a 35 µm cell strainer (BD Falcon 5 ml tube with cell strainer cap, Corning™, Corning, NY, USA).
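The recipe above gives volumes and final concentrations but not stock concentrations. Assuming that the final concentrations refer to the combined premix plus reaction buffer (about 1 ml in total), the stocks can be back-calculated with C1V1 = C2V2, as sketched below. The inferred stock values are derived from the stated numbers under that assumption; they are not values reported by the authors or the supplier.

```python
# Assumption: every "final concentration" refers to the combined volume of
# dye premix + reaction buffer. Stocks are then inferred with C1*V1 = C2*V2.

# component: (volume added in µl, final concentration in µM)
RECIPE = {
    "CuSO4": (5.0, 100.0),
    "THPTA": (10.0, 500.0),
    "FAM picolyl azide": (3.3, 5.0),
    "sodium ascorbate": (50.0, 5000.0),
    "aminoguanidine HCl": (50.0, 5000.0),
}
PBS_UL = 880.0  # 1X PBS added to the reaction buffer (no solute of interest)


def total_volume_ul(recipe: dict, diluent_ul: float) -> float:
    """Total reaction-mix volume: diluent plus every component volume."""
    return diluent_ul + sum(vol for vol, _ in recipe.values())


def implied_stock_uM(recipe: dict, diluent_ul: float) -> dict:
    """Stock concentration (µM) implied for each component: C1 = C2*V2/V1."""
    total = total_volume_ul(recipe, diluent_ul)
    return {name: final * total / vol for name, (vol, final) in recipe.items()}


if __name__ == "__main__":
    print(f"total volume: {total_volume_ul(RECIPE, PBS_UL):.1f} µl")
    for name, stock in implied_stock_uM(RECIPE, PBS_UL).items():
        print(f"{name:>20}: {stock / 1000:.2f} mM stock")
```

Under this assumption the combined volume is 998.3 µl and the implied stocks are roughly 20 mM CuSO4, 50 mM THPTA, 1.5 mM dye and 100 mM each of sodium ascorbate and aminoguanidine. The recipe also keeps THPTA at a 5:1 molar ratio to copper (500 µM vs 100 µM).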
Each set of experiments included water-incubated samples (water –HPG control) that were clicked alongside the labeled samples; the fluorescence of these water-incubated samples in the BONCAT dye channel was used to define the BONCAT staining background of each individual click reaction.

Flow cytometer, cell count, and cell sorting

For cell counts, the cells were prepared exactly as described above, except that the click reaction was omitted, and the cells detached from the soil were stained with 1X SYBR™ (ThermoFisher Scientific, Invitrogen, Eugene, OR, USA). For the evaluation of BONCAT-stained samples, cells were counterstained with the DNA dye SYTO™ 59 (ThermoFisher Scientific, Invitrogen, Eugene, OR, USA) at 0.5 µM for 5 min at RT. The cell sorter (BD Influx™, BD Biosciences, San Jose, CA, USA) was set up to detect the FAM picolyl azide dye (excitation 490 nm, emission 510 nm) in the green channel off the 488 nm blue laser and the DNA counterstain (excitation 622 nm, emission 645 nm) in the red channel off the 630 nm red laser. A first gate was drawn on SYTO-positive (SYTO+) particles, on the assumption that this gate would capture the cells. SYTO+ events accounted for 0.1–5% of all events depending on the sample; most events were abiotic, most probably clays or other minerals (Supplementary Fig. 1). BONCAT-positive (BONCAT+) and BONCAT-negative (BONCAT−) populations were then gated as sub-fractions of the SYTO+ events based on BONCAT dye fluorescence. The no-HPG control sample, which went through the click reaction alongside the labeled samples, was used to define the level of background BONCAT fluorescence: the BONCAT− gate was drawn below that level and the BONCAT+ gate above it, so that fewer than 0.5% of control events fell into the BONCAT+ gate (false positives; Fig. 1b). The percentage of BONCAT+ cells measured over a time course for both the 30 cm and the 76 cm samples guided the sorting decisions. We sorted three biological replicates at two incubation time points for the 76 cm sample (2 h and 48 h) and three biological replicates at one time point for the 30 cm sample (48 h).
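The gates in this study were drawn on the sorter, but the false-positive rule can be expressed as a percentile cut-off on the no-HPG control. The sketch below is a hypothetical re-implementation of that rule, assuming the inputs are BONCAT-channel intensities of SYTO+ events only; the function names and the synthetic values are illustrative.

```python
import math
from typing import Sequence


def boncat_cutoff(control: Sequence[float], max_fp: float = 0.005) -> float:
    """Return a BONCAT+ cut-off from no-HPG control intensities (SYTO+ events).

    Events strictly above the cut-off are called BONCAT+. The cut-off is chosen
    so that fewer than `max_fp` of the control events would be called positive.
    """
    values = sorted(control)
    n = len(values)
    if n == 0:
        raise ValueError("the control contains no events")
    # Largest number of control events allowed above the cut-off while keeping
    # the false-positive fraction strictly below max_fp.
    allowed = max(math.ceil(n * max_fp) - 1, 0)
    return values[n - 1 - allowed]


def fraction_positive(events: Sequence[float], cutoff: float) -> float:
    """Fraction of events called BONCAT+ (strictly above the cut-off)."""
    if not events:
        return 0.0
    return sum(1 for v in events if v > cutoff) / len(events)


if __name__ == "__main__":
    # Synthetic intensities, not data from this study.
    control = [float(v) for v in range(1000)]
    labeled = [float(v) for v in range(500, 1500)]
    cut = boncat_cutoff(control)
    print(f"cut-off: {cut}")
    print(f"control false positives: {fraction_positive(control, cut):.3%}")
    print(f"labeled sample BONCAT+: {fraction_positive(labeled, cut):.1%}")
```

Because the cut-off is recomputed from the control clicked in the same batch, it follows the batch-to-batch variation in background staining described above.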
A total of 35–75 k cells (the target was 75 k, but some samples had too few cells or too few labeled cells; see Supplementary Table 1 for detailed counts) were sorted in parallel from the BONCAT+ and BONCAT− gates into a 96-well plate. Plates were frozen at −80 °C until processing.

Total DNA extraction from soil and filters

To compare the sorted cells with the soil microbiome, total DNA was purified from the soil cores and from the extractable cells captured on a 0.2 µm GTTP Isopore™ 25 mm filter (MilliporeSigma, Burlington, MA, USA). We used the Qiagen-MoBio PowerSoil DNA kit (Qiagen, Hilden, Germany) following the manufacturer's instructions, except that the lysis step was performed by shaking the tubes at 30 Hz for 10 min in a tissue homogenizer (TissueLyser II, Qiagen, Hilden, Germany).

Library preparation and sequencing

To pellet the sorted cells, the 96-well plates were centrifuged at 7200 × g for 60 min at 10 °C. The plates were then centrifuged upside-down for 20 s at 60 × g to remove the supernatant. The pelleted cells were lysed using PrepGEM (zyGEM, Charlottesville, VA, USA) chemical lysis in 2 µl reactions following the manufacturer's recommendations: 0.2 µl of 10X Green buffer, 0.02 µl of PrepGEM, 0.02 µl of lysozyme and 1.8 µl of water were added to each well. Six empty wells were also carried through PrepGEM lysis and library construction to account for potential contaminants. The plates were then placed in a thermocycler for 30 min at 37 °C followed by 30 min at 75 °C. The iTag PCR was performed directly on the cell lysate following the JGI standard operating protocol ( Briefly, the V4 region of the 16S rRNA gene was amplified using the universal primer set 515F (GTGYCAGCMGCCGCGGTAA) and 806R (GGACTACNVGGGTWTCTAAT)45. The adapter sequences, linkers and barcodes were on the reverse primer. The 16S rRNA gene PCR was performed in a final volume of 25 µl (10 µl of 5 Prime master mix, 0.5 µl of forward primer at 10 µM, 1.5 µl of reverse primer at 3.3 µM, 0.44 µl of BSA, 10.5 µl of water and 2 µl of cell lysate). The PCR conditions were as follows: an initial denaturation at 94 °C for 3 min, followed by 30 cycles of denaturation at 94 °C for 45 s, annealing at 50 °C for 1 min and elongation at 72 °C for 1.5 min, and a final elongation at 72 °C for 10 min to complete any partially extended products. The V4 region of the 16S rRNA gene was also amplified from the total DNA extracted from the soil and from the cells captured on filters, using the same PCR conditions. The PCR products were cleaned with Agencourt AMPure XP beads (Beckman Coulter Life Sciences, Indianapolis, IN, USA) to remove excess primers and primer dimers.
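Both primers are degenerate, so each one is in fact a small pool of oligonucleotides. The sketch below expands the ambiguity codes to enumerate the concrete sequences; the code table follows the standard IUPAC nucleotide nomenclature, and the helper name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMERS = {
    "515F": "GTGYCAGCMGCCGCGGTAA",
    "806R": "GGACTACNVGGGTWTCTAAT",
}


def expand_degenerate(primer: str) -> list:
    """Return every concrete sequence encoded by a degenerate primer."""
    # One list of allowed bases per position; the Cartesian product
    # enumerates every combination.
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        variants = expand_degenerate(seq)
        print(f"{name}: {len(seq)} nt, {len(variants)} variants")
```

Run on the sequences above, 515F (19 nt) expands to 4 variants (Y × M) and 806R (20 nt) to 24 variants (N × V × W).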
PCR products were incubated with 80% (v/v) beads for 5 min at 25 °C before being placed on a magnetic holder (MagWell™ Magnetic Separator 96, EdgeBio, San Jose, CA, USA). The supernatant was removed, and the beads were washed three times with 70% (v/v) ethanol before being resuspended in 11 µl of water. The total DNA extracts were processed in parallel, the only differences being that the iTag PCR was performed in a 50 µl final volume and the PCR product was resuspended in 16 µl of water after bead clean-up. PCR products were run on a High Sensitivity DNA Bioanalyzer chip (2100 Bioanalyzer, Agilent, Santa Clara, CA, USA) to confirm fragment size and concentration, pooled at equimolar concentrations and sequenced on the Illumina MiSeq platform (Illumina, San Diego, CA, USA). Sequence data have been archived at NCBI under BioProject ID PRJNA475109.
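The text does not state how pooling volumes were computed. A common approach, sketched below, converts each library's mass concentration into molarity using an average of 660 g mol−1 per base pair of double-stranded DNA and then takes a volume inversely proportional to molarity; since 1 nM equals 1 fmol µl−1, volume (µl) = target fmol / concentration (nM). The library names, concentrations and the 400 bp fragment size are hypothetical.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def molarity_nM(conc_ng_per_ul: float, fragment_bp: float) -> float:
    """Convert a dsDNA concentration in ng/µl to nM for a given fragment length.

    nM = ng/µl x 1e6 / (660 x length in bp)
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * fragment_bp)


def equimolar_volumes(libraries: dict, fmol_each: float) -> dict:
    """Volume (µl) of each library so that every library contributes fmol_each.

    `libraries` maps a name to (concentration in ng/µl, fragment length in bp).
    Because 1 nM is 1 fmol/µl, volume = fmol / nM.
    """
    return {
        name: fmol_each / molarity_nM(conc, bp)
        for name, (conc, bp) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical Bioanalyzer readings, not values from this study.
    libs = {"lib_A": (2.0, 400), "lib_B": (0.5, 400)}
    for name, vol in equimolar_volumes(libs, fmol_each=10.0).items():
        print(f"{name}: {vol:.2f} µl")
```

A library at one quarter of the concentration of another (same fragment size) contributes four times the volume, which is what keeps the read share per library even.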

Sequence processing

The sequences were processed using QIIME 2 (v2017.9)46. They were imported into QIIME 2 using the FASTQ manifest format, then denoised, primer-trimmed (20 nucleotides from each end) and merged with the DADA2 algorithm47 as implemented in the QIIME 2 dada2 denoise-paired plugin, which also removed chimeras using the consensus method. The output was a table of 4063 features (exact sequence variants, ESVs) comprising 6,419,059 sequences. Of these, 130 features were detected in at least one of the six no-template controls and were excluded from further analysis. The filtered table contained 6,110,776 sequences assigned to 3933 features, with a median of 205,167 sequences per sample. The features were then clustered de novo into operational taxonomic units (OTUs) at 97% similarity using the vsearch cluster-features-de-novo plugin, yielding 1533 OTUs in total. The absolute number of OTUs in 16S rRNA gene analyses can vary by up to three orders of magnitude depending on the technique used48; DADA2 is known to return a more conservative number than the previously widely used upfront clustering methods because it produces fewer false positives47. This relatively low OTU count is also consistent with the very low level of organic matter (carbon and nitrogen) in these soils, whose total organic carbon (TOC) content is comparable to that of uncolonized arid lands, where microbial diversity is known to be reduced49. The taxonomy of the representative sequences was assigned using the feature-classifier classify-sklearn plugin ( This classifier was trained on the Greengenes 13_8 99% database trimmed to the amplified region (V4, 515F/806R). Representative sequences that the classifier could not assign at the phylum level were checked manually with the most up-to-date SILVA SINA alignment service ( and the SILVA classification was retained.
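The contaminant screen described above, which removes any feature with at least one read in a no-template control, can be expressed in a few lines. The sketch assumes the feature table is a nested dictionary of read counts (sample → feature → count); the sample and feature IDs are hypothetical, and whether the control samples themselves are then dropped is left to the caller.

```python
from statistics import median


def ntc_contaminants(table: dict, ntc_samples: set) -> set:
    """Features with at least one read in any no-template control."""
    return {
        feature
        for sample in ntc_samples
        for feature, count in table.get(sample, {}).items()
        if count > 0
    }


def drop_features(table: dict, features: set) -> dict:
    """Return a copy of the table without the given features."""
    return {
        sample: {f: c for f, c in counts.items() if f not in features}
        for sample, counts in table.items()
    }


def median_depth(table: dict, exclude: set = frozenset()) -> float:
    """Median number of reads per sample, ignoring samples in `exclude`."""
    return median(
        sum(counts.values()) for s, counts in table.items() if s not in exclude
    )


if __name__ == "__main__":
    # Hypothetical toy table, not data from this study.
    toy = {
        "sort_1": {"esv_a": 120, "esv_b": 30, "esv_c": 5},
        "sort_2": {"esv_a": 80, "esv_c": 40},
        "ntc_1": {"esv_c": 2},
    }
    bad = ntc_contaminants(toy, {"ntc_1"})
    clean = drop_features(toy, bad)
    print(sorted(bad), median_depth(clean, exclude={"ntc_1"}))
```

Applied to the real table, this step corresponds to going from 4063 to 3933 features (4063 − 130). Note that a single NTC read is enough to discard a feature, so the rule is deliberately conservative.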
The OTU table with assigned taxonomy was used to build the phylum-level bar graph and for all downstream analyses. Pairwise Bray–Curtis dissimilarities were computed on the OTU table, and the resulting triangular distance matrix was ordinated using non-metric multidimensional scaling (NMDS). The OTU table was also rarefied to an even depth of 81,000 sequences per sample, and the rarefied table was used to construct the rank-abundance plot. OTUs in each library were ranked by abundance using the average method, in which tied values all receive the mean of the ranks they span; abundance was then plotted on a log scale against log rank in descending order.
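Rarefaction and the average-rank rule above are easy to misread, so the sketch below spells them out: rarefaction draws reads without replacement down to a fixed depth, and tied abundances share the mean of the ranks they occupy (the same tie rule as SciPy's `rankdata(method="average")`, applied here from most to least abundant). This is an illustration with hypothetical inputs, not the QIIME 2 code that produced the rarefied table.

```python
import bisect
import math
import random
from itertools import accumulate


def rarefy(counts: dict, depth: int, seed: int = 0) -> dict:
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    otus = list(counts)
    cumulative = list(accumulate(counts[o] for o in otus))
    total = cumulative[-1] if cumulative else 0
    if depth > total:
        raise ValueError(f"depth {depth} exceeds library size {total}")
    rng = random.Random(seed)
    result = dict.fromkeys(otus, 0)
    for read in rng.sample(range(total), depth):
        # Read index r belongs to the first OTU whose cumulative count exceeds r.
        result[otus[bisect.bisect_right(cumulative, read)]] += 1
    return {o: c for o, c in result.items() if c > 0}


def average_ranks_desc(values: list) -> list:
    """Rank from largest (rank 1) to smallest; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_abundance_points(counts: dict) -> list:
    """(log10 rank, log10 abundance) pairs, most abundant OTU first."""
    values = [c for c in counts.values() if c > 0]
    ranks = average_ranks_desc(values)
    return sorted((math.log10(r), math.log10(v)) for r, v in zip(ranks, values))


if __name__ == "__main__":
    library = {"otu_1": 500, "otu_2": 300, "otu_3": 300, "otu_4": 20}
    sub = rarefy(library, depth=810, seed=42)
    print(sum(sub.values()), average_ranks_desc([500, 300, 300, 20]))
```

For abundances of 500, 300, 300 and 20 the ranks are 1, 2.5, 2.5 and 4, so tied OTUs plot at the same point on the rank-abundance curve.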

Comparison with reference dataset

We compared our iTag data with the 697 full-length 16S rRNA gene sequences of the ENIGMA Project’s culture collection from this field site and with the 511 16S rRNA gene sequences of the most abundant and widespread soil microbiome members retrieved from Delgado-Baquerizo et al.29. We performed a nucleotide BLAST of one representative sequence per feature against the ENIGMA isolate database and against the “511 most wanted soil phylotypes”29 database using Geneious R9©. A sequence from our dataset was considered to have a match in either database if its similarity to a reference sequence was >97%.
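The >97% matching rule can be sketched as a filter over BLAST hits. The sketch assumes the hits were exported as records of query ID, subject ID and percent identity (similar to the first columns of BLAST+ tabular output); the Geneious export format and the record names are assumptions. It keeps the best hit per query and applies the strict >97% cutoff; it does not check alignment coverage, which a stricter screen might add.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str     # representative sequence of one feature from this dataset
    subject: str   # reference sequence (isolate or "most wanted" phylotype)
    pident: float  # percent identity of the alignment


def matched_features(hits: Iterable[Hit], cutoff: float = 97.0) -> Dict[str, Hit]:
    """Best hit per query, kept only if its identity is strictly above `cutoff`."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.pident > current.pident:
            best[hit.query] = hit
    return {q: h for q, h in best.items() if h.pident > cutoff}


if __name__ == "__main__":
    # Hypothetical hits, not results from this study.
    hits = [
        Hit("feature_1", "isolate_A", 99.2),
        Hit("feature_1", "isolate_B", 97.5),
        Hit("feature_2", "phylotype_X", 97.0),
        Hit("feature_3", "isolate_C", 95.1),
    ]
    for query, hit in matched_features(hits).items():
        print(query, hit.subject, hit.pident)
```

Under the strict inequality, a hit at exactly 97.0% is not counted as a match.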

LC-MS soil metabolomics

Triplicate 2 g samples of soil from 30 cm and 70 cm were each extracted with 8 ml of LC-MS grade water on an overhead shaker for 1 h at 4 °C. Water-extractable components were recovered by removing insoluble material by centrifugation (3220 × g, 15 min, 4 °C), filtering the supernatants through a 0.45 µm PVDF syringe filter (MilliporeSigma, Burlington, MA, USA) and lyophilizing the filtrates to remove water (Labconco 7670521, Kansas City, MO, USA). Dried samples were then resuspended in 500 µl of LC-MS grade methanol, bath-sonicated at 25 °C for 15 min and clarified by filtration through 0.2 µm PVDF microcentrifugal filtration devices (1000 × g, 2 min, 25 °C). Methanol extracts were spiked with an internal standard mix of 13C,15N universally labeled amino acids (767964, Sigma-Aldrich, USA; a mix of the canonical amino acids, methionine among them, each at a final concentration of 10 µM). Metabolites were separated by hydrophilic interaction liquid chromatography (HILIC) on a SeQuant ZIC-HILIC column (5 µm, 150 × 2.1 mm, 200 Å; 1.50454.0001, Millipore) and detected with a Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer equipped with a HESI-II source probe (ThermoFisher Scientific). Chromatographic separations were performed on an Agilent 1290 series HPLC system with the column at 40 °C, samples stored at 4 °C and an injection volume of 6 µl. Metabolites were retained and eluted with a gradient of mobile phase A (5 mM ammonium acetate in water) and mobile phase B (5 mM ammonium acetate, 95% v/v acetonitrile in water) as follows: column equilibration at 0.45 ml min−1 in 100% B for 1.5 min; a linear gradient at 0.45 ml min−1 to 35% A over 13.5 min; a linear gradient to 0.6 ml min−1 and 100% A over 3 min; a hold at 0.6 ml min−1 and 100% A for 5 min; a linear gradient to 0.45 ml min−1 and 100% B over 2 min; and re-equilibration for an additional 7 min.
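The gradient programme above can be written as a timetable of breakpoints, with composition and flow changing linearly between them. The sketch below encodes the steps as described (cumulative times; %A = 100 − %B) and interpolates the conditions at any time point; the breakpoint table is our reading of the text, not an exported instrument method.

```python
from bisect import bisect_right
from typing import List, Tuple

# Breakpoints: (time in min, % mobile phase B, flow rate in ml/min).
# Conditions change linearly between consecutive breakpoints.
GRADIENT: List[Tuple[float, float, float]] = [
    (0.0, 100.0, 0.45),   # start of equilibration in 100% B
    (1.5, 100.0, 0.45),   # end of 1.5 min equilibration
    (15.0, 65.0, 0.45),   # 13.5 min linear gradient to 35% A
    (18.0, 0.0, 0.60),    # 3 min ramp to 100% A and 0.6 ml/min
    (23.0, 0.0, 0.60),    # 5 min hold
    (25.0, 100.0, 0.45),  # 2 min return to 100% B and 0.45 ml/min
    (32.0, 100.0, 0.45),  # 7 min re-equilibration
]


def conditions_at(t: float, table=GRADIENT) -> Tuple[float, float]:
    """Return (% B, flow in ml/min) at time t (min) by linear interpolation."""
    times = [row[0] for row in table]
    if not times[0] <= t <= times[-1]:
        raise ValueError(f"t={t} min is outside the {times[-1]} min programme")
    i = bisect_right(times, t) - 1
    if i == len(table) - 1:  # exactly at the final breakpoint
        return table[-1][1], table[-1][2]
    (t0, b0, f0), (t1, b1, f1) = table[i], table[i + 1]
    frac = (t - t0) / (t1 - t0)
    return b0 + frac * (b1 - b0), f0 + frac * (f1 - f0)


if __name__ == "__main__":
    for t in (0.0, 8.25, 16.5, 20.0, 30.0):
        b, flow = conditions_at(t)
        print(f"t = {t:5.2f} min: {b:5.1f}% B, {100 - b:5.1f}% A, {flow:.3f} ml/min")
```

Under this reading the full programme lasts 32 min, and halfway through the 13.5 min gradient (t = 8.25 min) the eluent is 82.5% B at 0.45 ml min−1.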
Each sample was injected twice: once for analysis in positive ion mode and once in negative ion mode. The source was set with a sheath gas flow of 55, an auxiliary gas flow of 20 and a sweep gas flow of 2 (arbitrary units), a spray voltage of ±3 kV and a capillary temperature of 400 °C. Ions were detected with the Q Exactive’s data-dependent MS2 Top2 method: the two most abundant precursor ions (2.0 m/z isolation window, 17,500 resolution, 1e5 AGC target, stepped normalized collision energies of 10, 20 and 30 eV) were selected from a full MS pre-scan (70–1050 m/z, 70,000 resolution, 3e6 AGC target, 100 ms maximum ion transmission), with data-dependent settings of a 1e3 minimum AGC target, exclusion of charge states above |3| and a 10 s dynamic exclusion window. Internal and external standards were included for quality control, with blank injections between every unique sample. A QC mix of 30 compounds spanning a wide range of m/z and RT and detectable in both positive and negative modes was injected at the start and end of the injection sequence to verify signal stability over time. Extracted ion chromatograms of the internal standard compounds were evaluated using MZmine (version 2.26)50 to ensure consistency between injections. Samples were analyzed using Metabolite Atlas50 ( Briefly, compounds were identified by comparing the measured RT, m/z and fragmentation spectra in each sample with the library-predicted RT, theoretical m/z, library-detected adducts and library MS/MS fragmentation spectra. The compound library was an in-house RT–m/z–MS/MS library of reference compounds analyzed with the same LC-MS methods, with retention times corrected by linear regression of the QC standards against the library values.
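The retention-time correction described above amounts to fitting a straight line between the library and observed retention times of the QC standards, then using it to predict where each library compound should elute in the current run. The sketch below is an ordinary least-squares version with made-up QC values; Metabolite Atlas's own implementation may differ, and the direction of the fit (library RT as predictor) is an assumption.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("library RTs must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_rt(library_rt: float, slope: float, intercept: float) -> float:
    """Library RT mapped onto the current run's retention-time scale."""
    return slope * library_rt + intercept


if __name__ == "__main__":
    # Hypothetical QC standards: library RT vs RT observed in this batch (min).
    qc_library = [1.0, 5.0, 10.0]
    qc_observed = [1.2, 5.6, 11.1]
    slope, intercept = fit_line(qc_library, qc_observed)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
    print(f"library RT 7.0 min -> expected {predicted_rt(7.0, slope, intercept):.2f} min")
```

The predicted RT is what the <1 min retention-time criterion is then checked against.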
Compound identifications were retained when the peak intensity was >1e4, the retention time differed from the predicted value by <1 min, the m/z was within 20 ppm of the theoretical value, the expected adduct was detected, at least one ion fragment matched the library spectrum, and the compound was more abundant in at least one sample than the mean + 1 SD of the extraction controls. Only eight compounds met these criteria; average peak heights from the extracted ion chromatograms are reported in Fig. S5. The signal was very low overall, owing to the low amount of organic matter in these soils. We checked manually for methionine using MZmine (version 2.26)32 and confirmed that no methionine was detectable in any of the samples analyzed. Metabolomics data have been deposited in the JGI Genome Portal (#1207416) along with the analysis file (#1207417).
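The acceptance criteria above translate directly into a boolean filter. The sketch below encodes them for one candidate identification; the `Candidate` record and its field names are hypothetical, the m/z error is expressed in ppm as (measured − theoretical)/theoretical × 10^6, and using the sample standard deviation of the extraction controls is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Sequence


@dataclass
class Candidate:
    peak_height: float
    rt_observed: float          # min
    rt_predicted: float         # min, from the RT-corrected library
    mz_observed: float
    mz_theoretical: float
    adduct_detected: bool
    fragment_matches: int       # ion fragments matching the library MS/MS spectrum
    sample_heights: Sequence[float]


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def keep_identification(c: Candidate, control_heights: Sequence[float]) -> bool:
    """Apply the retention criteria described in the text to one candidate."""
    # Extraction-control background: mean + 1 SD (needs at least two controls).
    background = mean(control_heights) + stdev(control_heights)
    return (
        c.peak_height > 1e4
        and abs(c.rt_observed - c.rt_predicted) < 1.0
        and abs(ppm_error(c.mz_observed, c.mz_theoretical)) < 20
        and c.adduct_detected
        and c.fragment_matches >= 1
        and max(c.sample_heights) > background
    )


if __name__ == "__main__":
    # Hypothetical candidate and extraction controls, not data from this study.
    cand = Candidate(5e4, 5.2, 5.0, 200.002, 200.0, True, 2, [5e4, 2e4])
    print(round(ppm_error(cand.mz_observed, cand.mz_theoretical), 1))
    print(keep_identification(cand, control_heights=[1000.0, 1200.0, 800.0]))
```

Every criterion must hold at once, so a candidate with a perfect spectral match is still rejected if, for example, it elutes more than 1 min away from its predicted RT.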

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.