Yeast strains and growth conditions

Yeast strains used in this study are listed in Supplementary Table 4. Cells were grown in Yeast Extract-Peptone-Dextrose medium at 25 °C until early log phase and were then arrested in G1-phase for 170 min with 8 μg mL−1 α-factor. For exposure to Zeocin, cells were treated with 100 μg mL−1 Zeocin (Invivogen) for 1 h. The I-SceI strain was cultured in Yeast Extract-Peptone-Raffinose medium, galactose was added for 2 h to induce I-SceI cutting. For exposure to HU, cells were released from G1-phase arrest by addition of 75 μg mL−1 Pronase (Sigma) and 200 mM HU was added 20 min before Pronase release followed by 1 h incubation. Collected cells were washed with cold SE buffer (5 M NaCl, 500 mM EDTA pH 7.5) and immediately subjected to DSB labeling.

DIvA cell culture and induction of DSBs

DIvA cells are U2OS cell line, created by the Gaëlle Legube group, which express the restriction enzyme AsiSI fused to the modified estrogen receptor18. DIvA cells were cultured in Dulbecco’s modified Eagle’s medium supplemented with antibiotics, 10% fetal calf serum (Invitrogen), and 1 μg mL−1 puromycin at 37 °C under a humidified atmosphere with 5% CO218. The cell lines were regularly tested for mycoplasma contamination. To trigger nuclear localization of AsiSI and induce AsiSI-dependent breaks, DIvA cells were treated with 300 nM 4OHT for 4 h18. Untreated and 4OHT-treated DIvA cells were kindly provided by Gaëlle Legube. Collected cells were fixed with 2% formaldehyde for 30 min, washed with cold 1× phosphate-buffered saline (PBS) buffer and stored at 4 °C in 1× PBS with 0.05% NaN3 until they were subjected to gDNA sequencing and DSB labeling by BLESS.

i-BLESS labeling

Approximately 2.5 × 109 yeast cells were resuspended in 5 mL SE buffer and mixed with 5 mL 1 % Reducta agarose (Promega) in SE buffer at 40 °C. Cell suspension was mixed with 20 mL liquid paraffin (Merck Millipore) at 40 °C and vigorously shaken by hand for 1 min, until emulsion was formed. The emulsion was then poured into 200 mL ice-cold SE buffer and the mixture was stirred for several minutes. Agarose bead suspension was gently centrifuged (200 × g, 10 min), paraffin layer was removed, and agarose bead pellet was washed three times with TE buffer. β-Mercaptoethanol (0.5 mL), 20 µL of 200 U µL−1 lyticase solution (Sigma), and SE to a final volume of 10 mL was then added to the bead pellet, followed by 1 h incubation at 30 °C. Beads were washed with ES buffer (1% sarkosyl, 25 mM EDTA pH 8.0), resuspended in ES buffer with 50 µg mL−1 proteinase K (Sigma), and incubated overnight at 50 °C. After incubation, the beads were washed with TE + 0.1 mM phenylmethylsulfonyl fluoride (PMSF) and twice with TE. For samples treated with restriction enzymes, the beads were washed with appropriate buffer (FastDigest buffer (Thermo Scientific) or CutSmart buffer (NEB)) followed by treatment with restriction enzymes, as described below. For samples treated with Zeocin, beads were additionally washed with NEBNext® FFPE DNA buffer and subjected to reaction with NEBNext® FFPE DNA Repair Mix for 2 h at 20 °C. Next, the beads were washed with 1 × Blunting Buffer (NEB), followed by DNA ends blunting using Quick Blunting kit (NEB) for 2 h. The beads were subsequently washed with T4 ligation buffer and then resuspended in T4 ligation buffer with 100 nM proximal adapter. After 2 h, T4 ligase was added and the beads were incubated for up to 2 days at 16 °C. After ligation, the beads were washed once with TE and encapsulated DNA was initially sonicated using Covaris S220. Total DNA was isolated using Zymoclean™ Large Fragment DNA Recovery Kit (Zymo Research) and once again fragmented by sonication to create ~400 bp fragments. Labeled fragments were captured by Dynabeads MyOne C1 beads (Invitrogen), blunted, and phosphorylated using Quick Blunting Kit (NEB), then ligated to a distal adapter (both proximal and distal adapters are identical to those used in the original BLESS method3). The resulting circular DNA was then linearized by I-SceI (NEB) digestion and amplified by PCR. Purified PCR products were subsequently treated with XhoI (NEB) to cleave terminal I-SceI sequences derived from adapters.

BLESS labeling

Fixed DIvA cells were lysed in a Lysis buffer (10 mM Tris-HCl, 10 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.2 % NP-40 pH 8), for 60 min at 4 °C and then in a Nucleus break buffer (10 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.3% SDS pH 8), for 45 min at 37 °C. Lysed cells were resuspended in 1 × NEBuffer 2 (NEB) supplemented with 0.1% Triton X-100 and proteinase K at 100 μg mL−1 final concentration and incubated for 2 min at 37 °C. After that, samples were immediately transferred onto ice and an equal volume of buffer supplemented with PMSF was added to quench proteinase K.

Purified nuclei were washed twice with 1 × NEBuffer 2 supplemented with 0.1% Triton X-100 and then once with blunting buffer (NEB) supplemented with 100 μg mL−1 bovine serum albumin, followed by DNA ends blunting using Quick Blunting kit (NEB) for 45 min at room temperature. Afterwards, nuclei were washed twice with 1 × NEBuffer 2 supplemented with 0.1% Triton X-100, once with 1 × T4 ligase buffer supplemented with 0.1% Triton X-100 and once with 1 × T4 ligase buffer. Nuclei were resuspended in 1 × T4 ligase buffer with 2 µM proximal linker and in situ ligation was performed for 18–20 h at 16 °C using T4 ligase (NEB). After ligation, nuclei were washed three times with W&B buffer (5 mM Tris-HCl, 1 mM EDTA, 1 M NaCl pH 7.5, 0.1 % Triton X-100). Afterwards, gDNA was extracted by incubating nuclei in 1 × NEBuffer 2 with 0.5% Triton X-100 and proteinase K at 200 μg mL−1 final concentration for 1 h, shaking at 65 °C, followed by isopropanol-ethanol purification. Purified gDNA was sonicated using Covaris S220 to create ~400 bp fragments. Labeled fragments were captured by Dynabeads MyOne C1 beads (Invitrogen), blunted, and phosphorylated using Quick Blunting Kit (NEB), then ligated to a distal adapter. The resulting circular DNA was then linearized by I-SceI (NEB) digestion and amplified by PCR. Purified PCR products were subsequently treated with XhoI (NEB) to cleave terminal I-SceI sequences derived from adapters.

Library preparation and sequencing

Sequencing libraries for i-BLESS (yeast strains) or BLESS (DIvA cells) and respective gDNA samples were prepared using ThruPLEX DNA-seq Kit (Rubicon Genomics). i-BLESS or BLESS libraries were prepared without prior fragmentation and further size selection. Quality and quantity of the libraries were assessed on a 2100 Bioanalyzer using HS DNA Kit (Agilent) and on a Qubit 2.0 Fluorometer using Qubit dsDNA HS Assay Kit (Life Technologies). The libraries were sequenced (at least 2 × 70 bp) on Illumina HiSeq2000/2500/4000 platforms, according to our modified experimental and software protocols for generation of high-quality data from low-diversity samples35.

qDSB-Seq with in vitro digestion

In addition to DSB sequencing, as described above, a digestion with a restriction enzyme was performed after proteinase K treatment and before DSB labeling. Samples were treated with NotI (NEB, Thermo Scientific), SrfI (NEB), AsiSI (NEB), or BamHI (Thermo Scientific) for 1 h at 37 °C.

qDSB-Seq with I-SceI spike-in

For I-SceI spike-in we used a yeast strain (I-SceI strain) with GAL-inducible I-SceI endonuclease and a single I-SceI-cutting site integrated at the ADH4 locus on chromosome VII. To measure the cleavage efficiency of I-SceI, cell aliquots were taken before (RAFF) and 2 h after (GAL) cleavage induction, and total gDNA was extracted. DNA was serially diluted and amplified for 25 cycles with primers spanning the I-SceI cutting site. Cleavage efficiency was inferred by comparing the amount of amplified DNA in GAL (cut) vs. RAFF (uncut) conditions. We used CASY Cell Counter (Roche Applied Science) to mix this spike-in with our sample of interest (wild-type cells with replication stress induced by HU treatment) in proportion 2:98. The cutting ratio of the I-SceI endonuclease expressed in the I-SceI strain was estimated using an unmixed I-SceI strain and Equation (2) below.

Quantitative PCR

To validate cutting efficiency for NotI, input gDNA was analyzed by real-time PCR using primers flanking a selected NotI site at chrI: 114016–114023 (forward: 5′-AGAGTTGGGAATGTGTGCCC-3′, reverse: 5′-GGGCAGCAACACAAAGTGTC-3′) and KAPA SYBR® FAST kit (Life Technologies). Four technical replicates using two different concentrations of input DNA were performed. We compared the amount of PCR product amplified in untreated (U) vs. NotI-treated cells (N) by data analysis based on the ΔCT method36, where the ΔCT was obtained by subtraction of the threshold cycle CT in sample U from the CT in sample N: ΔCT = CT (N) − CT (U). Final cutting efficiency was calculated as mean efficiency for all dilutions according to the formula below:

$$f_{{\mathrm{cut}}} = 1 – \frac{1}{{2^{\Delta C_{\mathrm{T}}}}}.$$

(1)

We used calibration data to empirically correct ΔCT.

Sequencing data analysis

We used iSeq (http://breakome.eu/software.html) to ensure sequencing data quality before mapping. Next, iSeq was used to remove i-BLESS or BLESS proximal and distal barcodes (5′-TCGAGGTAGTA-3′ and 5′-TCGAGACGACG-3′, respectively). Reads labeled with the proximal barcode, which are directly adjacent to DSBs, were selected and mapped to the version of the yeast S288C genome sacCer3 (we manually corrected common polymorphisms) or to the human genome GRCh37 using bowtie37 v0.12.2 with the alignment parameters “-m1 –v1” (to exclude ambiguous mapping and low-quality reads). For ribosomal DNA mapping in RFB analysis, we mapped sequencing reads using the parameter “-v1” to allow multiple mapped reads. The end base pairs of the reads were trimmed using bowtie “−3” parameter. The parameter choice was based on the iSeq quality report. For calculation of the absolute number of DSBs per cell only mapped reads were retained. Further, the reads identified as originating from telomere ends were removed. For the yeast data, the telomeric reads were identified as those exhibiting the CAC motif in the whole AC-rich strand; regular expression C{0,3}AC{1,10} in the PERL language was used to identify them.

Calculation of DSB frequencies per cell

Paired-end sequencing of gDNA or qPCR was used to measure the cutting efficiency of an endonuclease. For an enzyme with a single cutting site (e.g., I-SceI), we used the following procedure to calculate cutting efficiency (fcut) from whole genome paired-end sequencing data:

$$f_{{\mathrm{cut}}} = \frac{{N_{{\mathrm{cut}}}}}{{N_{{\mathrm{cut}}} + 2N_{{\mathrm{uncut}}}}} – f_{{\mathrm{bg}}},$$

(2)

where Ncut is the number of fragments cut by an enzyme, Nuncut is the number of uncut fragments covering the cutting site, and fbg is the background level of breaks (e.g., resulting from sonication). Ncut fragments were counted in empirically determined, several nucleotide vicinities of the canonical cutting sites, based on visual examination of the read distribution. For enzymes with multiple cutting sites, reads mapped to each cutting site were first classified as cut or uncut and the results were summed over all cutting sites (Nsites):

$$f_{{\mathrm{cut}}} = \frac{{\mathop {\sum }\nolimits_{i = 1}^{N_{{\mathrm{sites}}}} N_{{\mathrm{cut}}}^i}}{{\mathop {\sum }\nolimits_{i = 1}^{N_{{\mathrm{sites}}}} N_{{\mathrm{cut}}}^i + 2\mathop {\sum }\nolimits_{i = 1}^{N_{{\mathrm{sites}}}} N_{{\mathrm{uncut}}}^i}} – f_{{\mathrm{bg}}}$$

(3)

To estimate cutting efficiency, we used only cutting sites to which >100 paired-end reads (in yeast) were mapped and their cutting efficiency was larger than 0. To estimate background break frequency, fbg, we randomly selected 1000–2000 genomic windows of the same size as those used to count cut and uncut fragments and estimated cutting efficiency in those intervals using the formula, \(f_{{\mathrm{bg}}} = \frac{{N_{{\mathrm{cut}}}}}{{N_{{\mathrm{cut}}} + 2N_{{\mathrm{uncut}}}}}\). For clarity, these errors are omitted in Equations (4) to (6).

Next, we calculated the number of spike-in DSBs induced at restriction sites, Bcut:

$$B_{{\mathrm{cut}}} = f_{{\mathrm{cut}}}N_{{\mathrm{sites}}}p,$$

(4)

where fcut is the cutting efficiency in undiluted samples, Nsites is the number of used enzyme restriction sites (e.g., 39 for NotI), and p is the proportion of digested cells (p = 1 unless mixing with an in vivo digested construct is used).

Then we computed the number of mapped sequencing reads per DSB or the coefficient, α:

$$\alpha = \frac{{R_{{\mathrm{cut}}}}}{{B_{{\mathrm{cut}}}}},$$

(5)

where Rcut is the number of labeled reads mapped to the cutting sites.

Finally, we computed studied DSBs per cell (Bstudied) using the following formula:

$$B_{{\mathrm{studied}}} = \frac{{R_{{\mathrm{studied}}}}}{\alpha }$$

(6)

where Bstudied is the number of studied DSBs per cell in the whole genome or in a specific region (e.g., a replication region), or at a specific location (e.g., an enzyme cutting site). In this study, we calculated the studied breaks per cell for the whole genome after subtracting reads generated from enzyme cutting sites, telomeres, and ribosomal DNA. Error of Bstudied is the SD of breaks calculated from different cutting sites for enzymes with multiple cutting sites (Supplementary Table 2). For example, for NotI, which has 39 cutting sites in the yeast strain we used, SD was calculated using results of DSB quantifications based on individual NotI cutting sites. By comparing with SD calculated from all cutting sites in different replicates, we concluded that SD calculated based on individual cutting sites is a conservative estimate of SD of Bstudied (Supplementary Table 2). For an enzyme with a single cutting site in a given genome (e.g., I-SceI strain) or when the signal from individual enzyme cutting sites was weak (DIvA cells), we estimated SD of fcut and used it to calculate SD of Bstudied. Namely, we estimated SD of fcut using the formula \(\sqrt {\sigma _{{\mathrm{bg}}}^2 + \sigma _{{\mathrm{Poisson}}}^2}\), where σbg is SD of fbg, calculated as above, and σPoisson is calculated assuming Poisson distribution of cut and uncut fragment counts (Ncut and Nuncut) and is approximated using the formula:

$$\frac{1}{2}\left(\frac{{N_{{\mathrm{cut}}} + \sqrt {N_{{\mathrm{cut}}}} }}{{(N_{{\mathrm{cut}}} + \sqrt {N_{{\mathrm{cut}}}} ) + 2(N_{{\mathrm{uncut}}} – \sqrt {N_{{\mathrm{uncut}}}} )}} – \frac{{N_{{\mathrm{cut}}} – \sqrt {N_{{\mathrm{cut}}}} }}{{(N_{{\mathrm{cut}}} – \sqrt {N_{{\mathrm{cut}}}} ) + 2(N_{{\mathrm{uncut}}} + \sqrt {N_{{\mathrm{uncut}}}} )}}\right).$$

(7)

Background estimation and removal

To quantify DSBs likely resulting from broken forks near origins, we first removed background not related to replication. To define such background, we calculated DSB density in a 500 bp sliding window with a 50 bp step, the peak of this distribution was assumed to be background DSB frequency. This background was subtracted from the data at each position, resulting negative values were assigned to zero.

Calculation of DSBs per cell in BLISS data

We downloaded BLISS data (NCBI SRA accession SRR544198013) from DIvA cells treated by 4OHT in the same manner as our gDNA and BLESS samples. We scanned the R1 reads in the BLISS data for 8 bp UMIs and the sample barcode 5′-CATCACGC-3′ at the beginning of the reads (i.e., for reads originating directly from DSBs) and retained the reads with both the UMI and the barcode. We trimmed the 16 bp prefix sequence from the R1 reads, the trimmed sequences were mapped to the human genome GRCh37 using bowtie37 v0.12.2 with the alignment parameters “-m1 –v1.” Next, using in-house PERL script, we counted the read depth of unique UMIs for each genomic position using the first nucleotide of each mapped reads (only one sequence with same UMIs at a genomic position was retained). Finally, the number of unique UMIs around AsiSI cutting sites was counted in a ±100 bp interval to calculate DSBs induced by AsiSI, as proposed by Iannelli et al.13. Reads from AsiSI sites closer than 200 bp were additionally processed to avoid double-counting, reads mapping to chrY sites were rejected since U2OS cells originate from a female. To calculate AsiSI-induced DSBs per cell, we divided the total number of unique UMIs originating from 1202 AsiSI sites by the reported number of cells used by Iannelli et al.13. SD for results of BLISS quantification was estimated assuming the Poisson distribution of UMI and cell counts and using the formula \(\frac{1}{2} \left( \frac{N_{\mathrm{UMI}} + \sqrt{N_{\mathrm{UMI}}} }{N_{\mathrm{cell}} – \sqrt{N_{\mathrm{cell}}}} – \frac{N_{\mathrm{UMI}} – \sqrt{N_{\mathrm{UMI}}}}{N_{\mathrm{cell}} + \sqrt{N_{\mathrm{cell}}}} \right)\), where NUMI is the number of UMI reads and Ncell is the number of cells used.

Analysis of fragile regions and enrichment

Hygestat_BLESS v1.2.3 in the iSeq package was used to identify fragile regions (i.e., regions with significant increase of the read numbers in treatment versus control samples), which were defined using the hypergeometric probability distribution and Benjamini–Hochberg correction. To evaluate the enrichment of fragile regions on nucleosomes, we used Hygestat_annotations v2.0, which computed the proportion of mappable nucleotides belonging to both the fragile regions and the nucleosomes, and the proportion of mappable nucleotides belonging to both genomic regions and the nucleosomes. To estimate the P-value for the feature enrichment inside fragile regions, we used 1000 permutations to calculate the empirical distribution of the ratio under the null hypothesis.

Identification of one-ended DSBs

To estimate the total number of one-ended DSBs, we performed hypergeometric test based on the number of i-BLESS sequencing reads from Watson and Crick strands using Hygestat_BLESS v1.2.3 in the iSeq package with a 500 nt window size. Regions with P < 1e − 10 for enrichment of either Watson or Crick strand reads were classified as one-ended DSB regions. The Bonferroni correction was used to compute P-values. The difference between numbers of reads from Watson and Crick was used to calculate the number of one-ended DSBs using the DSB quantification method described above.

Comparison of DSB levels between ZEO and G1 samples

We used read counts for 5000 nt mappable intervals produced by Hygestat_BLESS; ZEO read numbers were normalized using qDSB-Seq quantification. We evaluated the null hypothesis that the number of DSBs in G1-phase cells is the same or lower than in ZEO using very conservative 5 SD confidence intervals (assuming Poisson distribution of reads). All genomic windows with >17 reads in 5 kb were significantly enriched in DSBs in ZEO as compared with G1-phase cells (P < 2e – 12, calculated using the hypergeometric probability distribution and the Bonferroni correction).

DNA secondary structure prediction

DNA secondary structures were defined by free energy at 37 °C using UNAFold38 v3.8 in a 50 bp sliding window with a 25 bp step along the whole yeast genome.

Source