MeRPy design and properties
MeRPy is a poly(acrylamide-co-acrylic acid)-graft-oligo(nucleic acid) copolymer (Fig. 1c). It can be selectively precipitated by the addition of methanol (Fig. 1a, b). The polymer’s carboxylate groups (1 wt%) are crucial to suppress nonspecific binding of free DNA. Grafted oligonucleotides serve as universal anchor strands that provide high binding capacity and activity. To define sequences for target capture and release, MeRPy is programmed with catcher strand probes that consist of three domains (Fig. 1d, Supplementary Fig. 1): (i) an adapter site, (ii) a target binding site, and (iii) an (optional) release site. After targets hybridize to the binding site, the polymer is precipitated. The pellet is then redispersed in water and targets are released, either non-selectively by thermal or basic denaturation, or selectively via toehold-mediated strand displacement16 (TMSD).
a Due to its methanol-responsiveness, MeRPy combines rapid binding/release in homogeneous phase with facile separation in heterogeneous phase. b MeRPy is soluble in aqueous buffers but precipitates in solutions containing ≥30 mM NaCl when methanol is added. The precipitate can be redissolved in water and triggered to release targets on demand. c Chemical structure and molecular weight distributions. Top: Molecular weight distributions of MeRPy-10 (m ~ 740, n ~ 74,000, o ~ 10) and MeRPy-100 (m ~ 900, n ~ 90,000, o ~ 130), obtained by AF4-LS. Bottom: Molecular weight distributions of MeRPy-10 as synthesized (control), after extensive vortexing and heating. d Scheme of MeRPy programming, target binding, and release.
We synthesized two variants of MeRPy (see Methods section and Supplementary Procedure 1): (i) MeRPy-10 carries ~10 anchor strands per polymer chain. It can bind up to 2 nmol ssDNA per milligram polymer. (ii) MeRPy-100 was synthesized for applications that demand increased binding capacity and kinetics. It is endowed with ~100 anchor strands per chain, providing 20 nmol hybridization sites per milligram polymer. Its binding capacity was found to be 15 nmol per milligram polymer, corresponding to ~75% of its theoretical limit (Supplementary Fig. 2). Both MeRPy variants have much higher binding capacities than widely used magnetic beads, which are limited by molecular crowding at the solid–liquid interface (typically max. 200–500 pmol ssDNA per milligram substrate) (see technical specifications of Thermo Scientific Dynabeads M-270, M-280, MyOne Streptavidin C1, and MyOne Streptavidin T1).
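The capacity figures above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in this paragraph; the function and constant names are illustrative and do not come from the original work.

```python
# Capacities quoted in the text, in pmol ssDNA per mg of polymer or bead substrate.
MERPY10_CAPACITY = 2_000          # MeRPy-10: up to 2 nmol/mg
MERPY100_SITES = 20_000           # MeRPy-100: 20 nmol/mg hybridization sites
MERPY100_CAPACITY = 15_000        # MeRPy-100: 15 nmol/mg measured
BEAD_CAPACITY_RANGE = (200, 500)  # magnetic beads: typical maximum, 200-500 pmol/mg


def fraction_of_theoretical(measured, theoretical):
    """Return the measured capacity as a fraction of the available hybridization sites."""
    return measured / theoretical


def fold_over_beads(capacity, bead_range=BEAD_CAPACITY_RANGE):
    """Return the (smallest, largest) fold advantage over the quoted bead range."""
    low, high = bead_range
    return capacity / high, capacity / low


print(fraction_of_theoretical(MERPY100_CAPACITY, MERPY100_SITES))  # 0.75
print(fold_over_beads(MERPY10_CAPACITY))    # (4.0, 10.0)
print(fold_over_beads(MERPY100_CAPACITY))   # (30.0, 75.0)
```

On these numbers, MeRPy-10 offers roughly 4 to 10 times, and MeRPy-100 roughly 30 to 75 times, the per-milligram capacity quoted for magnetic beads.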
MeRPy-10 and MeRPy-100 are soluble in aqueous media. Methanol-induced precipitation requires prior adjustment of the ionic strength, as negative charges of the polymer’s carboxylate groups and anchor strands must be sufficiently shielded by counterions (Fig. 1b). MeRPy-10 requires 30–150 mM NaCl and 1 volume of methanol for complete precipitation. In contrast, MeRPy-100 requires 100–300 mM NaCl and 1.5 volumes of methanol for this process.
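For bench planning, the precipitation conditions above can be captured in a small lookup. This is a hypothetical helper, not part of the published protocol: it assumes that "volumes of methanol" refers to volumes relative to the sample volume, and it only reports whether a NaCl concentration lies inside the range quoted in the text, without predicting what happens outside that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecipitationConditions:
    nacl_min_mM: float       # lower end of the NaCl range quoted in the text
    nacl_max_mM: float       # upper end of the NaCl range quoted in the text
    methanol_volumes: float  # volumes of methanol per volume of sample (assumption)


CONDITIONS = {
    "MeRPy-10": PrecipitationConditions(30.0, 150.0, 1.0),
    "MeRPy-100": PrecipitationConditions(100.0, 300.0, 1.5),
}


def plan_precipitation(variant, sample_volume_uL, nacl_mM):
    """Return (methanol volume in uL, whether NaCl is within the quoted range)."""
    cond = CONDITIONS[variant]
    in_range = cond.nacl_min_mM <= nacl_mM <= cond.nacl_max_mM
    return cond.methanol_volumes * sample_volume_uL, in_range


print(plan_precipitation("MeRPy-100", 50.0, 150.0))  # (75.0, True)
print(plan_precipitation("MeRPy-10", 50.0, 10.0))    # (50.0, False)
```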
MeRPy’s high molecular weight is crucial for its robust and quantitative responsiveness. Asymmetrical flow field-flow fractionation measurements in combination with static and dynamic light scattering (AF4-LS) indicate that MeRPy-10 and MeRPy-100 have weight-average molecular weights (Mw) of 5.73 and 8.47 MDa, respectively (Fig. 1c, Supplementary Figs. 3 and 4, Supplementary Table 1, Supplementary Note 1). MeRPy chains are up to seven times heavier than chains produced in SNAPCAR experiments (Mw ~ 1.2 MDa)3. AF4-LS measurements further show that MeRPy-10 and MeRPy-100 assume a globular conformation in TE buffer at pH 8.0, as indicated by scaling exponents (ν) of 0.38 and 0.32, respectively17. Gyration and hydrodynamic radii support this finding (Supplementary Table 1). The apparent volume (Vapp,h) occupied by individual MeRPy-10 (Vapp,h = 2.7 × 10⁻³ µm³) and MeRPy-100 molecules (Vapp,h = 9.2 × 10⁻³ µm³) in solution comprises ≥99.5% water, thus leaving the polymer coils highly penetrable and anchor strands well accessible for hybridization.
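The water content of a single coil can be estimated to order of magnitude from the molecular weights and apparent volumes quoted above. The sketch below is an illustrative back-of-the-envelope check, not the authors' analysis. It assumes a dry polymer density of 1.0 g/cm³ (a hypothetical, deliberately low value; a denser polymer would occupy even less of the coil) and treats the weight-average Mw as the mass of one representative chain.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
CM3_PER_UM3 = 1e-12        # 1 um^3 expressed in cm^3


def water_fraction(mw_g_per_mol, apparent_volume_um3, density_g_per_cm3=1.0):
    """Estimate the fraction of the apparent coil volume not occupied by polymer.

    density_g_per_cm3 is an assumed dry polymer density, not a measured value.
    """
    chain_mass_g = mw_g_per_mol / AVOGADRO
    dry_volume_cm3 = chain_mass_g / density_g_per_cm3
    coil_volume_cm3 = apparent_volume_um3 * CM3_PER_UM3
    return 1.0 - dry_volume_cm3 / coil_volume_cm3


merpy10 = water_fraction(5.73e6, 2.7e-3)
merpy100 = water_fraction(8.47e6, 9.2e-3)
print(f"MeRPy-10:  {merpy10:.2%} water")
print(f"MeRPy-100: {merpy100:.2%} water")

# Ratio of the heavier variant to the SNAPCAR chains (Mw ~ 1.2 MDa).
print(f"Mw ratio: {8.47 / 1.2:.1f}")
```

Under these assumptions both variants come out above 99.5% water, consistent with the figure stated in the text, and the Mw ratio of about 7 matches the "up to seven times heavier" comparison.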
We tested the structural stability of MeRPy under mechanical stress and at high temperature. MeRPy chains remain fully intact when vortexed for 30 min or heated to 95 °C for 5 min (Fig. 1c). These exposure times exceed those used in typical pulldown experiments (see below). Heating to 95 °C for 1 h led to minor decomposition of the upper molecular weight fraction of MeRPy, and partial depurination of anchor strand bases can be expected to occur under these conditions18.
To demonstrate that MeRPy is applicable for the separation of DNA and DNA-labeled target molecules, we applied MeRPy-10 to a mixture of ssDNA-labeled cyanine dyes (T1 = Cy5; T2 = Cy3) (Fig. 2; Supplementary Procedure 2). The polymer was programmed with catcher strands targeting either of the two dyes. Fluorescence images show that the targeted dye was selectively pulled down, leaving non-targets in solution. After separating pellet from solution, the release of the captured target into a clean buffer was triggered by the addition of a release strand. A second MeRPy pulldown then removed the polymer from the released target.
a Schematic stepwise separation of two fluorescent dyes. T1: Cy5; T2: Cy3. Black anchors symbolize the methanol-responsive polymer backbone. b Photographs of tubes containing MeRPy-10 and a mixture of the dyes. MeRPy-10 was programmed to capture either T1 (upper path) or T2 (lower path). The target was then pulled down and isolated by the addition of methanol and a short spin-down. After separation, targets were released by TMSD.
As MeRPy provides high binding capacity, it can be used to capture many targets simultaneously at a fast rate. Figure 3 shows the manipulation of a 10-member ssDNA library (strands designated A–J) in the length range of 20–190 nt (Supplementary Table 2). Multiple targets were selected by the addition of catcher strand libraries (CSL) of different compositions (Fig. 3a, Supplementary Table 2). Pulldown and release efficiencies were quantified densitometrically via denaturing polyacrylamide gel electrophoresis (dPAGE). After brief annealing of MeRPy-10 with the target mixture and CSL, the targeted members were depleted from the supernatant with 88 ± 4% efficiency. On average, 98% of non-target strands remained in solution. Nonspecific binding was undetectable for the majority of library components within the precision of the measurement (Supplementary Fig. 5). Low levels of nonspecific binding were consistently detected for only one library member (strand H).
a dPAGE of the library (L) before pulldown (black), after pulldown of selected strands (red), and after release of targeted library subsets (blue). Red and blue circles indicate strands that were targeted by catcher and release strands, respectively. γ = catcher strand band, δ = release strand band. The corresponding original uncropped gel scans are shown in Supplementary Fig. 11. b Average binding efficiency and specificity, as obtained by densitometric quantification of gel bands. Error bars indicate the standard deviation obtained from n independent measurements (target pulldown: n = 14; non-target pulldown: n = 26; target release: n = 21; non-target release: n = 39).
After redispersing the pellet in clean buffer solution, the selected library subsets were released via TMSD by addition of either all or only a subset of the corresponding release strands (Supplementary Table 2). The release efficiency was 90 ± 12%, and the resulting sub-libraries contained the desired strands with a total yield of 79 ± 13%. The recovered target strands were free from non-target contamination (including strand H) within the precision of the measurement (99.8 ± 0.5%). We attribute the high purity of the recovered DNA libraries to the dual selection of the combined target capture and release process: in the first step, the target binding sites select for correct target sequences. Most non-targets stay in the supernatant, but some nonspecific binding may occur. In the second step, TMSD applies another selection, this time requiring the correct release site sequences to unlock the desired targets under mild conditions. This process leaves residual nonspecifically adsorbed DNA in the pellet.
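The reported total yield is close to the product of the pulldown and release efficiencies. The sketch below illustrates this with standard first-order error propagation for a product. It assumes the two steps are independent, and it is not necessarily how the reported yield and its uncertainty were obtained.

```python
import math


def product_with_uncertainty(a, sd_a, b, sd_b):
    """Multiply two independent quantities and propagate their standard deviations."""
    value = a * b
    relative_sd = math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
    return value, value * relative_sd


# Pulldown efficiency 88 +/- 4% and release efficiency 90 +/- 12%, as quoted in the text.
total_yield, total_sd = product_with_uncertainty(0.88, 0.04, 0.90, 0.12)
print(f"{total_yield:.2f} +/- {total_sd:.2f}")  # 0.79 +/- 0.11
```

The estimate of about 79 ± 11% agrees with the measured total yield of 79 ± 13% within the stated uncertainties.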
Pulldown of dsDNA and enhancement of cDNA libraries
There is a high demand for tools that enable sequence-selective depletion of complementary DNA (cDNA) to enhance the efficiency and sensitivity in gene expression profiling via RNA-seq7,19. cDNA libraries are typically double-stranded (ds) DNA, which presents some challenges: dsDNA needs to be first denatured at high temperature to make its nucleobases available for binding. When the temperature is subsequently decreased, oligonucleotide capture probes may bind to the target. Yet, re-hybridization of target sense- and antisense strands can promote quick entropy-driven displacement of the probes, thus preventing their sustained attachment.
MeRPy-100 is uniquely suited to address this challenge: first, its high binding capacity enables catcher strand concentrations of up to 100 µM in ready-to-use MeRPy-100 solutions (~10×–100× higher than microbead-attached capture probes). This unusually high activity in a homogeneous solution provides favorable binding kinetics and helps out-compete target sense-antisense re-hybridization. Second, its high stability allows in situ denaturation and annealing in the presence of all necessary components. Third, MeRPy pulldown does not merely concentrate the targets towards the bottom of a tube, but it also encapsulates them within the polymer matrix. The encapsulation secures the capture process and prevents any premature release of targets.
To capture cDNA transcripts, we first generated target-specific CSLs. The CSLs were designed to tile large regions of the target transcripts, alternating between sense and antisense strands (Fig. 4a, e). This design was meant to achieve two goals: (i) efficiently blocking target sense- and antisense strands from re-binding to each other; and (ii) ensuring that not only full-length transcripts are captured, but also fragmented ones. We provide a Python script that allows quick generation of custom CSLs for any cDNA target (Supplementary Data 1–3).
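The tiling design can be illustrated with a short sketch. This is not the script provided in Supplementary Data 1–3 but a hypothetical minimal version written for illustration. It assumes contiguous, non-overlapping windows, places the adapter site before the target binding site (the actual domain order and orientation may differ), and uses a placeholder adapter of 22 N's rather than a real sequence. The default window length of 38 nt follows the binding site length quoted later in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PLACEHOLDER_ADAPTER = "N" * 22  # hypothetical stand-in for the real adapter site


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_catcher_strands(sense, adapter=PLACEHOLDER_ADAPTER, site_len=38):
    """Tile a target with binding sites that alternate between both strands.

    Even-numbered windows bind the sense strand (site = reverse complement of
    the window); odd-numbered windows bind the antisense strand (site = the
    sense window itself).
    """
    sense = sense.upper()
    if set(sense) - set("ACGT"):
        raise ValueError("target sequence must contain only A, C, G, T")
    strands = []
    for index, start in enumerate(range(0, len(sense) - site_len + 1, site_len)):
        window = sense[start:start + site_len]
        if index % 2 == 0:
            targets, site = "sense", reverse_complement(window)
        else:
            targets, site = "antisense", window
        strands.append({"start": start, "targets": targets, "sequence": adapter + site})
    return strands


# Toy 120-nt target (an arbitrary repeat, not a real transcript).
toy_target = "ACGTTGCA" * 15
for strand in tile_catcher_strands(toy_target):
    print(strand["start"], strand["targets"], len(strand["sequence"]))
```

Alternating the targeted strand means that each catcher strand leaves the opposite strand of its window free while its neighbors bind that opposite strand, which is one way to realize goal (i) above; tiling the full region addresses goal (ii).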
a The target, a catcher strand library (CSL), and MeRPy are mixed and heated to 95 °C. Subsequently, the sample is cooled to bind catcher strands to the target and MeRPy. MeRPy is quickly precipitated to deplete the target from solution. b dPAGE after pulldown of a dsDNA target (150 bp) with MeRPy-100, showing high pulldown efficiency and specificity. The original uncropped scan of the gel is shown in Supplementary Fig. 12. c Selective depletion of high-abundance insulin (INS), glucagon (GCG), and transthyretin (TTR) cDNA from a clinical NGS library by MeRPy in the presence and absence of an INS-, GCG-, and TTR-targeting CSL. Blue and red data points represent genes with higher and lower transcripts per million (TPM) values, respectively, as compared to the original sample (untreated control). d Pulldown efficiencies for INS, TTR, and GCG, as quantified from RNA-seq (n = 3 independent experiments). e Relative base count after pulldown (blue trace) and standard deviation (n = 3 independent experiments, gray shade), as a function of base position in the transcripts (see Supplementary Note 2). The plot provides single-base-resolution information about depletion efficiencies. CSL-targeted transcript regions are highlighted in beige. Base positions (x-axis values) are relative to the center of the respective targeted region. Dashed lines mark exon boundaries. f Total number of genes detected with >1 TPM in the original sample vs. MeRPy-treated samples (n = 3 independent experiments).
The cDNA depletion procedure is simple and fast (Supplementary Procedure 3): (i) initial thermal denaturation of the sample in the presence of the CSL and MeRPy-100 (2 min at 95 °C), (ii) brief annealing (5 min at 20 °C), and (iii) immediate MeRPy-100 pulldown and retrieval of the supernatant. Initial pulldown experiments with a 150-bp dsDNA mock target achieved consistently high capture efficiencies (89.8 ± 2.4%) without detectable levels of nonspecific binding (Fig. 4b). This value matches the performance in experiments with single-stranded targets (see above).
To demonstrate practical application of this method, we used MeRPy for targeted depletion of highly abundant insulin (INS), glucagon (GCG), and transthyretin (TTR) cDNA from clinical NGS libraries that had been generated from patient-derived pancreatic islets20. Owing to their high expression levels, these three genes consume a large fraction of NGS reads (Supplementary Fig. 6), thus reducing the sequencing depth for all other transcripts in the library, many of which carry diagnostically relevant information21.
The INS-, GCG-, and TTR-specific CSL contained 31 distinct catcher strands, each comprising a unique 38-nt target binding site and a 22-nt adapter site (Supplementary Table 3). The CSL targeted large regions (but not the entirety) of the three genes (Fig. 4e). As expected, within the depleted genes, base positions located around the center of CSL-targeted regions (beige regions in Fig. 4e) were most efficiently depleted, with corresponding base count reductions of 85, 96, and 94% for INS, TTR, and GCG, respectively. These values were in good agreement with the pulldown efficiency for the fully tiled dsDNA mock target. Regions within the same transcripts that were only indirectly targeted by the CSL (being located upstream or downstream of a targeted region) were also depleted. However, the reduction in relative base count in these regions decayed with increasing distance from the targeted sites. This effect is not surprising, as cDNA libraries comprise a wide size distribution of fragments (150–700 nt), some of which did not contain any CSL-targeted sequences. Taken together, directly and indirectly targeted regions of the three genes were depleted with 60–80% efficiency (Fig. 4d).
Importantly, MeRPy pulldown did not introduce undesired biases to the expression profile, as evidenced by comparing the correlation of MeRPy-treated with untreated reference samples. Ninety-one percent of reads uniquely mapped to the human genome, independent of MeRPy treatment. Spearman and Pearson correlation coefficients were 0.939–0.953 and 0.979–0.981, respectively (Supplementary Fig. 7). These values are on par with the best performing commercial depletion assays for ribosomal RNA9. One unintended depletion event was detected for a non-target transcript. The outlier was identified as INS-IGF2 (Fig. 4c), a readthrough gene that shares the INS sequence, and which was hence captured by the INS-selective CSL.
Overall, the simultaneous depletion of INS, TTR, and GCG transcripts from pancreatic cDNA libraries made available reads for an additional 327,000 transcripts per million (TPM) (Supplementary Fig. 6). As a result, the sequencing depth effectively increased for 92% of genes in the library (Fig. 4c). A net surplus of >1000 genes with TPM > 1 was detected in MeRPy-treated samples (Fig. 4f). A similar but less pronounced effect was observed when only one gene, INS, was depleted from a cDNA library containing ~10% INS transcripts (Supplementary Fig. 8). As before, high pulldown efficiency (~80%) and high selectivity were achieved. The depletion of INS alone increased the sequencing depth for 62% of genes in the library, and a net surplus of ~350 genes with TPM > 1 was detected in INS-depleted samples (Supplementary Figs. 8–10).
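Because TPM values are normalized to sum to one million, removing abundant transcripts raises the TPM of everything that remains. The sketch below illustrates this renormalization under an idealized assumption that non-target transcripts are unaffected by the pulldown. The gene names and TPM values in the toy profile are hypothetical and are not data from this study; only the figure of 327,000 freed TPM is taken from the text.

```python
def renormalize_after_depletion(tpm, efficiencies):
    """Remove a fraction of each targeted gene and rescale the profile to 1e6 TPM.

    tpm: dict mapping gene name to TPM before depletion.
    efficiencies: dict mapping targeted gene name to the fraction removed (0 to 1).
    """
    remaining = {gene: value * (1.0 - efficiencies.get(gene, 0.0))
                 for gene, value in tpm.items()}
    total = sum(remaining.values())
    return {gene: value * 1e6 / total for gene, value in remaining.items()}


def nontarget_fold_gain(freed_tpm):
    """Idealized fold increase in non-target TPM when freed_tpm is released."""
    return 1e6 / (1e6 - freed_tpm)


# Hypothetical toy profile: one abundant target, one gene just below 1 TPM.
toy_profile = {"target_A": 400_000.0, "gene_low": 0.8, "background": 599_999.2}
after = renormalize_after_depletion(toy_profile, {"target_A": 0.8})
print(f"gene_low after depletion: {after['gene_low']:.2f} TPM")

# Idealized gain implied by the 327,000 TPM quoted in the text.
print(f"fold gain: {nontarget_fold_gain(327_000):.2f}")
```

In the toy profile, a gene at 0.8 TPM crosses the 1 TPM threshold after depletion, which illustrates how freeing reads can increase the number of genes detected above that threshold. For the 327,000 TPM quoted above, this idealized model corresponds to a roughly 1.5-fold gain for non-target transcripts.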