Sample collection

Breast tissue was collected from women undergoing breast surgery at Cork University Hospital, Cork, Ireland. Breast tumour core-biopsies were aseptically resected using an Achieve 14G Breast Biopsy System (Iskus Health, UT, USA). The specimens were transported in sterile PBS to the lab, where they were flash-frozen and kept at −80 °C until further processing. DNA from the specimens was purified following the protocol and reagents provided in the Ultra Deep Microbiome Prep (Molzym GmbH & Co. KG, Bremen, Germany) and eluted in 100 µl of Tris–HCl.

DNA purification

Samples were processed and DNA was purified following the protocols listed in Table 1. In all cases, DNA was eluted in Tris–HCl buffer and stored at −20 °C until further analysis.

Table 1 Samples and corresponding DNA extraction strategy.

16S rRNA gene sequencing library preparation

Genomic DNA was amplified by PCR with primers targeting the hypervariable V1–V2 region or the V3–V4 region of the 16S rRNA gene. Table 2 details the primer sequences; the underlined portions were included for compatibility with the Illumina 16S Metagenomic Sequencing Protocol (Illumina, CA, USA).

Table 2 Primers used for 16S rRNA gene sequencing analysis.

For Breast Tumour and Normal Adjacent samples, amplification was performed in 50 µl reactions containing 1X NEBNext High-Fidelity 2X PCR Master Mix (NEB, USA), 0.5 µM of each primer, 8 µl template (5–15 ng/µl) and 12 µl nuclease-free water. The thermal profile comprised an initial denaturation at 98 °C for 30 s, followed by 25 cycles of denaturation at 98 °C for 10 s, annealing at 55 °C for 30 s (V3–V4) or 62 °C for 30 s (V1–V2) and extension at 72 °C for 30 s, with a final extension at 72 °C for 5 min. Amplification was confirmed by running 5 µl of PCR product on a 2% agarose gel and visualising a band of ≈ 310 bp for V1–V2 or ≈ 460 bp for V3–V4.
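As a quick check on the reaction composition above, the component volumes can be tallied. The sketch below assumes 10 µM primer working stocks, which the text does not state; under that assumption the listed volumes account for the full 50 µl reaction.

```python
# Volumes (µl) for one 50 µl amplification reaction, as described above.
REACTION_UL = 50.0


def dilution_volume(final_conc, stock_conc, total_ul):
    """Volume of stock needed to reach final_conc in total_ul (C1*V1 = C2*V2)."""
    return final_conc * total_ul / stock_conc


# The master mix is supplied at 2X and used at 1X.
master_mix_ul = dilution_volume(1.0, 2.0, REACTION_UL)

# ASSUMPTION: primers are kept as 10 µM working stocks (not stated in the text).
PRIMER_STOCK_UM = 10.0
primer_ul = dilution_volume(0.5, PRIMER_STOCK_UM, REACTION_UL)

volumes = {
    "master_mix": master_mix_ul,
    "forward_primer": primer_ul,
    "reverse_primer": primer_ul,
    "template": 8.0,
    "water": 12.0,
}
total_ul = sum(volumes.values())

# Template input range in ng, from 8 µl at 5-15 ng/µl.
template_ng_range = (8.0 * 5, 8.0 * 15)

if __name__ == "__main__":
    for name, vol in volumes.items():
        print(f"{name}: {vol} µl")
    print(f"total: {total_ul} µl; template input: {template_ng_range} ng")
```

With a different primer stock concentration, the water volume would need to change for the total to remain 50 µl.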

Faecal microbial genomic DNA was amplified using Phusion High-Fidelity DNA Polymerase (Thermo Scientific, Massachusetts, USA) with the following thermocycler protocol: an initiation step of 98 °C for 3 min, followed by 25 cycles of 98 °C for 30 s, 55 °C for 60 s and 72 °C for 20 s, and a final extension step of 72 °C for 5 min.

Microbial genomic DNA from oesophageal biopsies and skin swab samples was amplified using MTP Taq DNA Polymerase (Merck KGaA, Darmstadt, Germany) with the following thermocycler protocol: an initiation step of 94 °C for 1 min, followed by 35 cycles of 94 °C for 60 s, 55 °C for 45 s and 72 °C for 30 s, and a final extension step of 72 °C for 5 min.
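The three amplification programmes described above differ in polymerase, cycle number and hold times. For side-by-side comparison they can be written down as data. The sketch below only sums the programmed hold times and ignores ramping between temperatures, so the totals are lower bounds on actual run time; the dictionary layout and key names are illustrative choices, not part of the protocols.

```python
# Hold times in seconds, transcribed from the three protocols above.
# "steps" holds the per-cycle denaturation, annealing and extension times.
PROFILES = {
    "breast (NEBNext)": {
        "initial": 30, "cycles": 25, "steps": (10, 30, 30), "final": 300,
    },
    "faecal (Phusion)": {
        "initial": 180, "cycles": 25, "steps": (30, 60, 20), "final": 300,
    },
    "oesophageal/skin (MTP Taq)": {
        "initial": 60, "cycles": 35, "steps": (60, 45, 30), "final": 300,
    },
}


def programmed_seconds(profile):
    """Sum of programmed hold times, excluding temperature ramping."""
    return (
        profile["initial"]
        + profile["cycles"] * sum(profile["steps"])
        + profile["final"]
    )


totals = {name: programmed_seconds(p) for name, p in PROFILES.items()}

if __name__ == "__main__":
    for name, seconds in totals.items():
        print(f"{name}: {seconds} s ({seconds / 60:.1f} min)")
```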

An index PCR was performed to add sample-specific DNA barcodes to sample amplicons in accordance with the Illumina 16S Metagenomic Sequencing Protocol (Illumina, California, USA)16. Library DNA concentration was quantified on a Qubit fluorometer (Invitrogen) with the ‘High Sensitivity’ assay, and samples were pooled at a standardised concentration16. The pooled library was sequenced on the Illumina MiSeq platform (Illumina, California, USA) utilising 2 × 300 bp chemistry.
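Pooling at a standardised concentration requires converting each fluorometric reading (ng/µl) to molarity. A minimal sketch follows, using the common approximation of 660 g/mol per base pair of double-stranded DNA. The library length, target molarity, final volume and the example reading are hypothetical placeholders, not values from this study.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nM)."""
    return conc_ng_ul * 1e6 / (AVG_BP_MASS * length_bp)


def dilution_to_target(conc_nM, target_nM, final_ul):
    """Return (library_ul, buffer_ul) that dilute a library to target_nM."""
    if conc_nM < target_nM:
        raise ValueError("library is below the target concentration")
    library_ul = target_nM * final_ul / conc_nM
    return library_ul, final_ul - library_ul


# HYPOTHETICAL example: a 6.6 ng/µl library with a mean length of 500 bp,
# normalised to a placeholder target of 4 nM in 20 µl.
example_nM = ng_per_ul_to_nM(6.6, 500)
example_split = dilution_to_target(example_nM, target_nM=4.0, final_ul=20.0)

if __name__ == "__main__":
    print(f"{example_nM:.2f} nM; library/buffer volumes (µl): {example_split}")
```

Once every library has been brought to the same molarity, equal volumes contribute equal numbers of molecules to the pool.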

16S rRNA sequence analysis

The quality of the paired-end sequencing data was visualised using FastQC (v0.11.9), and reads were trimmed using Trimmomatic (v0.39), ensuring a minimum average quality of 25. Reads were then imported into the R environment (v3.6.3)17 to be resolved into Amplicon Sequence Variants by the DADA2 package (v1.12).
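The minimum average quality criterion can be illustrated as follows. This is a simplified stand-in for what a read trimmer does, not Trimmomatic's implementation, and it assumes Phred+33 encoded quality strings. The function names and example strings are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one FASTQ quality line (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_average_quality(quality_string, threshold=25):
    """True if the read's mean quality meets the threshold."""
    return mean_phred(quality_string) >= threshold


# In Phred+33, 'I' encodes Q40 and '#' encodes Q2.
high_quality = "IIIIIIII"
mixed_quality = "IIII####"

if __name__ == "__main__":
    for q in (high_quality, mixed_quality):
        print(q, mean_phred(q), passes_average_quality(q))
```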

Contamination control

For all samples, a contamination control strategy was implemented in keeping with the RIDE checklist proposed by Eisenhofer et al.18, incorporating aseptic techniques and a variety of negative controls from different stages of the sample-to-sequence data process. Retrospective contamination assessment and removal based on sequencing data from negative controls was also performed following published guidelines19.
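To make the idea of control-based removal concrete, the sketch below applies one simple heuristic: an ASV is flagged when its mean relative abundance in the negative controls is non-zero and at least as high as in the biological samples. This rule is an assumption chosen for illustration and is not necessarily the procedure in the cited guidelines; the ASV names and counts are made up.

```python
def relative_abundance(counts):
    """Convert raw ASV counts for one library into fractions of the total."""
    total = sum(counts.values())
    return {asv: (n / total if total else 0.0) for asv, n in counts.items()}


def mean_abundance(profiles, asv):
    """Mean relative abundance of one ASV across a list of libraries."""
    if not profiles:
        return 0.0
    return sum(p.get(asv, 0.0) for p in profiles) / len(profiles)


def flag_contaminants(samples, negative_controls):
    """Flag ASVs at least as abundant in controls as in samples (heuristic)."""
    sample_profiles = [relative_abundance(s) for s in samples]
    control_profiles = [relative_abundance(c) for c in negative_controls]
    asvs = set().union(*samples, *negative_controls)
    flagged = set()
    for asv in asvs:
        in_controls = mean_abundance(control_profiles, asv)
        in_samples = mean_abundance(sample_profiles, asv)
        if in_controls > 0 and in_controls >= in_samples:
            flagged.add(asv)
    return flagged


# HYPOTHETICAL counts: two biological samples and one negative control.
samples = [{"ASV1": 90, "ASV2": 10}, {"ASV1": 80, "ASV2": 20}]
controls = [{"ASV2": 9, "ASV3": 1}]
flagged = flag_contaminants(samples, controls)

if __name__ == "__main__":
    print(sorted(flagged))
```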

Retrospective bioinformatics-based removal of human amplicons

Sequences aligning to the human genome (GRCh38) within the FASTA file generated by DADA2 were identified using the Bowtie2 aligner20. To confirm that sequences mapped to the human genome were not erroneously aligned bacterial sequences, all human-aligning sequences were classified with Mothur21, using the RDP database (v11.4) as a reference.
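Identifying human-aligning sequences from aligner output reduces to reading the FLAG field of each alignment record: in the SAM format, bit 0x4 is set when a query is unmapped. The sketch below assumes the alignments were written as SAM text; the function name and the example records are hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def mapped_query_names(sam_lines):
    """Return the names of queries with at least one reported alignment."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if not flag & UNMAPPED:
            names.add(fields[0])
    return names


# HYPOTHETICAL records: one aligned and one unaligned sequence.
example_sam = [
    "@HD\tVN:1.6",
    "ASV_1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "ASV_2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]
human_candidates = mapped_query_names(example_sam)

if __name__ == "__main__":
    print(sorted(human_candidates))
```

The resulting names are only candidates; as described above, they were then classified against a bacterial reference to rule out misaligned bacterial sequences.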

Statistical analysis and data visualisation

All statistical analysis was carried out in the R environment, using the following libraries: phyloseq (v1.30), vegan (v2.5.6), ggplot2 (v3.3.0) and reshape2 (v1.4.3).

Ethical approval

All procedures in this study were performed in accordance with national ethical guidelines, following ethical approval from the University College Cork Clinical Research Committee.

Informed consent

Patients provided written informed consent for sample collection and subsequent analyses.