Morphological characteristics of walnut floral transition
Morphological differentiation of female flower buds in walnut was continuously observed by paraffin section. We divided walnut female flower bud differentiation into stages as follows: F_1 (predifferentiation period), F_2 (initial differentiation period) and F_3 (flower primordium differentiation period). In the predifferentiation period (F_1), the growth points of the flower bud appeared flat, and the bud scales appeared light green. In the initial differentiation period (F_2), the flower stalk primordium was raised, and the bud scales were yellow-green. In the flower primordium differentiation period (F_3), the primordium of the flower stalk continued to extend, the pistil primordium began to appear, and the bud scales appeared yellow-green (Fig. 1). The leaf buds (JRL) were collected at the same stage as F_2.
General analysis of walnut transcriptome data
Sequencing, splicing, and annotation of transcriptome data
A total of 194,926,208 raw reads were obtained through the sequencing of the transcript libraries (Table 1). By eliminating chimeric and low-quality reads, 189,041,550 clean reads were finally screened, and 132,154 unigenes were obtained with a mean length of 956 bp; the lengths of the unigenes ranged from 201 bp to 16831 bp (Fig. 2A).
All unigenes were compared with 7 databases (NR, NT, KEGG, Swiss-Prot, PFAM, GO, and KOG/COG) with an E value cutoff of e-5 for homology. Then, the unigenes were annotated according to the results: 39.46% of unigenes were annotated by the GO database, 20.51% of unigenes were annotated by the KO database, 13.77% of unigenes were annotated by the KOG database, 53.8% of unigenes were annotated by the NR database, 41.74% of unigenes were annotated by the NT database, 38.81% of unigenes were annotated by the PFAM database, and 39.6% of unigenes were annotated by the Swiss-Prot database. In total, 78,718 unigenes of the 132,154 total unigenes were annotated to at least one database, accounting for 59.56% of all unigenes, and 11,165 unigenes were annotated by all seven major databases, accounting for 8.44% of all unigenes (Fig. 2B).
The annotation results of the NR database show that 11372 unigenes (18.1% of the total annotated unigenes) were compared to Vitis vinifera, 6522 unigenes (10.4% of the total annotated unigenes) were compared to Prunus mume, 6141 unigenes (9.7% of the total annotated unigenes) were compared to Prunus persica, 4834 unigenes (7.7% of the total annotated unigenes) were compared to Jatropha curcas, 3457 unigenes (5.5% of the total annotated unigenes) were compared to Citrus sinensis, and the remaining 48.7% unigenes were compared to other species (Fig. 2C).
The gene functional classifications were identified in a GO enrichment analysis based on the 52,157 unigenes annotated in the GO database. In the biological process category, cellular process, metabolic process, and single organism process were highly represented. In the cellular component category, significantly enriched genes were associated with cell, cell part, and organelle. In the molecular function category, GO terms related to binding, catalytic activity and transporter activity were significantly enriched (Fig. 3). KEGG pathway enrichment. KEGG pathway enrichment analysis showed that transport and catabolism, signal transduction, transition, carbohydrate metabolism and environmental adaptation had the most unigenes in classifications of Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes and Organismal Systems, respectively (Fig. 4). In particular, Environmental Information Processing and Organismal Systems, which included many subterms associated with flowering, such as the pathways of plant hormone signal transduction (ko04075) and Circadian rhythm-plant (ko04712), were enriched (Table S1).
Transcription factor (TF) enrichment analysis
TFs function alone or with other proteins in complexes by promoting or blocking the recruitment of RNA polymerase to specific genes19,20,21. Some TFs have been shown to play key roles in plant flowering22,23,24. Among the 132,154 unigenes, 4,835 of the unigenes (approximately 3.7%) were annotated as TFs, and they fell into diverse categories covering nearly all TF families, with MYB, NAC, and AP2-EREBP being the most highly represented (Fig. 5).
DEGs analysis of walnut transcriptome data
Identification of DEGs
The four databases from F_1, F_2, F_3, and JRL were used for paired comparisons, and four combinations (F_1vsF_2, F_1vsF_3, F_2vsF_3, F_2vsJRL) were chosen and analyzed.
The three combinations of female flower bud developmental stages (F_1vsF_2, F_1vsF_3, F_2vsF_3) had 597 DEGs, 784 DEGs and 532 DEGs, respectively. In addition, 16 DEGs were shared by all three combinations, which means they were expressed differently in the three periods of female flower buds (Fig. 6). The expression levels of DEGs between F_1, F_2 and F_3 were plotted in a heatmap (Fig. 7A).
In addition, the female flower buds at the beginning of differentiation (F_2) and the leaf buds of the same stage (JRL) were used for comparison (F_2vsJRL). The results showed that 374 DEGs were expressed differently in female buds (F_2) and leaf buds (JRL), of which 90 unigenes in the female flower buds (F_2) showed a higher expression level than that in the leaf buds (JRL), and the other 284 unigenes showed the opposite trend (Fig. 6E). The expression levels of 374 DEGs between female buds (F_2) and leaf buds (JRL) were plotted in a heatmap (Fig. 7B).
GO functional analysis of DEGs
We conducted GO functional analysis with the upregulated and downregulated DEGs of F_1vsF_2, F_1vsF_3, and F_2vsF_3, and the results are shown in Fig. 8. The results indicated that the DEGs of the three stages of female flower bud differentiation had functions enriched in the regulation of RNA metabolic process (GO:0051252, BP); regulation of nucleobase-containing compound metabolic process (GO:0019219, BP); apoplast (GO:0048046, CC); nucleic acid binding transcription factor activity (GO:0001071, MF); transcription factor activity, sequence-specific DNA binding (GO:0003700, MF); and sequence-specific DNA binding (GO:0043565, MF). In addition, the GO functions of the DEGs between female flower buds (F_2) and leaf buds (JRL) were enriched in photosynthesis (GO:0015979, BP), photosystem I reaction center (GO:0009538, CC), photosystem I (GO:0009522, CC), photosystem (GO:0009521, CC), and thylakoid part (GO:0044436, CC).
KEGG pathway analysis of DEGs: KEGG pathway analysis of the DEGs of F_1VSF_2, F_1VSF_3, F_2VSF_3 showed that most pathways were associated with flowering, and three pathways (Photosynthesis-antenna proteins, Plant hormone signal transduction, Porphyrin and chlorophyll metabolism) were shared by the three combinations. In addition, the pathway enrichment analysis of female flower buds (F_2) and leaf buds (JRL) also included many flowering-related pathways (Fig. 9).
Analysis of DEGs related to flowering time: The flowering time genes that induce plant flowering are usually attributed to the photoperiod, vernalization, autonomous, GA, and age-regulated pathways, as well as the integrated factors of the intersection of these pathways3,6,13,25.
Based on previous studies, we summarized the important flowering time genes in the model plant Arabidopsis thaliana, and these genes were used as query genes (Table 2). Finally, we screened 31 DEGs associated with flowering time in walnut, and eighteen of the flowering time DEGs were chosen for qRT-PCR.
DEGs related to the walnut floral transition
A total of 31 unigenes were involved in the photoperiod (circadian rhythm), vernalization pathway, GA, and age-related pathways. To investigate the complex molecular mechanisms underlying floral transition in walnut, we set the 31 flowering time-related DEGs in the flowering time regulation network (Fig. 10).
Most of the DEGs were annotated in the photoperiod pathway: In total, twenty-four DEGs were annotated in this pathway, fifteen DEGs were highly expressed in the predifferentiation period (F_1), including CRY2 (1), FKF1 (1), GI (2), NF-YA (2), PIF3 (1), PPR5 (2), PRR7 (3), PRR9 (1), PRR73 (1), and PRR95 (1). CDF2 (1), CDF3 (1), and LHY (2) were highly expressed in the initial differentiation period (F_2). NF-YA (1) and TEM1 (2) were highly expressed in the flower primordium differentiation period (F_3). Two PHR2 genes had higher expression levels in leaf buds (JRL) than in flower buds (F_2), while PRR9 exhibited the opposite trend.
In Arabidopsis thaliana, CYCLING DOF FACTOR 2 (CDF2) and CYCLING DOF FACTOR 3 (CDF3) act redundantly to reduce CONSTANS (CO) expression and are degraded by the complex of GIGANTEA (GI) and FLAVIN-BINDING KELCH REPEAT F-BOX 1 (FKF1), thus releasing repression of CO and FT transcription26,27,28. Blue-light photoreceptors such as PHOTOLYASE-RELATED 2 (PHR2) are essential light detectors for the early development of plants and mediate phototropism and the expression of specific genes. Loss of a blue-light photoreceptor in the hy4 mutants of Arabidopsis thaliana substantially delayed flowering29,30. CRYPTOCHROME 2 (CRY2) can promote the expression of FT and is negatively regulated by FLOWERING LOCUS C (FLC)31. The NUCLEAR FACTOR Y (NF-Y) transcription factors (heterotrimeric complexes composed of NF-YA and the dimer of NF-YB/NF-YC) can initiate photoperiod-dependent flowering by cooperatively interacting with CO to drive the expression of FT32.
Phytochrome Interacting Factor 3 (PIF3) can bind the promoters of LATE ELONGATED HYPOCOTYL (LHY) and CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) to form an in vitro ternary complex. Additionally, the complex of CCA1 and LHY can bind with TIMING OF CAB 1 (TOC1) to form a feedback loop that is necessary for the circadian clock in Arabidopsis thaliana33. In addition, the CCA1 and LHY complexes were repressed by the Pseudo Response Regulators (PRRs) encoded by TOC16,34. The activator CO and the repressor TEMPRANILLO (TEM) have a quantitative balance that determines the FT transcription level, as it shifts the CO/TEM balance in favor of CO activity, allowing FT transcription to reach the threshold level required to trigger flowering35.
In the age-regulated pathway, three DEGs were annotated as TREHALOSE-6-PHOSPHATE SYNTHASE 1 (TPS1), which is essential for normal vegetative growth and transition to flowering36,37. They were differentially expressed between the different flower bud formation stages, and the flower buds showed no difference from the leaf buds in their expression level.
In the vernalization pathway, vernalization can cause epigenetic changes in FLC and indirectly promote flower formation by histone modification, and MADS AFFECTING FLOWERING 5 (MAF5) is an FLC-related gene38,39,40,41,42,43. VERNALIZATION INSENSITIVE 3-LIKE 2 (VIL2), which can repress MAF5 and permit more rapid flowering during noninductive photoperiod conditions in Arabidopsis thaliana44. In this study, VIL2 was highly expressed in the predifferentiation period (F_1), decreased in the initial differentiation period (F_2), and decreased to the lowest point at the flower primordium differentiation period (F_3). In the leaf buds (JRL), its expression level was higher than that in the flower buds during the same period (F_2).
In the GA pathway, GA regulates LEAFY (LFY) via DELLAs, miR159 and MYBs45,46,47. The combination of gibberellin and GIBBERELLIN-INSENSITIVE DWARF1 (GID1) can combine with the DELLA protein to form a GA-GID1-DELLA trimer, and then the SKP1-CUL1-F-box (SCF) polymer can tag the trimer to induce the ubiquitin 26 S proteasome to degrade the DELLA protein, relieving the inhibitory effect of the DELLA protein on plant growth and producing gibberellin effects48,49. In this study, GID1 was highly expressed in the predifferentiation period (F_1), and its expression was downregulated in the initial differentiation period (F_2). In addition, the expression of GID1 was higher in flower buds (F_2) than in leaf buds (JRL).
In addition to the DEGs mentioned above, there are two DEGs annotated as APETALA1 (AP1). In Arabidopsis thaliana, AP1 is required for the floral transition50. In this study, the two AP1 genes were nearly undetectable in the predifferentiation period (F_1), and their expression showed a significant upregulation in the initial differentiation period (F_2).
Weighted gene coexpression network analysis (WGCNA) is a biology method for interaction analysis and resolving correlation networks51. To search for the genes involved in flowering time regulation in walnut, thirty-one flowering time-related DEGs were used to construct a coexpression network using the WGCNA method, and the results are presented in Fig. 11. In the coexpression network, many of the hub genes that participate in flowering time regulation were identified, such as JrCDF-2, JrCDF-3, JrPRR7, JrPRR7-1, JrPRR7-2, JrTPS-1-1, and the hub genes with the highest edge numbers were JrCRY2 and JrNF-YA-2.
Verification of DEG expression by qRT-PCR
To further verify the gene expression levels shown by RNA-Seq, we chose eighteen flowering time-related DEGs to perform qRT-PCR. Except for JrTPS1-2, the expression trends of the DEGs showed high similarity between the qPCR data and the RNA-Seq data (Fig. 12).