Query option

In summary, two search categories are provided in KGD: gene search and batch query. The gene search option provides an interface for querying KGD with a gene ID or keyword associated with gene annotations. To facilitate the queries of genes and functional annotation data stored in KGD, we employed the Apache Solr search engine (http://lucene.apache.org/solr/) to build indexes for different sources of annotation information, including gene functions, GO terms, InterPro domains and homologs.
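As a rough illustration of how a keyword search over several indexed annotation sources might be expressed, the sketch below builds a Solr select URL using the standard `q`, `defType`, `qf`, `fl`, `rows` and `wt` request parameters. The endpoint, core name (`kgd_genes`) and field names are assumptions made for this example; the actual KGD index schema is not described here.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and core name (not taken from KGD).
SOLR_SELECT_URL = "http://localhost:8983/solr/kgd_genes/select"


def build_gene_search_url(keyword, rows=20):
    """Build a Solr select URL that searches several annotation fields for one keyword."""
    params = {
        "q": keyword,
        "defType": "edismax",  # query parser that can search multiple fields at once
        # Hypothetical field names, one per annotation source named in the text.
        "qf": "gene_id description go_term interpro_domain homolog",
        "fl": "gene_id,gene_type,description",  # fields returned for the result table
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_gene_search_url("ascorbate peroxidase")
print(url)
```

Sending this URL to a running Solr instance with a matching schema would return a JSON document whose `response.docs` entries could populate a gene table.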

In addition to the gene search option under each genome page, a global search function is provided under the main menu of KGD. This function provides a quick query against all the records stored in the database and returns results in a tabular format including the gene ID, gene type, and gene description (Fig. 2a). From this table, users can browse the detailed feature page for each gene by clicking the corresponding gene link.

Fig. 2: Search functions in KGD.


(a) List of genes returned from a global search using a keyword. (b) Interface of the homology search (BLAST) implemented in KGD. (c) Result page of the homology search. The bottom image illustrates the alignment of query and subject sequences.

The batch query option allows users to retrieve sequences, annotations and other types of information (e.g., TFs and TRs) for a given list of genes. The batch query function in KGD was modified from the ‘Sequence Retrieval’ page of Tripal (ref. 16).
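The core of a batch retrieval can be sketched as a lookup of requested gene IDs against parsed sequence records. The example below is a minimal stand-alone illustration, not the Tripal ‘Sequence Retrieval’ code; the gene identifiers and sequences are made up, and the function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]  # ID is the first token of the header
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def batch_retrieve(gene_ids, records):
    """Return (found, missing) for the requested IDs, preserving request order."""
    found = [(g, records[g]) for g in gene_ids if g in records]
    missing = [g for g in gene_ids if g not in records]
    return found, missing


# Made-up identifiers and sequences, for illustration only.
EXAMPLE_FASTA = """>geneA hypothetical protein
ATGGCT
TAA
>geneB
ATGCCCTGA
"""

found, missing = batch_retrieve(["geneB", "geneX", "geneA"], parse_fasta(EXAMPLE_FASTA))
for gene_id, sequence in found:
    print(f">{gene_id}\n{sequence}")
print("not found:", missing)
```

Reporting the IDs that were not found, rather than silently dropping them, helps users spot typos or outdated identifiers in a submitted list.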

Homology search

To provide a homology search function, we implemented the Tripal BLAST UI extension module in KGD. All genome, mRNA, CDS and protein sequences of kiwifruit species stored in KGD are available for comparison through the BLAST program. To prevent users from selecting a BLAST program (BLASTN, BLASTP, BLASTX, tBLASTN or tBLASTX) that is incompatible with the chosen database, the list of available programs is set automatically according to the selected reference database (Fig. 2b). Options for filtering low-complexity sequences and selecting the maximum number of returned BLAST hits are provided. The BLAST function provides downloadable output files, ordered by E-value, in three formats: HTML, TSV and XML. The results page lists all the hits, with each hit linked to a graphic output that shows the alignment coordinates between the query and the hit and a color-ranked bit score for the alignment (Fig. 2c).
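The restriction of BLAST programs by database follows from the molecule types each program expects. The sketch below encodes the standard query/database types of the five programs and derives the compatible list for a selected database. The database labels and the function name are assumptions for illustration; this is not the Tripal BLAST UI implementation.

```python
# Molecule type of the query and of the database that each BLAST program expects.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),      # translated query vs protein database
    "tblastn": ("protein", "nucleotide"),     # protein query vs translated database
    "tblastx": ("nucleotide", "nucleotide"),  # translated query vs translated database
}

# Hypothetical labels for the four sequence collections named in the text.
DATABASE_TYPES = {
    "genome": "nucleotide",
    "mRNA": "nucleotide",
    "CDS": "nucleotide",
    "protein": "protein",
}


def programs_for_database(database):
    """List the BLAST programs whose target type matches the selected database."""
    db_type = DATABASE_TYPES[database]
    return sorted(
        name for name, (_query, target) in BLAST_PROGRAMS.items() if target == db_type
    )


print(programs_for_database("protein"))
print(programs_for_database("genome"))
```

A protein database therefore admits only BLASTP and BLASTX, whereas the genome, mRNA and CDS databases admit BLASTN, tBLASTN and tBLASTX.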

Genome browser

In KGD, we implemented JBrowse (ref. 30), a widely used genome browser, to display genome sequences, gene models, and expression profiles. Currently, all publicly available kiwifruit genomes, predicted gene models, and gene expression profiles derived from RNA-Seq data have been imported into JBrowse. The tracks of a given gene in a reference genome are also embedded in the gene features page to provide a graphical and informative view of its sequence and structure (Fig. 1a). Additionally, the genome browser can support other data types, such as single-base resolution genome variants, when they become available.

Synteny viewer

To view syntenic blocks and homologous gene pairs between different kiwifruit genome assemblies, we developed ‘SyntenyViewer’, an extension module of Tripal, in KGD. Syntenic blocks can be retrieved by selecting a query genome together with one or more subject genomes. ‘SyntenyViewer’ draws Circos plots to display syntenic blocks for every pair of query and subject genomes (Fig. 3a) and simultaneously generates a full list of the syntenic blocks. For a specific syntenic block, ‘SyntenyViewer’ creates an image to display the homologous gene pairs, and the view can be zoomed in or out as desired (Fig. 3b). The full list of genes included in the homologous gene pairs is provided with links to the detailed feature page of each gene (Fig. 1). In brief, the ‘SyntenyViewer’ module not only reveals syntenic blocks between any two genome sequences but also connects homologous gene pairs within those blocks. With this module, homologs of genes of interest located in a specific region of one kiwifruit genome can be easily identified and viewed in another kiwifruit genome.

Fig. 3: Genome synteny viewer in KGD.


(a) Syntenic blocks displayed in a Circos plot. The blue arc indicates the query chromosome, and the red arcs indicate the chromosomes of the compared genome. Gray lines between blue and red arcs indicate syntenic blocks identified between the two genomes. The lines of a syntenic block turn red when the user hovers over it. (b) Detailed view of a specific syntenic block. The query and compared chromosomes of the block are shown in orange and blue, respectively. The yellow and black lines within each chromosome indicate homologous gene pairs, which are connected by gray lines.

Enrichment analysis

Large-scale genomic studies typically result in large lists of genes of interest. Interpreting such gene lists to obtain biologically meaningful information is the basic premise for understanding the underlying regulatory mechanisms of important biological processes and biochemical pathways. Enrichment analysis is a powerful and frequently used method for identifying functional categories (e.g., GO terms and biochemical pathways) that are overrepresented in a gene list. We previously developed two custom-built extension modules of Tripal, ‘GO tool’ and ‘Pathway tool’, based on the hypergeometric test (ref. 29). These two modules were also implemented in KGD to identify significantly enriched GO terms and biochemical pathways from a list of user-provided genes.
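For readers unfamiliar with the hypergeometric test, the following sketch computes the upper-tail probability of observing at least k annotated genes in a list of n genes, when K of the N background genes carry the annotation. It is a generic textbook formulation using only the Python standard library, not the code of the ‘GO tool’ or ‘Pathway tool’ modules; the function and argument names are chosen for this example, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of background genes
    K: background genes annotated with the term
    n: genes in the user list
    k: genes in the user list annotated with the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass for every overlap size from k up to the maximum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 10 background genes, 5 annotated, and a 5-gene list that is fully annotated.
p = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
print(p)
```

In practice, one such P-value is computed per GO term or pathway, and the resulting values are usually adjusted for multiple testing before a significance threshold is applied.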

RNA-Seq expression analysis

KGD not only stores gene expression profiles derived from RNA-Seq datasets but also provides an ‘RNA-Seq’ module to allow users to perform RNA-Seq data analyses, including the identification of differentially expressed genes (DEGs) and the visualization of gene expression profiles. Two widely used DEG identification tools, edgeR (ref. 31) and DESeq (ref. 32), were integrated into the ‘RNA-Seq’ module in KGD. Users can select their desired cutoff values for the gene expression fold change and adjusted P-value to determine the final DEGs. The result page for the DEG analysis includes the project description, parameter settings, top 100 DEGs ordered by adjusted P-value, and a download link to a file with all identified DEGs together with their relevant information (Fig. 4a). Furthermore, the result page provides links to other modules for downstream analyses of the identified DEGs, such as BLAST, batch query, GO term and pathway enrichment analyses, and gene functional classification.
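The cutoff step described above can be sketched as a simple filter over per-gene test results. The example assumes that a tool such as edgeR or DESeq has already produced a log2 fold change and an adjusted P-value for each gene; the record type, default cutoffs and gene names are hypothetical and are not taken from KGD.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str
    log2_fold_change: float
    adjusted_p: float


def select_degs(results, min_fold_change=2.0, max_adjusted_p=0.05):
    """Keep genes passing both cutoffs, ordered by adjusted P-value."""
    min_abs_log2fc = math.log2(min_fold_change)
    passing = [
        r
        for r in results
        if abs(r.log2_fold_change) >= min_abs_log2fc and r.adjusted_p <= max_adjusted_p
    ]
    return sorted(passing, key=lambda r: r.adjusted_p)


# Made-up results for illustration only.
results = [
    GeneResult("geneA", 3.1, 0.001),
    GeneResult("geneB", -1.5, 0.020),
    GeneResult("geneC", 0.4, 0.0001),  # significant, but the fold change is too small
    GeneResult("geneD", 2.2, 0.300),   # large change, but not significant
]

degs = select_degs(results)
top_100 = degs[:100]  # subset shown on a result page; the full list would go to a file
print([r.gene_id for r in top_100])
```

Taking the absolute log2 fold change treats up- and down-regulated genes symmetrically, so a twofold cutoff keeps both geneA and geneB here.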

Fig. 4: Gene expression analysis with the ‘RNA-Seq’ module in KGD.


(a) Statistical analysis results listing the top 100 DEGs ordered by adjusted P-value. (b) Heatmap showing the expression profiles of a list of user-defined genes. (c) Single-base resolution expression profile view in JBrowse.

In addition to viewing the expression profiles of individual genes on the gene feature page (Fig. 1d), the ‘RNA-Seq’ module provides two interactive visualization tools: a heatmap tool developed using Plotly’s JavaScript library (http://plot.ly) for displaying the expression profiles of a set of genes (Fig. 4b) and an expression viewer embedded in JBrowse for displaying single-base resolution expression profiles under certain conditions (Fig. 4c).
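To make the heatmap input concrete, the sketch below assembles a Plotly-style figure description (a `heatmap` trace with `z`, `x` and `y` arrays) from a small expression table and serializes it to JSON, which is the form a Plotly JavaScript front end would consume. The log2(value + 1) transform, the gene and sample names, and the function name are assumptions for this example and do not describe the KGD implementation.

```python
import json
import math


def heatmap_figure(expression, samples):
    """Build a Plotly-style heatmap figure dict from {gene: [value per sample]}."""
    genes = list(expression)
    # Assumed transform: log2(value + 1) to compress the dynamic range.
    z = [[math.log2(v + 1.0) for v in expression[g]] for g in genes]
    return {
        "data": [
            {"type": "heatmap", "z": z, "x": samples, "y": genes, "colorscale": "Viridis"}
        ],
        "layout": {"title": {"text": "Expression (log2(value + 1))"}},
    }


# Made-up expression values for illustration only.
samples = ["stage1", "stage2", "stage3"]
expression = {"geneA": [0.0, 1.0, 3.0], "geneB": [7.0, 3.0, 1.0]}

fig = heatmap_figure(expression, samples)
print(json.dumps(fig, indent=2))
```

Each row of `z` corresponds to one gene and each column to one sample, matching the `y` and `x` labels, respectively.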
