Compiled date: 2024-06-26

Last edited: 2024-02-29

License: MIT + file LICENSE


1 Introduction

This vignette introduces the usage of the GeDi package for exploring the results of functional annotation and enrichment analyses.

GeDi is a versatile package designed to simplify the exploration and interpretation of functional annotation and enrichment analysis results. It offers a shiny application that combines interactivity, visualization, and reproducibility to help users summarize and make sense of comprehensive result lists.

To incorporate GeDi into your workflow, you’ll need the results of a functional annotation or enrichment analysis. This vignette demonstrates the core functionalities of GeDi using a publicly available dataset from Alasoo et al., as described in their paper “Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response” (Alasoo et al. 2018).

Accessible through the macrophage Bioconductor package, this dataset comprises files generated from Salmon quantification (version 0.12.0, with Gencode v29 reference) and gene-level summarized values.

Within the macrophage experimental setup, samples derive from six different donors under four distinct conditions: naive, treated with Interferon gamma, with SL1344, or with a combination of Interferon gamma and SL1344. For illustration, we will focus on comparing Interferon gamma-treated samples with naive samples.

2 Getting started

Before you can start using GeDi, the package needs to be installed on your machine. To install the package, begin by opening R and executing the following command:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("GeDi")

Once installed, the package can be loaded and attached to your current workspace as follows:

library("GeDi")

Once the package is attached, you can start the application by running GeDi():

GeDi()

This action will open the application, directing you to the Welcome page. From there, you can easily provide your data using the Data Input panel on the left side menu, ensuring it’s in the correct format for analysis.

Alternatively, you can launch the application with your data already loaded by executing:

GeDi(
  genesets = geneset_df,
  ppi = ppi_df,
  distance_scores = distance_scores_df
)

where

  • geneset_df represents your input data in the form of a data.frame, which should include at least one column named “Genesets” containing geneset identifiers and one column named “Genes” containing a comma-separated list of genes belonging to each respective geneset.
  • ppi_df is another data.frame containing protein-protein interaction scores, with columns named “from”, “to”, and “combined_score”.
  • distance_scores_df is a sparse Matrix containing the distance scores of the genesets in your data.
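To illustrate these formats, the following sketch builds tiny toy versions of the three inputs. All geneset identifiers, gene names, interaction scores, and distance values are hypothetical placeholders, not data from GeDi or any PPI database; only the column names follow the description above. The sparse matrix is created with the Matrix package (a recommended package that ships with R), and using the geneset identifiers as row and column names is an assumption.

```r
library(Matrix)

# Hypothetical genesets: one identifier per row plus a comma-separated gene list
geneset_df <- data.frame(
  Genesets = c("SET_1", "SET_2", "SET_3"),
  Genes = c("GENE1,GENE2,GENE3", "GENE2,GENE3,GENE4", "GENE5,GENE6"),
  stringsAsFactors = FALSE
)

# Hypothetical protein-protein interactions with the required column names
ppi_df <- data.frame(
  from = c("GENE1", "GENE2", "GENE5"),
  to = c("GENE2", "GENE4", "GENE6"),
  combined_score = c(0.9, 0.7, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical symmetric distance scores between genesets (placeholder values)
ids <- geneset_df$Genesets
dense <- matrix(c(0.0, 0.5, 1.0,
                  0.5, 0.0, 1.0,
                  1.0, 1.0, 0.0),
                nrow = 3, dimnames = list(ids, ids))

# Store the distances as a sparse Matrix object
distance_scores_df <- Matrix(dense, sparse = TRUE)
```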

All of these parameters are optional, as you can alternatively upload, download, and compute them directly within the application. However, some of these processes may require a significant amount of time, especially with larger datasets. Therefore, it may be advantageous to save the intermediate results, such as the downloaded PPI and computed distance scores, for later use within the application.
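One way to reuse such intermediate results across sessions is base R's saveRDS() and readRDS(). The sketch below wraps this in a small hypothetical helper, cache_result(), that computes an object only if no cached file exists yet. The file name and slow_step() are placeholders standing in for, e.g., downloading a PPI table or computing distance scores.

```r
# Hypothetical helper: read `path` if it exists, otherwise compute, save, return
cache_result <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Placeholder for an expensive step; counts how often it actually runs
n_calls <- 0
slow_step <- function() {
  n_calls <<- n_calls + 1
  matrix(runif(4), nrow = 2)
}

cache_file <- file.path(tempdir(), "gedi_intermediate.rds")
unlink(cache_file)  # start from a clean state for this example

first <- cache_result(cache_file, slow_step)   # computes and saves
second <- cache_result(cache_file, slow_step)  # reads the saved copy
```

In a later session, a saved object could then be passed to the corresponding argument, for example `distance_scores = readRDS(cache_file)`.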

In this vignette, we demonstrate the functionality of GeDi using enrichment analysis results from the macrophage dataset. To immediately start exploring the application, you can simply execute:

GeDi()

and load the example data with the Load example data button in the Data Input panel.

Alternatively, you can proceed by following the subsequent code chunks to create the necessary input objects, step by step. This can serve as a reference guide for the steps ideally executed prior to analyzing the data with GeDi.

To utilize GeDi, you’ll require results from a functional annotation analysis. In this vignette, we’ll demonstrate how to conduct an enrichment analysis on differentially expressed (DE) genes from the macrophage dataset.

First, we’ll load the macrophage data and create a DESeqDataSet, as the subsequent differential expression analysis will be performed using DESeq2 (Love, Huber, and Anders 2014).

# Load required libraries
library("macrophage")
library("DESeq2")

# Load the example dataset "gse" from the "macrophage" package
data("gse", package = "macrophage")

# Create a DESeqDataSet object using the "gse" dataset and define the 
# experimental design.
# We use the condition as part of the experimental design, because we are 
# interested in the differentially expressed genes between treatments. We also 
# add the line to the design to account for the inherent differences between 
# the donors.
dds_macrophage <- DESeqDataSet(gse, design = ~ line + condition)

# Remove the version suffix from the Ensembl gene IDs used as row names
rownames(dds_macrophage) <- gsub("\\..*", "", rownames(dds_macrophage))

# Have a look at the resulting DESeqDataSet object
dds_macrophage
#> class: DESeqDataSet 
#> dim: 58294 24 
#> metadata(7): tximetaInfo quantInfo ... txdbInfo version
#> assays(3): counts abundance avgTxLength
#> rownames(58294): ENSG00000000003 ENSG00000000005 ... ENSG00000285993 ENSG00000285994
#> rowData names(2): gene_id SYMBOL
#> colnames(24): SAMEA103885102 SAMEA103885347 ... SAMEA103885308 SAMEA103884949
#> colData names(15): names sample_id ... condition line
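The regular expression `\\..*` used above matches the first dot and everything after it, so gsub() removes the version suffix from each Ensembl gene ID while leaving IDs without a suffix unchanged. A quick check on hypothetical versioned IDs (the version numbers are made up for illustration):

```r
# Hypothetical versioned Ensembl IDs; the last one has no version suffix
versioned_ids <- c("ENSG00000000003.14", "ENSG00000000005.5", "ENSG00000285994")

# Remove the first "." and everything that follows it
stripped_ids <- gsub("\\..*", "", versioned_ids)
stripped_ids
#> [1] "ENSG00000000003" "ENSG00000000005" "ENSG00000285994"

# 0 means that stripping did not map two rows to the same ID
anyDuplicated(stripped_ids)
```

Because stripping can in principle turn two distinct row names into the same ID, checking anyDuplicated() on the real row names is a cheap safeguard.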

Now that we’ve obtained our DESeqDataSet, we can conduct the differential expression (DE) analysis. In this vignette, we’ll use the results from comparing two distinct conditions of the dataset, specifically IFNg and naive, while accounting for the cell line of origin.

Before executing the DE analysis, we’ll filter out lowly expressed features from the dataset. In this instance, we’ll keep only genes with at least 10 counts in at least 6 samples, where 6 corresponds to the smallest group size in the dataset.
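The filtering rule can be illustrated on a toy count matrix (hypothetical values, 3 genes by 8 samples): a gene is kept when the number of samples with at least 10 counts is 6 or more.

```r
# Toy count matrix: rows are genes, columns are samples (hypothetical values)
toy_counts <- rbind(
  geneA = c(12, 15, 30, 11, 10, 25, 0, 3),        # >= 10 in 6 samples: kept
  geneB = c(50, 2, 1, 0, 40, 9, 8, 12),           # >= 10 in 3 samples: dropped
  geneC = c(100, 120, 90, 80, 70, 60, 50, 40)     # >= 10 in 8 samples: kept
)

# Logical matrix: TRUE where a sample has at least 10 counts for that gene
passes <- toy_counts >= 10

# Keep genes with at least 6 passing samples (the smallest group size)
keep_toy <- rowSums(passes) >= 6
keep_toy
```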

Subsequently, we’ll conduct the DE analysis and test against the null hypothesis that the absolute log2 fold change is at most 1 (lfcThreshold = 1). This way, we identify genes whose expression changes are not only statistically significant but also consistently large.

Finally, we’ll append the gene symbols to the resultant DataFrame, which will later serve as our “Genes” column in the input data for GeDi.

# Filter genes based on read counts
# Flag genes with at least 10 counts in at least 6 samples
keep <- rowSums(counts(dds_macrophage) >= 10) >= 6

# Subset the DESeqDataSet object to keep only the selected genes
dds_macrophage <- dds_macrophage[keep, ]

# Have a look at the resulting DESeqDataSet object
dds_macrophage
#> class: DESeqDataSet 
#> dim: 17806 24 
#> metadata(7): tximetaInfo quantInfo ... txdbInfo version
#> assays(3): counts abundance avgTxLength
#> rownames(17806): ENSG00000000003 ENSG00000000419 ... ENSG00000285982 ENSG00000285994
#> rowData names(2): gene_id SYMBOL
#> colnames(24): SAMEA103885102 SAMEA103885347 ... SAMEA103885308 SAMEA103884949
#> colData names(15): names sample_id ... condition line

# Perform differential expression analysis using DESeq2
dds_macrophage <- DESeq(dds_macrophage)

# Extract differentially expressed genes
# Perform contrast analysis comparing "IFNg" condition to "naive" condition
# Set a log2 fold change threshold of 1 and a significance level (alpha) of 0.05
res_macrophage_IFNg_vs_naive <- results(dds_macrophage,
  contrast = c("condition", "IFNg", "naive"),
  lfcThreshold = 1, alpha = 0.05
)

# Add gene symbols to the results in a column "SYMBOL"
res_macrophage_IFNg_vs_naive$SYMBOL <- rowData(dds_macrophage)$SYMBOL

After completing the differential expression analysis, we move on to conduct the functional annotation analysis. To begin, we extract the differentially expressed (DE) genes from the previously generated results and identify the background genes to be utilized for functional enrichment.
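The selection of DE and background genes can be mimicked in base R on a toy results table. This is only an illustration of the idea, assuming that deseqresult2df(..., FDR = 0.05) keeps rows whose adjusted p-value is below the threshold; its exact behavior (for example, row ordering) is defined by its own package. The column names padj and SYMBOL follow the DESeq2 results object used in this vignette, while all gene names, p-values, and counts are hypothetical.

```r
# Toy stand-in for a DESeq2 results table (hypothetical values)
toy_res <- data.frame(
  SYMBOL = c("GENE1", "GENE2", "GENE3", "GENE4", NA),
  padj = c(1e-10, 3e-4, 0.6, NA, 0.01),
  stringsAsFactors = FALSE
)
# Hypothetical total counts per gene, in the same row order
toy_total_counts <- c(5000, 1200, 90000, 0, 300)

# DE genes: adjusted p-value below 0.05; genes with NA p-values are excluded
is_de <- !is.na(toy_res$padj) & toy_res$padj < 0.05
de_symbols <- toy_res$SYMBOL[is_de]

# Background: all genes with a nonzero total count
bg_symbols <- toy_res$SYMBOL[toy_total_counts > 0]

# Drop missing symbols, which cannot be mapped to GO terms by symbol
de_symbols <- de_symbols[!is.na(de_symbols)]
bg_symbols <- bg_symbols[!is.na(bg_symbols)]
```

Every DE gene should also be part of the background, since an overrepresentation test compares the DE set against that universe.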

For the enrichment analysis, we use the overrepresentation analysis method provided by the topGO package. To streamline the integration of these results into GeDi, we use the topGOtable function from the pcaExplorer package. By default, this function uses the BP ontology and the elim method, which decorrelates the Gene Ontology (GO) graph structure and yields less redundant functional categories. The output is a data.frame that can be used as GeDi input once two of its columns are renamed, as shown below.

However, as GeDi has only minimal requirements for the input, enrichment results generated using clusterProfiler can also be utilized. While we primarily tested results from the enrichGO method during GeDi development, those from the enrichKEGG and enrichPathway methods are also compatible.

# Load required packages for analysis
library("pcaExplorer")
library("GeneTonic")
library("AnnotationDbi")

# Extract gene symbols from the DESeq2 results object where FDR is below 0.05
# The function deseqresult2df is used to convert the DESeq2 results to a 
# dataframe format
# FDR is set to 0.05 to filter significant results
de_symbols_IFNg_vs_naive <- deseqresult2df(res_macrophage_IFNg_vs_naive,
                                           FDR = 0.05)$SYMBOL

# Extract gene symbols for the background from the DESeqDataSet object,
# keeping genes with a nonzero total count
bg_ids <- rowData(dds_macrophage)$SYMBOL[rowSums(counts(dds_macrophage)) > 0]

# Load required package for analysis
library("topGO")

# Perform Gene Ontology enrichment analysis using the topGOtable function from 
# the "pcaExplorer" package
macrophage_topGO_example <-
  pcaExplorer::topGOtable(de_symbols_IFNg_vs_naive,
    bg_ids,
    ontology = "BP",
    mapping = "org.Hs.eg.db",
    geneID = "symbol",
    topTablerows = 500
  )

As mentioned earlier, GeDi expects the input to contain at least two columns: one named “Genesets” and one named “Genes”. While this is not strictly mandatory when providing your data interactively during an application session, it becomes necessary if you intend to initiate the application with your input as parameters (e.g., GeDi(genesets = my_genesets_df)). In such cases, the “Genesets” column should contain identifiers for each geneset in the input, while the “Genes” column should consist of comma-separated lists of genes associated with each geneset.
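Because each “Genes” entry is a single comma-separated string, it has to be split into gene vectors before genesets can be compared. The sketch below does this with strsplit() and then computes pairwise Jaccard distances (1 minus the size of the intersection divided by the size of the union), one common way to quantify how much two genesets overlap. Whether GeDi uses exactly this score is not covered here; the toy data and the helper jaccard_distance() are hypothetical.

```r
# Toy input in the expected format (hypothetical identifiers and genes)
toy_genesets <- data.frame(
  Genesets = c("SET_A", "SET_B", "SET_C"),
  Genes = c("G1,G2,G3", "G2, G3, G4", "G5,G6"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string and trim whitespace around gene names
gene_lists <- lapply(strsplit(toy_genesets$Genes, ","), trimws)
names(gene_lists) <- toy_genesets$Genesets

# Hypothetical helper: Jaccard distance = 1 - |intersection| / |union|
jaccard_distance <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Pairwise distance matrix over all genesets
n <- length(gene_lists)
dist_mat <- matrix(0, n, n,
                   dimnames = list(names(gene_lists), names(gene_lists)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dist_mat[i, j] <- jaccard_distance(gene_lists[[i]], gene_lists[[j]])
  }
}
round(dist_mat, 2)
```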

Therefore, we will adjust the column names of the resulting data.frame from the enrichment analysis to adhere to the required format.

# Rename columns in the macrophage_topGO_example dataframe
# Change the column name "GO.ID" to "Genesets"
names(macrophage_topGO_example)[names(macrophage_topGO_example) == "GO.ID"] <- "Genesets"

# Change the column name "genes" to "Genes"
names(macrophage_topGO_example)[names(macrophage_topGO_example) == "genes"] <- "Genes"
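Before launching the application with the renamed table, a quick sanity check can catch a missing or misnamed column early. The helper check_genesets_format() below is hypothetical and only tests the minimal requirements described above (both columns present, “Genes” stored as character strings); it is demonstrated on a toy table that mimics enrichment output before renaming, with made-up identifiers.

```r
# Hypothetical helper: check the minimal format expected by GeDi(genesets = ...)
check_genesets_format <- function(df) {
  required <- c("Genesets", "Genes")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.character(df$Genes)) {
    stop("The 'Genes' column should contain comma-separated character strings")
  }
  invisible(TRUE)
}

# Toy table mimicking enrichment output before renaming (hypothetical values)
toy_topgo <- data.frame(
  GO.ID = c("ID_1", "ID_2"),
  Term = c("term one", "term two"),
  genes = c("G1,G2", "G2,G3"),
  stringsAsFactors = FALSE
)

# Same renaming pattern as used for macrophage_topGO_example
names(toy_topgo)[names(toy_topgo) == "GO.ID"] <- "Genesets"
names(toy_topgo)[names(toy_topgo) == "genes"] <- "Genes"

check_genesets_format(toy_topgo)
```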

2.1 All set!

Now that we’ve obtained functional annotation results from the macrophage dataset, we can begin exploring the data using GeDi. You have two options: you can either launch the application with GeDi() and provide the generated data in the Data Input panel, or, if you’ve followed this vignette, start the application directly with the loaded data by executing GeDi(genesets = macrophage_topGO_example).

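# Option 1: launch the app and provide the data in the Data Input panel
GeDi()

# Option 2: launch the app with the genesets already loaded
GeDi(genesets = macrophage_topGO_example)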

Either command opens the application on the Welcome page, the entry point to GeDi. This page gives an overview of the application's features, explains how to navigate it, and highlights key components such as the data input options, visualization tools, and interactive features. Whether you are new to GeDi or returning to explore additional datasets, it is the natural starting point for your analysis.

3 Description of the GeDi user interface

The GeDi application is developed with the shiny framework and uses the modern design elements of the bs4Dash package, which is built upon Bootstrap 4. Together, these provide an intuitive interface for navigating GeDi's functionality and for exploring the results of functional annotation and enrichment analyses.

3.1 Header (navbar)

The dashboard navbar in GeDi, referred to as such in the bs4Dash framework, features a dropdown menu accessible by clicking on the respective “info” icon. The menu offers additional functionality through various buttons:

  • The open book icon - This option allows users to explore the GeDi vignette, either the version bundled with the package or the online version, providing detailed documentation and usage guidelines.
  • The information (i) circle - Selecting this option displays information about the current session, presenting details such as the R environment and loaded packages, helpful for troubleshooting and debugging purposes.
  • The heart button - This button offers general information about GeDi, including links to its development version for contribution and guidelines on citing the tool in research publications.

Besides the two dropdown menus, the navbar also contains the Bookmark button, which lets users save genes and genesets of interest for later reference. To bookmark an item, first select (click on) the gene or geneset, then click the Bookmark button to add it to the list of bookmarked items within the GeDi application. This makes it easy to organize and revisit genes or genesets that seem relevant during the exploration of functional annotation and enrichment analysis results. The bookmarked genes and genesets can later be found in the Report panel.

3.3 Body

The structure of GeDi is designed around different panels, each of which becomes active upon clicking the corresponding icons or text in the sidebar.

While the Welcome panel is relatively self-explanatory, additional information and explanations are provided for the functionality of the remaining panels. For new users seeking guidance, there’s a question circle button available to initiate an interactive tour of GeDi. This tour allows users to learn the basic usage mechanisms by actively engaging with the interface. During the tour, specific elements are highlighted in response to user actions, while the rest of the UI remains shaded to maintain focus. Users can interrupt the tour at any time by clicking outside the highlighted window, and navigation between steps is facilitated by arrow buttons (left, right). The tour functionality is implemented using the rintrojs package.

4 The GeDi functionality

The GeDi shiny application is organized into distinct panels, each serving a specific purpose, which will be thoroughly explored in the following sections.

4.1 The Welcome panel

This panel serves as a guide for utilizing GeDi effectively. It offers detailed instructions on generating input data for the application, elucidating the expected input format and outlining the various interactive elements present in the app’s other panels.