The GeDi User’s Guide
GeDi 1.0.1
Compiled date: 2024-06-26
Last edited: 2024-02-29
License: MIT + file LICENSE
This vignette introduces the usage of the GeDi package for exploring the results of functional annotation and enrichment analyses.
GeDi is a versatile package designed to simplify the exploration and comprehension of functional annotation and enrichment analysis results. It offers a shiny application that combines interactivity, visualization, and reproducibility to consolidate comprehensive outcomes.
To incorporate GeDi into your workflow, you’ll need the results of a functional annotation or enrichment analysis. This vignette demonstrates the core functionalities of GeDi using a publicly available dataset from Alasoo et al., as described in their paper “Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response” (Alasoo et al. 2018).
Accessible through the macrophage Bioconductor package, this dataset comprises files generated from Salmon quantification (version 0.12.0, with Gencode v29 reference) and gene-level summarized values.
Within the macrophage experimental setup, samples derive from six different donors under four distinct conditions: naive, treated with Interferon gamma, with SL1344, or with a combination of Interferon gamma and SL1344. For illustration, we will focus on comparing Interferon gamma-treated samples with naive samples.
Before you can start using GeDi, the package needs to be installed on your machine. To install the package, begin by opening R and executing the following command:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("GeDi")
Once installed, the package can be loaded and attached to your current workspace as follows:
library("GeDi")
With the package attached, you can start the application by running:
GeDi()
This action will open the application, directing you to the Welcome page. From there, you can easily provide your data using the Data Input panel on the left side menu, ensuring it’s in the correct format for analysis.
Alternatively, you can initiate the application by executing:
GeDi(
  genesets = geneset_df,
  ppi = ppi_df,
  distance_scores = distance_scores_df
)
where
- geneset_df represents your input data in the form of a data.frame, which should include at least one column named “Genesets” containing geneset identifiers and one column named “Genes” containing a comma-separated list of genes belonging to each respective geneset.
- ppi_df is another data.frame containing protein-protein interaction scores, with columns named “from”, “to”, and “combined_score”.
- distance_scores_df is a sparse Matrix containing the distance scores of the genesets in your data.
All of these parameters are optional, as you can alternatively upload, download, and compute them directly within the application. However, some of these processes may require a significant amount of time, especially with larger datasets. Therefore, it may be advantageous to save the intermediate results, such as the downloaded PPI and computed distance scores, for later use within the application.
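As a minimal sketch of these three inputs, the following builds toy objects in the shapes described above and stores one of them for later reuse. All identifiers, genes, and scores are made up for illustration. The dimnames on the distance matrix, the 0 to 1 range of combined_score, and the use of Jaccard distances are assumptions, not documented requirements of GeDi.

```r
library(Matrix)

# Toy genesets (hypothetical identifiers and genes)
geneset_df <- data.frame(
  Genesets = c("GS1", "GS2", "GS3"),
  Genes = c("A,B,C", "B,C,D", "E,F"),
  stringsAsFactors = FALSE
)

# Toy protein-protein interaction scores (hypothetical values)
ppi_df <- data.frame(
  from = c("A", "B", "E"),
  to = c("B", "C", "F"),
  combined_score = c(0.9, 0.7, 0.4),
  stringsAsFactors = FALSE
)

# Toy pairwise distances between the three genesets, stored as a sparse
# Matrix; the values here are Jaccard distances of the toy gene lists
distance_scores_df <- Matrix(
  c(
    0.0, 0.5, 1.0,
    0.5, 0.0, 1.0,
    1.0, 1.0, 0.0
  ),
  nrow = 3, byrow = TRUE, sparse = TRUE,
  dimnames = list(geneset_df$Genesets, geneset_df$Genesets)
)

# Save an expensive intermediate result and read it back in a later session
scores_file <- tempfile(fileext = ".rds")
saveRDS(distance_scores_df, scores_file)
distance_scores_df <- readRDS(scores_file)

# The objects could then be passed to the application:
# GeDi(genesets = geneset_df, ppi = ppi_df, distance_scores = distance_scores_df)
```

Saving with saveRDS() keeps the sparse Matrix class intact, so the object can be handed to the application without recomputing it.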
In this vignette, we demonstrate the functionality of GeDi using enrichment analysis results from the macrophage dataset. To immediately start exploring the application, you can simply execute:
GeDi()
and load the example data with the Load example data button in the Data Input panel.
Alternatively, you can proceed by following the subsequent code chunks to create the necessary input objects, step by step. This can serve as a reference guide for the steps ideally executed prior to analyzing the data with GeDi.
To utilize GeDi, you’ll require results from a functional annotation analysis. In this vignette, we’ll demonstrate how to conduct an enrichment analysis on differentially expressed (DE) genes from the macrophage dataset.
First, we’ll load the macrophage data and create a DESeqDataSet, as the subsequent differential expression analysis will be performed using DESeq2 (Love, Huber, and Anders 2014).
# Load required libraries
library("macrophage")
library("DESeq2")
# Load the example dataset "gse" from the "macrophage" package
data("gse", package = "macrophage")
# Create a DESeqDataSet object using the "gse" dataset and define the
# experimental design.
# We use the condition as part of the experimental design, because we are
# interested in the differentially expressed genes between treatments. We also
# add the line to the design to account for the inherent differences between
# the donors.
dds_macrophage <- DESeqDataSet(gse, design = ~ line + condition)
# Change the row names of the DESeqDataSet object to Ensembl IDs
rownames(dds_macrophage) <- gsub("\\..*", "", rownames(dds_macrophage))
# Have a look at the resulting DESeqDataSet object
dds_macrophage
#> class: DESeqDataSet
#> dim: 58294 24
#> metadata(7): tximetaInfo quantInfo ... txdbInfo version
#> assays(3): counts abundance avgTxLength
#> rownames(58294): ENSG00000000003 ENSG00000000005 ... ENSG00000285993 ENSG00000285994
#> rowData names(2): gene_id SYMBOL
#> colnames(24): SAMEA103885102 SAMEA103885347 ... SAMEA103885308 SAMEA103884949
#> colData names(15): names sample_id ... condition line
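The row-name cleanup in the chunk above relies on a regular expression that removes everything from the first dot onwards, which strips the version suffix from Gencode-style gene identifiers. A small sketch on two example identifiers (the version numbers are made up for illustration):

```r
# Gencode-style gene IDs carry a ".<version>" suffix
versioned_ids <- c("ENSG00000000003.14", "ENSG00000000005.6")

# "\\..*" matches a literal dot followed by the rest of the string
ensembl_ids <- gsub("\\..*", "", versioned_ids)
ensembl_ids
```

Working with unversioned Ensembl IDs makes it easier to match the genes against annotation resources later on.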
Now that we’ve obtained our DESeqDataSet, we can conduct the differential expression (DE) analysis. In this vignette, we’ll use the results from comparing two conditions of the dataset, specifically IFNg and naive, while accounting for the cell line of origin.
Before executing the DE analysis, we’ll filter out lowly expressed features from the dataset. In this instance, we’ll keep only genes with at least 10 counts in at least 6 samples, where 6 corresponds to the smallest group size in the dataset; all other genes are excluded.
Subsequently, we’ll conduct the DE analysis and test against a null hypothesis of an absolute log2FoldChange of at most 1, so that we only report genes with consistent and robust changes in expression.
Finally, we’ll append the gene symbols to the resultant DataFrame, which will later serve as our “Genes” column in the input data for GeDi.
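Before applying the expression filter to the real data, it can help to see it on a small example. The following sketch uses a made-up count matrix (gene names, sample names, and counts are all hypothetical) and applies the same rule as the chunk below: a gene is kept if at least 6 samples have 10 or more counts.

```r
# Made-up counts: 3 genes (rows) across 8 samples (columns)
toy_counts <- matrix(
  c(
    12, 15, 11, 20, 10, 13, 0, 2, # gene_a: 6 samples with >= 10 counts
    12, 15, 11, 20, 10,  3, 0, 2, # gene_b: only 5 samples with >= 10 counts
     0,  0,  0,  0,  0,  0, 0, 0  # gene_c: not expressed
  ),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), paste0("s", 1:8))
)

# For each gene, count the samples with at least 10 counts and
# flag the genes where this holds for at least 6 samples
keep <- rowSums(toy_counts >= 10) >= 6
keep

# Only the flagged genes remain after subsetting
filtered_counts <- toy_counts[keep, , drop = FALSE]
rownames(filtered_counts)
```

Only gene_a passes the filter here; gene_b falls one sample short and gene_c has no counts at all.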
# Filter genes based on read counts
# Flag the genes with at least 10 counts in at least 6 samples
keep <- rowSums(counts(dds_macrophage) >= 10) >= 6
# Subset the DESeqDataSet object to keep only the selected genes
dds_macrophage <- dds_macrophage[keep, ]
# Have a look at the resulting DESeqDataSet object
dds_macrophage
#> class: DESeqDataSet
#> dim: 17806 24
#> metadata(7): tximetaInfo quantInfo ... txdbInfo version
#> assays(3): counts abundance avgTxLength
#> rownames(17806): ENSG00000000003 ENSG00000000419 ... ENSG00000285982 ENSG00000285994
#> rowData names(2): gene_id SYMBOL
#> colnames(24): SAMEA103885102 SAMEA103885347 ... SAMEA103885308 SAMEA103884949
#> colData names(15): names sample_id ... condition line
# Perform differential expression analysis using DESeq2
dds_macrophage <- DESeq(dds_macrophage)
# Extract differentially expressed genes
# Perform contrast analysis comparing "IFNg" condition to "naive" condition
# Set a log2 fold change threshold of 1 and a significance level (alpha) of 0.05
res_macrophage_IFNg_vs_naive <- results(dds_macrophage,
  contrast = c("condition", "IFNg", "naive"),
  lfcThreshold = 1, alpha = 0.05
)
# Add gene symbols to the results in a column "SYMBOL"
res_macrophage_IFNg_vs_naive$SYMBOL <- rowData(dds_macrophage)$SYMBOL
After completing the differential expression analysis, we move on to conduct the functional annotation analysis. To begin, we extract the differentially expressed (DE) genes from the previously generated results and identify the background genes to be utilized for functional enrichment.
For the enrichment analysis, we use the overrepresentation analysis method provided by the topGO package. To streamline the integration of these results into GeDi, we utilize the topGOtable function from the pcaExplorer package. By default, this function employs the BP ontology and the elim method, which helps decorrelate the Gene Ontology (GO) graph structure, resulting in less redundant functional categories. The output is a data.frame object that integrates seamlessly with GeDi.
However, as GeDi has only minimal requirements for the input, enrichment results generated using clusterProfiler can also be utilized. While we primarily tested results from the enrichGO method during GeDi development, those from the enrichKEGG and enrichPathway methods are also compatible.
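GeDi may accept clusterProfiler results as they are. If you prefer to pass them as a parameter in the “Genesets” and “Genes” layout described earlier, one possible conversion is sketched below. The toy data.frame only mimics the ID, Description, and geneID columns of a clusterProfiler result converted with as.data.frame(), where genes are separated by “/”; the gene assignments are made up for illustration.

```r
# Toy stand-in for as.data.frame() of a clusterProfiler enrichment result
# (only the columns used here; gene assignments are hypothetical)
enrich_df <- data.frame(
  ID = c("GO:0006955", "GO:0006954"),
  Description = c("immune response", "inflammatory response"),
  geneID = c("IRF1/STAT1/GBP1", "STAT1/GBP1"),
  stringsAsFactors = FALSE
)

# Build a data.frame with the two columns GeDi expects, replacing the
# "/" separator with commas
gedi_input <- data.frame(
  Genesets = enrich_df$ID,
  Genes = gsub("/", ",", enrich_df$geneID, fixed = TRUE),
  stringsAsFactors = FALSE
)
gedi_input
```

Further columns of the enrichment result, such as the term description, can be kept alongside these two if they are useful for later inspection.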
# Load required packages for analysis
library("pcaExplorer")
library("GeneTonic")
library("AnnotationDbi")
# Extract the gene symbols of the DE genes from the DESeq2 results object
# deseqresult2df() converts the DESeq2 results to a data.frame and keeps
# only the genes with an FDR below 0.05
de_symbols_IFNg_vs_naive <- deseqresult2df(res_macrophage_IFNg_vs_naive,
  FDR = 0.05
)$SYMBOL
# Extract the gene symbols of the background genes from the DESeqDataSet
# Keep only genes that have nonzero counts
bg_ids <- rowData(dds_macrophage)$SYMBOL[rowSums(counts(dds_macrophage)) > 0]
# Load required package for analysis
library("topGO")
# Perform Gene Ontology enrichment analysis using the topGOtable function from
# the "pcaExplorer" package
macrophage_topGO_example <-
  pcaExplorer::topGOtable(de_symbols_IFNg_vs_naive,
    bg_ids,
    ontology = "BP",
    mapping = "org.Hs.eg.db",
    geneID = "symbol",
    topTablerows = 500
  )
As mentioned earlier, GeDi expects the input to contain at least two columns: one named “Genesets” and one named “Genes”. While this is not strictly mandatory when providing your data interactively during an application session, it becomes necessary if you intend to initiate the application with your input as parameters (e.g., GeDi(genesets = my_genesets_df)). In such cases, the “Genesets” column should contain identifiers for each geneset in the input, while the “Genes” column should consist of comma-separated lists of genes associated with each geneset. Therefore, we will adjust the column names of the resulting data.frame from the enrichment analysis to adhere to the required format.
# Rename columns in the macrophage_topGO_example dataframe
# Change the column name "GO.ID" to "Genesets"
names(macrophage_topGO_example)[names(macrophage_topGO_example) == "GO.ID"] <- "Genesets"
# Change the column name "genes" to "Genes"
names(macrophage_topGO_example)[names(macrophage_topGO_example) == "genes"] <- "Genes"
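Before launching the application with your own data, a quick check of the two required columns can catch formatting problems early. The helper below is a hypothetical sketch, not part of GeDi: it verifies that the “Genesets” and “Genes” columns exist and splits the comma-separated gene strings into one character vector per geneset. It is shown on a made-up data.frame so that it runs on its own.

```r
# Hypothetical helper (not a GeDi function): check the required columns and
# split the comma-separated genes into one character vector per geneset
check_genesets_input <- function(genesets) {
  required <- c("Genesets", "Genes")
  missing_cols <- setdiff(required, names(genesets))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  gene_lists <- strsplit(as.character(genesets$Genes), ",", fixed = TRUE)
  # Remove whitespace that may surround the separators
  gene_lists <- lapply(gene_lists, trimws)
  names(gene_lists) <- as.character(genesets$Genesets)
  gene_lists
}

# Made-up input in the expected format
toy_genesets <- data.frame(
  Genesets = c("GS1", "GS2"),
  Genes = c("A,B,C", "B, D"),
  stringsAsFactors = FALSE
)

gene_lists <- check_genesets_input(toy_genesets)
# Number of genes per geneset
lengths(gene_lists)
```

The same call could be applied to macrophage_topGO_example after the renaming step to confirm that each geneset carries a non-empty list of genes.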
Now that we’ve obtained functional annotation results from the macrophage dataset, we can begin exploring the data using GeDi. You have two options: you can launch the application with the GeDi() command and supply the generated data through the Data Input panel, or, if you’ve followed this vignette, you can start the application directly with the loaded data by executing GeDi(genesets = macrophage_topGO_example).
GeDi()
GeDi(genesets = macrophage_topGO_example)
The code shown above opens the application on the Welcome page, the entry point to GeDi. The Welcome page provides an overview of the application’s features and functionalities, offers guidance on how to navigate the application, and highlights key components such as data input options, visualization tools, and interactive features. Whether users are new to GeDi or returning to explore additional datasets, it serves as the central hub for accessing resources and getting started with an analysis.
GeDi user interface
The GeDi application, developed with the shiny framework, incorporates the modern design elements of the bs4Dash package, which is built upon Bootstrap 4. This combination of technologies ensures a sleek and visually appealing user interface for navigating and interacting with the functionality offered by GeDi. By leveraging the features of shiny and bs4Dash, GeDi provides users with an intuitive and aesthetically pleasing environment for conducting functional annotation and enrichment analyses on their datasets.
The structure of GeDi is designed around different panels, each of which becomes active upon clicking the corresponding icons or text in the sidebar.
While the Welcome panel is relatively self-explanatory, additional information and explanations are provided for the functionality of the remaining panels. For new users seeking guidance, there’s a question circle button available to initiate an interactive tour of GeDi. This tour allows users to learn the basic usage mechanisms by actively engaging with the interface. During the tour, specific elements are highlighted in response to user actions, while the rest of the UI remains shaded to maintain focus. Users can interrupt the tour at any time by clicking outside the highlighted window, and navigation between steps is facilitated by arrow buttons (left, right). The tour functionality is implemented using the rintrojs package.
GeDi functionality
The GeDi shiny application is organized into distinct panels, each serving a specific purpose, which will be thoroughly explored in the following sections.
This panel serves as a guide for utilizing GeDi effectively. It offers detailed instructions on generating input data for the application, elucidating the expected input format and outlining the various interactive elements present in the app’s other panels.