Basic usage

The CEMiTool R package provides users with an easy-to-use method to automatically run gene co-expression analyses. In addition, it performs gene set enrichment analysis and over representation analysis for the gene modules returned by the analysis.

For the most basic usage of CEMiTool only a data.frame containing expression data with gene symbols in the rows and sample names in the columns is needed, as following:

library("CEMiTool")
# load expression data
data(expr)
head(expr[,1:4])
##           X1913_d0  X1913_d3  X1913_d7  X1911_d0
## XIST     13.061894 13.290272 13.360468 13.178729
## DDX3Y     3.410819  3.164874  3.599792  3.400613
## RPS4Y1    6.326861  5.915121  6.341564  5.905167
## USP9Y     3.237749  3.362508  3.320674  3.365530
## CYorf15B  3.980988  4.201731  4.235020  4.046716
## EIF1AY    3.379857  3.229973  3.150274  3.196610

In this usage, the cemitool function receives the expression data, performs the co-expression modules analysis and returns a CEMiTool object:

cem <- cemitool(expr)

To see a summary of the slots inside the CEMiTool object we can use print

print(cem)
## CEMiTool Object
## - Number of modules: 4 
## - Modules:  (data.frame: 257x2): 
##   genes        modules
## 1  HBA1 Not.Correlated
## 2 RPS26 Not.Correlated
## 3   LYZ Not.Correlated
## - Expression file: data.frame with 4000 genes and 45 samples
## - Selected data: 257 genes selected
## - Gene Set Enrichment Analysis: null
## - Over Representation Analysis: null
## - Profile plot: null
## - Enrichment plot: null
## - ORA barplot: null
## - Beta x R2 plot: null
## - Mean connectivity plot: null
# or, more conveniently, just call `cem`
cem
## CEMiTool Object
## - Number of modules: 4 
## - Modules:  (data.frame: 257x2): 
##   genes        modules
## 1  HBA1 Not.Correlated
## 2 RPS26 Not.Correlated
## 3   LYZ Not.Correlated
## - Expression file: data.frame with 4000 genes and 45 samples
## - Selected data: 257 genes selected
## - Gene Set Enrichment Analysis: null
## - Over Representation Analysis: null
## - Profile plot: null
## - Enrichment plot: null
## - ORA barplot: null
## - Beta x R2 plot: null
## - Mean connectivity plot: null

The cemitool() function automatically executes some common analyses, depending on the input data. The following sections describes how to perform each of these analyses separately. Details on how to perform all analyses together are at the end of this vignette.

Gene filtering

As a default, the cemitool function first performs a filtering of the gene expression data before running the remaining analyses. This filtering is done in accordance to gene variance. In this example the filtering step has reduced the gene number to 257. To perform only the filtering step, the filter_expr(cem) function can be used.

Module inspection

The module analysis has produced 4 modules and the allocation of genes to modules can be seen with the module_genes function:

# inspect modules
nmodules(cem)
## [1] 4
head(module_genes(cem))
##       genes        modules
## 1      HBA1 Not.Correlated
## 2     RPS26 Not.Correlated
## 3       LYZ Not.Correlated
## 4      PPBP             M3
## 5 abParts35 Not.Correlated
## 6     NAMPT             M2

Genes that are allocated to Not.Correlated are genes that are not clustered into any module.

If you wish to adjust the module definition parameters of your CEMiTool object, use find_modules(cem).

You can use the get_hubs function to identify the top n genes with the highest connectivity in each module: hubs <- get_hubs(cem,n). A summary statistic of the expression data within each module (either the module mean or eigengene) can be obtained using: summary <- mod_summary(cem)

Generating reports

The information generated by CEMiTool, including tables and images can be accessed by generating a report of a the CEMiTool object:

generate_report(cem, output_format=c("pdf_document", "html_document"))

The report will written to the directory ./Reports.

Also, you can create tables with the analyses results using:

write_files(cem, directory="./Tables", force=TRUE)

Adding sample annotation

More information can be included in CEMiTool to build a more complete object and generate richer reports about the expression data. Sample annotation can be supplied in a data.frame that specifies a class for each sample. Classes can represent different conditions, phenotypes, cell lines, time points, etc. In this example, classes are defined as the time point at which the samples were collected.

# load your sample annotation data
data(sample_annot)
head(sample_annot)
##   SampleName Class
## 1   X1913_d0    g0
## 2   X1911_d0    g0
## 3   X1908_d0    g0
## 4   X1909_d0    g0
## 5   X1910_d0    g0
## 6   X1912_d0    g0

Now you can construct a CEMiTool object with both expression data and sample annotation:

# run cemitool with sample annotation
cem <- cemitool(expr,sample_annot)

The sample annotation of your CEMiTool object can be retrieved and reassigned using the sample_annotation(cem) function. This function can also be used to define the columns with sample names and sample groupings (which are “SampleName” and “Class”, by default):

sample_annotation(cem, 
                  sample_name_column="SampleName", 
                  class_column="Class") <- sample_annot

Module enrichment

When sample annotation is provided, the cemitool function will automatically evaluate how the modules are up or down regulated between classes. This is performed using the gene set enrichment analysis function from the fgsea package.

You can generate a plot of how the enrichment of the modules varies across classes with the plot_gsea function. The size and intensity of the circles in the figure correspond to the Normalised Enrichment Score (NES), which is the enrichment score for a module in each class normalised by the number of genes in the module. This analysis is automatically run by the cemitool function, but it can be independently run with the function mod_gsea(cem).

# generate heatmap of gene set enrichment analysis
cem <- mod_gsea(cem)
cem <- plot_gsea(cem)
show_plot(cem, "gsea")
## $enrichment_plot

Expression patterns in modules

You can generate a plot that displays the expression of each gene within a module using the plot_profile function:

# plot gene expression within each module
cem <- plot_profile(cem)
plots <- show_plot(cem, "profile")
plots[1]
## $M1