1 Getting started

Rvisdiff is an R package distributed as part of the Bioconductor project. To install the package, start R and enter:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("Rvisdiff")

The GitHub repository for Rvisdiff is https://github.com/BioinfoUSAL/Rvisdiff. This is the place to file an issue, report a bug, or provide a pull request.

Once Rvisdiff is installed, it can be loaded with the following command:

library("Rvisdiff")

2 Introduction

Differential expression analysis produces large reports that require manual inspection to refine and interpret the results. Researchers have designed visualization techniques to facilitate these tasks, but generating them with code or statistics packages hinders quick, large-scale exploration of the results. We have designed Rvisdiff to integrate these graphs into an easy-to-use, interactive web page. The user can explore the differential expression results and the source expression data in the same view.

As input, the package receives two tables: the differential expression results and the raw or normalized expression data. It detects the default output of the DESeq2, edgeR and limma packages, so no data conversion is needed. The user can also supply a custom data frame that combines the output of a statistical test with a fold change and a mean calculation for each variable.
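As a rough illustration of such a custom table, the sketch below builds one with base R from simulated counts. The column names (stat, pvalue, padj, baseMean, log2FoldChange) follow the Wilcoxon example later in this document; the simulated data, the gene and sample names, and the choice of a Welch t-test are assumptions made only for this example.

```r
# Illustrative data only: 6 hypothetical genes in 6 hypothetical samples.
set.seed(1)
counts <- matrix(rpois(36, lambda = 100), nrow = 6,
    dimnames = list(paste0("gene", 1:6), paste0("sample", 1:6)))
group <- factor(rep(c("untrt", "trt"), each = 3), levels = c("untrt", "trt"))

# Per-gene test statistic and p-value (Welch t-test on log2 counts,
# chosen here only for illustration).
tests <- apply(log2(counts + 1), 1, function(x) {
    tt <- t.test(x[group == "trt"], x[group == "untrt"])
    c(stat = unname(tt$statistic), pvalue = tt$p.value)
})

# Combine test output, adjusted p-values, mean and fold change per gene.
custom <- data.frame(
    stat = tests["stat", ],
    pvalue = tests["pvalue", ],
    padj = p.adjust(tests["pvalue", ], method = "BH"),
    baseMean = rowMeans(counts),
    log2FoldChange = log2(rowMeans(counts[, group == "trt"]) + 1) -
        log2(rowMeans(counts[, group == "untrt"]) + 1),
    row.names = rownames(counts)
)
head(custom)
```

A table of this shape, together with the expression matrix and the sample conditions, is the kind of input the Wilcoxon example passes to DEreport.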

As output, the package generates a local HTML page that can be viewed in a web browser. No additional software, such as application servers or programming languages, needs to be installed, which ensures portability and ease of use. Moreover, results are stored on the local computer, avoiding any network sharing or data upload to external servers, which preserves data privacy.

3 Input data

In this example we use as input the airway data package, which contains the read counts per gene for an RNA-Seq experiment on four human airway smooth muscle cell lines treated with dexamethasone. The code below shows how to load the package and extract the main data features needed for the differential expression analysis and its subsequent visualization with Rvisdiff. The countdata variable contains a matrix with the number of sequence counts for each gene (rows) and sample (columns). The coldata variable contains the input phenotypes for the differential expression analysis and its subsequent representation.

The following code loads the necessary libraries, sets the untreated condition (untrt) as the reference level, and extracts the counts and the sample conditions.

library(Rvisdiff)
library(airway)
data("airway")
se <- airway
se$dex <- relevel(se$dex, ref="untrt")
countdata <- assay(se)
coldata <- colData(se)
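Before running an analysis it can help to confirm that the samples in the count table and in the phenotype table line up. The following is a minimal base R sketch on toy objects; check_inputs is a hypothetical helper that is not part of Rvisdiff, and the toy gene and sample names are assumptions.

```r
# Hypothetical helper (not part of Rvisdiff): TRUE when the count columns and
# the phenotype rows describe the same samples in the same order.
check_inputs <- function(counts, pheno) {
    ncol(counts) == nrow(pheno) &&
        identical(colnames(counts), rownames(pheno))
}

# Toy inputs standing in for countdata and coldata.
toy_counts <- matrix(1:8, nrow = 2,
    dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))
toy_pheno <- data.frame(dex = factor(c("untrt", "trt", "untrt", "trt")),
    row.names = paste0("s", 1:4))

# Factor levels are alphabetical by default; relevel() moves the
# reference condition to the first position.
toy_pheno$dex <- relevel(toy_pheno$dex, ref = "untrt")

check_inputs(toy_counts, toy_pheno)         # TRUE: same samples, same order
check_inputs(toy_counts[, 4:1], toy_pheno)  # FALSE: columns were reordered
levels(toy_pheno$dex)                       # "untrt" "trt"
```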

4 Generating the Report

4.1 Generating Report From DESeq2 results

The code below shows how to perform a differential expression analysis with DESeq2 and visualize the results with Rvisdiff.

library(DESeq2)
dds <- DESeqDataSet(se, design = ~ cell + dex)
dds <- DESeq(dds)
dres <- results(dds, independentFiltering = FALSE)
DEreport(dres, countdata, coldata$dex)

4.2 Generating Report From edgeR results

The code below shows how to perform a differential expression analysis with edgeR and visualize the results with Rvisdiff.

library(edgeR)
design <- model.matrix(~ cell + dex, data = coldata)
dl <- DGEList(counts = countdata, group = coldata$dex)
dl <- calcNormFactors(dl)
dl <- estimateDisp(dl, design=design)
de <- exactTest(dl,pair=1:2)
tt <- topTags(de, n = Inf, adjust.method = "BH", sort.by = "none")
DEreport(tt, countdata, coldata$dex) 

4.3 Generating Report From limma results

The code below shows how to perform a differential expression analysis with limma and visualize the results with Rvisdiff.

library(limma)
design <- model.matrix(~ 0 + dex + cell, data = coldata)
contr <- makeContrasts(dextrt - dexuntrt,levels=colnames(design))
limmaexprs <- voom(countdata, design)
fit <- lmFit(limmaexprs, design)
fit <- contrasts.fit(fit, contrasts=contr)
fit <- eBayes(fit)
limmares <- topTable(fit, coef = 1, number = Inf, sort.by = "none",
    adjust.method = "BH")
DEreport(limmares, countdata, coldata$dex) 

4.4 Generating Report From Differential test results

The code below shows how to perform a Wilcoxon test on the expression data and visualize the results with Rvisdiff. The same approach can be followed to represent the results of other tests for differences between groups, such as tests for differences in means.

untrt <- countdata[,coldata$dex=="untrt"]
trt <- countdata[,coldata$dex=="trt"]

library(matrixTests)
wilcox <- col_wilcoxon_twosample(t(untrt), t(trt))
stat <- wilcox$statistic
p <- wilcox$pvalue
log2FoldChange <- log2(rowMeans(trt)+1) - log2(rowMeans(untrt)+1)
wilcox <- cbind(stat = stat, pvalue = round(p, 6),
    padj = p.adjust(p, method = "BH"),
    baseMean = rowMeans(countdata),
    log2FoldChange = log2FoldChange)
rownames(wilcox) <- rownames(countdata)

DEreport(wilcox, countdata, coldata$dex)
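For readers without matrixTests, a comparable per-gene table can be sketched with wilcox.test from the stats package. This is an illustrative alternative on simulated toy counts, not the method used above; the toy data and names are assumptions, and the p-values need not match those of matrixTests exactly, since the handling of exact tests and continuity correction may differ.

```r
# Toy counts: 5 hypothetical genes, 4 untreated and 4 treated samples.
set.seed(42)
toy <- matrix(rpois(40, lambda = 30), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("s", 1:8)))
grp <- rep(c("untrt", "trt"), each = 4)

# One Wilcoxon rank-sum test per gene (row); exact = FALSE uses the normal
# approximation, which also copes with tied counts.
res <- t(apply(toy, 1, function(x) {
    w <- wilcox.test(x[grp == "untrt"], x[grp == "trt"], exact = FALSE)
    c(stat = unname(w$statistic), pvalue = w$p.value)
}))

# Add adjusted p-values, mean expression and log2 fold change.
res <- cbind(res,
    padj = p.adjust(res[, "pvalue"], method = "BH"),
    baseMean = rowMeans(toy),
    log2FoldChange = log2(rowMeans(toy[, grp == "trt"]) + 1) -
        log2(rowMeans(toy[, grp == "untrt"]) + 1))
res
```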

5 Resulting Graphical User Interface

Figure 1 shows the resulting web page generated with the DEreport function. The user can choose which genes appear in the graphs by selecting them in the results table. The page contains the following graphs:

  • Volcano plot: a scatter plot in which the rate of change on a logarithmic scale (log2 fold change) is plotted against the p-value from the contrast test on a negative base-10 logarithmic scale (-log10 p-value). Points are highlighted when the mouse hovers over the corresponding rows of the results table, and the variable name appears when the mouse hovers over a point.
  • MA plot: a scatter plot of mean expression values versus rate of change; both are plotted on a logarithmic scale to avoid excessive scatter. It has the same interactivity features as the volcano plot.
  • Line diagram: the expression levels of a gene (ordinate) across samples (abscissa) are drawn as a line. The diagram is divided according to the input phenotype.
  • Box plot: shows the distribution, degree of asymmetry, extreme values and median of the expression values. It is also useful for comparing two distributions drawn in the same graph. The resulting graphs show the difference in expression between genes or conditions.
  • Cluster heatmap: expression data are displayed in a grid where each row represents a gene and each column represents a sample. The color and intensity of the cells represent changes in gene expression (usually scaled per gene rather than shown as absolute values). The heatmap also shows clustering trees that group genes and samples by the similarity of their expression patterns. The user can change the color scale and toggle rendering between raw and scaled values. Moreover, the graph provides a zoom feature for focusing on a subset of samples or genes.
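The quantities behind these graphs can be sketched in a few lines of base R. The example below computes volcano and MA plot coordinates from a toy results table and scales a toy expression matrix per gene. The toy values, the +1 pseudocount, the base-2 logarithm for the mean, and the use of z-scores for per-gene scaling are assumptions made for illustration; the exact transformations applied inside Rvisdiff may differ.

```r
# Toy results table using the column names from the Wilcoxon example.
res <- data.frame(
    pvalue = c(0.001, 0.5, 0.04),
    baseMean = c(1023, 15, 255),
    log2FoldChange = c(2, -0.1, -1.5),
    row.names = c("geneA", "geneB", "geneC")
)

# Volcano plot coordinates: log2 fold change versus -log10(p-value).
volcano <- data.frame(x = res$log2FoldChange, y = -log10(res$pvalue),
    row.names = rownames(res))

# MA plot coordinates: log-scaled mean expression versus log2 fold change.
# The +1 pseudocount (an assumption) keeps a zero mean finite.
ma <- data.frame(x = log2(res$baseMean + 1), y = res$log2FoldChange,
    row.names = rownames(res))

# Per-gene scaling for a heatmap: z-score each row, so that genes with very
# different absolute expression become comparable.
expr <- matrix(c(10, 20, 30,
                 5, 5, 8,
                 100, 50, 0), nrow = 3, byrow = TRUE,
    dimnames = list(rownames(res), c("s1", "s2", "s3")))
scaled <- t(scale(t(expr)))

volcano
ma
round(scaled, 2)
```

After scaling, each row has mean 0 and standard deviation 1, which is why a heatmap of scaled values emphasizes the pattern across samples rather than the absolute expression level.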