1 Introduction

4way plots enable a comparison of the logFC values from two contrasts of differential gene expression (Friedman and Maniatis 2011). The gg4way package creates 4way plots using the ggplot2 framework and supports popular Bioconductor objects. The package also provides information about the correlation between contrasts and significant genes of interest.

2 Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("gg4way")

To install the development version directly from GitHub:

if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

remotes::install_github("ben-laufer/gg4way")

3 Quick start: limma

This example involves testing a popular RNA-seq dataset using limma-voom.

3.1 Prepare data

First the airway data package is loaded, gene symbols are added, and then for the purpose of this vignette only genes with symbols are kept.

library("airway")
data("airway")
se <- airway

library("org.Hs.eg.db")
rowData(se)$symbol <- mapIds(org.Hs.eg.db,
                             keys = rownames(se),
                             column = "SYMBOL",
                             keytype = "ENSEMBL")

rowData(se)$ID <- rownames(se)

se <- se[!is.na(rowData(se)$symbol)]

3.2 limma-voom

The output from limma::eBayes() and limma::treat() is supported; however, only the former is shown for this example.

library("edgeR")
library("limma")

dge <- se |>
    SE2DGEList()

design <- model.matrix(~ 0 + cell + dex, data = dge$samples)
colnames(design) <- gsub("cell", "", colnames(design))

contr.matrix <- makeContrasts(N61311 - N052611,
                              N061011 - N052611,
                              levels = c("N052611", "N061011",
                                         "N080611", "N61311",
                                         "dexuntrt"))

keep <- filterByExpr(dge, design)
dge <- dge[keep, ]

efit <- dge |>
    calcNormFactors() |>
    voom(design) |>
    lmFit(design) |>
    contrasts.fit(contrasts = contr.matrix) |>
    eBayes()

3.3 Plot

Finally, we create a 4way plot comparing the logFC for all genes in the two contrasts.

library("gg4way")

p1 <- efit |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")

p1
gg4way from limma

Figure 1: gg4way from limma

The legend title at the bottom shows that there is a correlation of r = 0.43, which is exemplified by more shared DEGs (blue dots) going in the same direction (upper right and bottom left) than opposite direction (upper left and bottom right). The numbers in the plot give the totals for the different quadrants of the 4way plot. If you look at the bottom left quadrant, the blue text shows that there are 62 DEGs where N052611 has significantly increased expression relative to both N61311 and N061011. The red text shows that there are 238 DEGs where N052611 has significantly increased expression relative to N61311 only, while the green text shows that there are 41 DEGs where N052611 has significantly increased expression relative to N061011 only.

4 Add gene labels

4.1 Table for labels

The genes that are significant in both contrasts can be obtained in a table through getShared().

p1 |>
    getShared() |>
    head()
## # A tibble: 6 × 12
##   ID              symbol   `N61311 vs N052611 LogFC` `N061011 vs N052611 LogFC`
##   <chr>           <chr>                        <dbl>                      <dbl>
## 1 ENSG00000204941 PSG5                         -4.61                      -5.24
## 2 ENSG00000164308 ERAP2                         3.45                       3.77
## 3 ENSG00000018625 ATP1A2                       -2.81                      -2.17
## 4 ENSG00000262902 MTCO1P40                     -6.72                      -6.78
## 5 ENSG00000180914 OXTR                         -3.78                      -3.18
## 6 ENSG00000078018 MAP2                         -3.10                      -1.35
## # ℹ 8 more variables: `N61311 vs N052611 FDR` <dbl>,
## #   `N061011 vs N052611 FDR` <dbl>, `N61311 vs N052611 FDRpass` <lgl>,
## #   `N061011 vs N052611 FDRpass` <lgl>, `N61311 vs N052611 Direction` <chr>,
## #   `N061011 vs N052611 Direction` <chr>, Significant <fct>, alpha <dbl>

4.2 Plotting labels

Gene symbols can be added to the plot through the label argument. Setting it to TRUE will plot all the genes colored blue, while specific genes can be labelled by providing their symbol. Below, two of the genes from the above table are plotted.

efit |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611",
           label = c("PSG5", "ERAP2"))
gg4way with labels

Figure 2: gg4way with labels

5 Advanced options

5.1 Axis titles

The axis titles can be further customized through ggplot2.

p1 +
    xlab(expression(atop(
        paste("Higher in Line 2" %<->% "Higher in Line 1"),
        paste("Line 1 vs 2 LogFC")))) +
    ylab(expression(atop(
        paste("Line 3 vs 2"),
        paste("Higher in Line 2" %<->% "Higher in Line 3"))))
gg4way with custom axis titles

Figure 3: gg4way with custom axis titles

5.2 Correlation only

The Pearson correlation coefficient can be obtained from getCor(). Advanced users can apply this in their own functions to compare across pairs in a heatmap.

p1 |>
    getCor()
## [1] 0.43

6 edgeR

In addition to the output of limma, the functions are also compatible with edgeR and DESeq2. If a user is starting here, they will first have to run the Prepare data and limma-voom subsections in the Quick start: limma section. The output from edgeR::glmQLFTest(), edgeR::glmTreat(), and edgeR::glmLRT() is supported.

library("purrr")

rfit <- dge |>
    calcNormFactors() |>
    estimateDisp() |>
    glmQLFit(design) 

rfit <- contr.matrix |> 
    colnames() |>
    set_names() |>
    map(~ rfit |>
            glmQLFTest(contrast = contr.matrix[,.x]))

rfit |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")
gg4way from edgeR

Figure 4: gg4way from edgeR

7 DESeq2

If a user is starting here, they will first have to run the Prepare data subsection in the Quick start: limma section.

7.1 DESeq analysis

For the purpose of this vignette, we filter the object to remove the difference between the results name and contrast approaches shown below.

library("DESeq2")

ddsSE <- se |>
    DESeqDataSet(design = ~ cell + dex)

keep <- ddsSE |>
    counts() |>
    apply(1, function(gene) {
        all(gene != 0)
    })

ddsSE <- ddsSE[keep, ]

dds <- ddsSE |>
    DESeq()

7.2 Plot by results name

dds |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")
gg4way from DESeq2

Figure 5: gg4way from DESeq2

7.3 Plot by contrast

Although not shown, the same result as above can be obtained through the contrast argument of DESeq2::results(). This approach also allows the user to set the lfcThreshold in DESeq2::results().

list("N61311 vs N052611" = c("cell", "N61311", "N052611"),
     "N061011 vs N052611" = c("cell", "N061011", "N052611")) |>
    map(~ results(dds, contrast = .x)) |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")

7.4 Plot by lfcShrink

Finally, the output of DESeq2::lfcShrink() can also be plotted.

list("N61311 vs N052611" = c("cell", "N61311", "N052611"),
     "N061011 vs N052611" = c("cell", "N061011", "N052611")) |>
    map(~ dds |>
        results(contrast = .x) |>
        lfcShrink(dds, contrast = .x, res = _, type = "normal")) |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611",
           symbol = "ID",
           logFC = "log2FoldChange",
           FDR = "padj")
gg4way from DESeq2 lfcShrink

Figure 6: gg4way from DESeq2 lfcShrink

8 Other packages

gg4way is not limited to input from limma, edgeR, or DESeq2. It only requires a list of data frames (or tibbles) with columns containing the ID, LogFC, and FDR. The symbol column is optional and used for the plot labels. It enables cases where not every feature has a (unique) gene ID.

9 Package support

The Bioconductor support site is the preferred method to ask for help. Before posting, it’s recommended to check previous posts for the answer and look over the posting guide. For the post, it’s important to use the gg4way tag and provide both a minimal reproducible example and session information.

10 Session info

## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DESeq2_1.44.0               purrr_1.0.2                
##  [3] gg4way_1.2.0                ggplot2_3.5.1              
##  [5] edgeR_4.2.0                 limma_3.60.0               
##  [7] org.Hs.eg.db_3.19.1         AnnotationDbi_1.66.0       
##  [9] airway_1.23.0               SummarizedExperiment_1.34.0
## [11] Biobase_2.64.0              GenomicRanges_1.56.0       
## [13] GenomeInfoDb_1.40.0         IRanges_2.38.0             
## [15] S4Vectors_0.42.0            BiocGenerics_0.50.0        
## [17] MatrixGenerics_1.16.0       matrixStats_1.3.0          
## [19] BiocStyle_2.32.0           
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        farver_2.1.1            dplyr_1.1.4            
##  [4] blob_1.2.4              Biostrings_2.72.0       fastmap_1.1.1          
##  [7] janitor_2.2.0           digest_0.6.35           timechange_0.3.0       
## [10] lifecycle_1.0.4         statmod_1.5.0           KEGGREST_1.44.0        
## [13] RSQLite_2.3.6           magrittr_2.0.3          compiler_4.4.0         
## [16] rlang_1.1.3             sass_0.4.9              tools_4.4.0            
## [19] utf8_1.2.4              yaml_2.3.8              knitr_1.46             
## [22] labeling_0.4.3          S4Arrays_1.4.0          bit_4.0.5              
## [25] DelayedArray_0.30.0     abind_1.4-5             BiocParallel_1.38.0    
## [28] withr_3.0.0             grid_4.4.0              fansi_1.0.6            
## [31] colorspace_2.1-0        scales_1.3.0            cli_3.6.2              
## [34] rmarkdown_2.26          crayon_1.5.2            generics_0.1.3         
## [37] httr_1.4.7              DBI_1.2.2               cachem_1.0.8           
## [40] stringr_1.5.1           splines_4.4.0           zlibbioc_1.50.0        
## [43] parallel_4.4.0          BiocManager_1.30.22     XVector_0.44.0         
## [46] vctrs_0.6.5             Matrix_1.7-0            jsonlite_1.8.8         
## [49] bookdown_0.39           ggrepel_0.9.5           bit64_4.0.5            
## [52] locfit_1.5-9.9          tidyr_1.3.1             jquerylib_0.1.4        
## [55] glue_1.7.0              codetools_0.2-20        stringi_1.8.3          
## [58] lubridate_1.9.3         gtable_0.3.5            UCSC.utils_1.0.0       
## [61] munsell_0.5.1           tibble_3.2.1            pillar_1.9.0           
## [64] htmltools_0.5.8.1       GenomeInfoDbData_1.2.12 R6_2.5.1               
## [67] evaluate_0.23           lattice_0.22-6          highr_0.10             
## [70] png_0.1-8               memoise_2.0.1           snakecase_0.11.1       
## [73] bslib_0.7.0             Rcpp_1.0.12             SparseArray_1.4.0      
## [76] xfun_0.43               pkgconfig_2.0.3

References

Friedman, B. A., and T. Maniatis. 2011. “ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data.” Genome Biol 12 (7): R69.