Interactive visualization of SummarizedExperiment objects with iSEE

This lab demonstrates how to use the iSEE package to create and configure interactive applications for the exploration of various types of genomics data sets (e.g., bulk and single-cell RNA-seq, CyTOF, gene expression microarray).

Preparation of example scRNAseq data

We use example data from the TENxPBMCData package. This package provides an R / Bioconductor resource for representing and manipulating different single-cell RNA-seq data sets profiling peripheral blood mononuclear cells (PBMC) generated by 10X Genomics (https://support.10xgenomics.com/single-cell-gene-expression/datasets).

The man page for the TENxPBMCData() function gives an idea of the datasets that are available from this package. It can be opened with the following command.

Here, we use the "pbmc3k" dataset, which contains gene expression profiles for 2,700 single peripheral blood mononuclear cells. The first time this dataset is loaded, this command downloads the dataset to a local cache, which takes some time, depending on the speed of your internet connection. Subsequent times, the same command loads the dataset directly from the local cache.

At this point we can inspect the dataset in the console.

The dataset is provided as an object of the SingleCellExperiment class. In particular, this summary view indicates that the following pieces of information are available:

  • An assay matrix named "counts"
  • Row names (i.e. genes) are Ensembl gene IDs
  • Row metadata include for each gene the official gene symbol, and the gene symbol used by the 10x CellRanger quantification pipeline
  • Column names (i.e., cells) are not initialized and left to NULL
  • Column metadata include diverse information for each cell, including the cell barcode ("Barcode") and the donor identifier ("Individual").

Note that a SingleCellExperiment object (or, more generally, any SummarizedExperiment object) like this one already contains sufficient information to launch an interactive application instance to visualize the available data and metadata, using the iSEE() function.

For the purpose of this workshop, we first apply some preprocessing to the SingleCellExperiment object, in order to populate it with more information that can be visualized with iSEE. The steps below roughly follow those outlined in the simpleSingleCell Bioconductor workflow.

We start by adding column names to the object, and use gene symbols instead of Ensembl IDs as row names. In the case where multiple Ensembl identifiers correspond to the same gene symbol, the scater::uniquifyFeatureNames function concatenates the Ensembl ID and the gene symbol in order to generate unique feature names.

Next, we use the scater package to calculate gene- and cell-level quality metrics. These metrics are added as columns to the rowData and colData slots of the SingleCellExperiment object, respectively.

We filter out a few cells with a large fraction of the counts coming from mitochondrial genes, since these may be damaged cells. Notice the reduced number of columns in the dataset below.

Next, we calculate size factors and normalized and log-transformed expression values, using the scran and scater packages. Note that it is typically recommended to pre-cluster the cells before computing the size factors, as follows:

However, for time reasons, we will skip the pre-clustering step in this workshop.

In order to extract the most informative genes, we first model the mean-variance trend and decompose the variance into biological and technical components.

Next, we apply Principal Components Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) to generate low-dimensional representations of the cells in our data set. These low-dimensional representations are added to the reducedDim slot of the SingleCellExperiment object.

Finally, we cluster the cells using a graph-based algorithm, and find ‘marker genes’ for each cluster as the genes that are significantly upregulated in the cluster compared to each of the other inferred clusters. The adjusted p-values from this test, for each cluster, are added to the rowData slot of the object.

This concludes the preparation of the data. We have now a SingleCellExperiment object that contains different types of abundance values, representations in reduced dimensions, as well as a range of row (feature) and column (cell) metadata.

We can launch an iSEE instance for exploring this data set using the iSEE() function:

The main argument to the iSEE() function is a SummarizedExperiment object, or an object of any class extending SummarizedExperiment (such as SingleCellExperiment, in this case). No other restrictions are made on the type of data stored in the object, and iSEE can be used for interactive visualization of many different types of data. It is also worth noting that for various types of data, Bioconductor packages provides functionality for directly importing quantifications generated by external software packages into a SummarizedExperiment object. For example, the DropletUtils package can read quantifications from the 10x Genomics CellRanger pipeline for single-cell RNA-seq data, and the tximeta package can be used to read data from transcript quantification pipelines into a SummarizedExperiment object.

Overview of the iSEE package

This section provides an overview of the graphical interface of iSEE applications. To follow along, make sure that you have launched an iSEE instance as described in the code block at the end of the previous section. Note that in the default configuration, the panels do not look exactly like the ones shown in the screenshots below. For example, data points are not immediately colored, and the default annotation variables displayed by each panel may differ. Later sections of this workshop demonstrate how to modify the content of the panels and how they are displayed.

Note that for simplicity, we typically refer to a SummarizedExperiment in this workshop; however, iSEE works seamlessly for objects of any class extending SummarizedExperiment as well (e.g., SingleCellExperiment, DESeqDataSet). That said, some types of panels – such as the Reduced dimension plot – are only available for objects that contain a reducedDim slot (in particular, SingleCellExperiment objects); the basic SummarizedExperiment class does not contain this slot. In this workshop, we refer to the rows of the SummarizedExperiment object as ‘features’ (these can be genes, transcripts, genomic regions, etc) and to the columns as ‘samples’ (which, in our example data set, are single cells).

The user interface

The iSEE user interface consists of a number of panels, each displaying the data provided in the SummarizedExperiment from a specific perspective. There are 8 standard panel types; 6 plot panels and 2 table panels, all showcased in the figure below. In the default configuration, one panel of each type is included when launching the iSEE user interface. However, users are free to rearrange, resize, remove or add panels freely, as described in a later section. We provide a brief overview of each panel type in the following subsections.

In addition to the 8 standard panel types, user can create custom panels (both plots and tables). The creation and configuration of custom panels is discussed later, in a dedicated section of this workshop.

Reduced dimension plot

The reduced dimension plot can display any reduced dimension representation that is present in the reducedDim slot of the SingleCellExperiment object.

Note that this slot is not defined for the base SummarizedExperiment class, in which case the user interface does not allow the inclusion of panels of this type.