Contents

1 MoleculeExperiment

The R package MoleculeExperiment contains functions to create and work with objects from the new MoleculeExperiment class. We introduce this class for analysing molecule-based spatial transcriptomics data (e.g., Xenium by 10X, CosMx SMI by Nanostring, and Merscope by Vizgen, among others).

1.1 Why the MoleculeExperiment class?

The goal of the MoleculeExperiment class is to: 1. Enable analysis of spatial transcriptomics (ST) data at the molecule level, independent of aggregating to the cell or tissue level. 2. Standardise molecule-based ST data across vendors, to hopefully facilitate comparison of different data sources and common analytical and visualisation workflows. 3. Enable aggregation to a SpatialExperiment object given combinations of molecules and segmentation boundaries.

1.2 Installation

The latest release of MoleculeExperiment can be installed using:

if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("MoleculeExperiment")

2 Minimal example

  1. Load required libraries.
library(MoleculeExperiment)
library(ggplot2)
library(EBImage)
  1. Create MoleculeExperiment object with example Xenium data, taken over a small patch.
repoDir <- system.file("extdata", package = "MoleculeExperiment")
repoDir <- paste0(repoDir, "/xenium_V1_FF_Mouse_Brain")

me <- readXenium(repoDir, keepCols = "essential")
me
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (137): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> ---- molecules (962)
#> ---- location range: [4900,4919.98] x [6400.02,6420]
#> -- sample2:
#> ---- features (143): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> ---- molecules (777)
#> ---- location range: [4900.01,4919.98] x [6400.16,6419.97]
#> 
#> 
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (5): 67500 67512 67515 67521 67527
#> -- sample2:
#> ---- segments (9): 65043 65044 ... 65070 65071
  1. Use standardised data in ME object for molecule-level analyses. For example, plot a simple digital in-situ, with cell boundaries overlaid.
ggplot_me() +
  geom_polygon_me(me, assayName = "cell", fill = "grey") +
  geom_point_me(me) +
  # zoom in to selected patch area
  coord_cartesian(
    xlim = c(4900, 4919.98),
    ylim = c(6400.02, 6420)
  )

We can plot molecules only belonging to specific genes via the selectFeatures argument.

ggplot_me() +
  geom_polygon_me(me, assayName = "cell", fill = "grey") +
  geom_point_me(me, byColour = "feature_id", selectFeatures = c("Nrn1", "Slc17a7")) +
  # zoom in to selected patch area
  coord_cartesian(
    xlim = c(4900, 4919.98),
    ylim = c(6400.02, 6420)
  )

  1. Finally, it is also possible to go from a MoleculeExperiment object to a SpatialExperiment object. This enables the transition from a molecule-level analysis to a cell-level analysis with already existing tools.
# transform ME to SPE object
spe <- countMolecules(me)
spe
#> class: SpatialExperiment 
#> dim: 178 14 
#> metadata(0):
#> assays(1): counts
#> rownames(178): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> rowData names(0):
#> colnames(14): sample1.67500 sample1.67512 ... sample2.65070
#>   sample2.65071
#> colData names(4): sample_id cell_id x_location y_location
#> reducedDimNames(1): spatial
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x_location y_location
#> imgData names(0):

3 The ME object in detail

3.1 Constructing an ME object

3.1.1 Use case 1: from dataframes to ME object

Here we demonstrate how to work with an ME object from toy data, representing a scenario where both the detected transcripts information and the boundary information have already been read into R. This requires the standardisation of the data with the dataframeToMEList() function.

The flexibility of the arguments in dataframeToMEList() enable the creation of a standard ME object across dataframes coming from different vendors of molecule-based spatial transcriptomics technologies.

  1. Generate a toy transcripts data.frame.
moleculesDf <- data.frame(
  sample_id = rep(c("sample1", "sample2"), times = c(30, 20)),
  features = rep(c("gene1", "gene2"), times = c(20, 30)),
  x_coords = runif(50),
  y_coords = runif(50)
)
head(moleculesDf)
#>   sample_id features  x_coords   y_coords
#> 1   sample1    gene1 0.9014947 0.64662303
#> 2   sample1    gene1 0.1608592 0.79068051
#> 3   sample1    gene1 0.6828351 0.67090080
#> 4   sample1    gene1 0.9591983 0.72829616
#> 5   sample1    gene1 0.9345352 0.41894383
#> 6   sample1    gene1 0.5421612 0.02663181
  1. Generate a toy boundaries data.frame.
boundariesDf <- data.frame(
  sample_id = rep(c("sample1", "sample2"), times = c(16, 6)),
  cell_id = rep(
    c(
      "cell1", "cell2", "cell3", "cell4",
      "cell1", "cell2"
    ),
    times = c(4, 4, 4, 4, 3, 3)
  ),
  vertex_x = c(
    0, 0.5, 0.5, 0,
    0.5, 1, 1, 0.5,
    0, 0.5, 0.5, 0,
    0.5, 1, 1, 0.5,
    0, 1, 0, 0, 1, 1
  ),
  vertex_y = c(
    0, 0, 0.5, 0.5,
    0, 0, 0.5, 0.5,
    0.5, 0.5, 1, 1,
    0.5, 0.5, 1, 1,
    0, 1, 1, 0, 0, 1
  )
)
head(boundariesDf)
#>   sample_id cell_id vertex_x vertex_y
#> 1   sample1   cell1      0.0      0.0
#> 2   sample1   cell1      0.5      0.0
#> 3   sample1   cell1      0.5      0.5
#> 4   sample1   cell1      0.0      0.5
#> 5   sample1   cell2      0.5      0.0
#> 6   sample1   cell2      1.0      0.0
  1. Standardise transcripts dataframe to ME list format.
moleculesMEList <- dataframeToMEList(moleculesDf,
  dfType = "molecules",
  assayName = "detected",
  sampleCol = "sample_id",
  factorCol = "features",
  xCol = "x_coords",
  yCol = "y_coords"
)
str(moleculesMEList, max.level = 3)
#> List of 1
#>  $ detected:List of 2
#>   ..$ sample1:List of 2
#>   .. ..$ gene1: tibble [20 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ gene2: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ sample2:List of 1
#>   .. ..$ gene2: tibble [20 × 2] (S3: tbl_df/tbl/data.frame)
  1. Standardise boundaries dataframe to ME list format.
boundariesMEList <- dataframeToMEList(boundariesDf,
  dfType = "boundaries",
  assayName = "cell",
  sampleCol = "sample_id",
  factorCol = "cell_id",
  xCol = "vertex_x",
  yCol = "vertex_y"
)
str(boundariesMEList, 3)
#> List of 1
#>  $ cell:List of 2
#>   ..$ sample1:List of 4
#>   .. ..$ cell1: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell2: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell3: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell4: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ sample2:List of 2
#>   .. ..$ cell1: tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell2: tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
  1. Create an ME object by using the MoleculeExperiment object constructor.
toyME <- MoleculeExperiment(
  molecules = moleculesMEList,
  boundaries = boundariesMEList
)
toyME
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (2): gene1 gene2
#> ---- molecules (30)
#> ---- location range: [0.01,0.96] x [0.03,0.99]
#> -- sample2:
#> ---- features (1): gene2
#> ---- molecules (20)
#> ---- location range: [0,0.99] x [0.02,0.94]
#> 
#> 
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (4): cell1 cell2 cell3 cell4
#> -- sample2:
#> ---- segments (2): cell1 cell2
  1. Add boundaries from an external segmentation algorithm.

In this example, we use the extent of the molecules of generated for toyME to align the boundaries with the molecules. In general, the extent of the segmentation is required for this alignment.

repoDir <- system.file("extdata", package = "MoleculeExperiment")
segMask <- paste0(repoDir, "/BIDcell_segmask.tif")
boundaries(toyME, "BIDcell_segmentation") <- readSegMask(
  # use the molecule extent to define the boundary extent
  extent(toyME, assayName = "detected"),
  path = segMask, assayName = "BIDcell_segmentation",
  sample_id = "sample1", background_value = 0
)

toyME
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (2): gene1 gene2
#> ---- molecules (30)
#> ---- location range: [0.01,0.96] x [0.03,0.99]
#> -- sample2:
#> ---- features (1): gene2
#> ---- molecules (20)
#> ---- location range: [0,0.99] x [0.02,0.94]
#> 
#> 
#> boundaries slot (2): cell BIDcell_segmentation
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (4): cell1 cell2 cell3 cell4
#> -- sample2:
#> ---- segments (2): cell1 cell2
#> - BIDcell_segmentation:
#> samples (1): sample1
#> -- sample1:
#> ---- segments (88): 54748 54771 ... 58184 58186

Displayed below is the BIDcell segmentation image added to toyME as another boundaries

BIDcell_segmask_img <- EBImage::readImage(segMask)
EBImage::display(BIDcell_segmask_img, method = "raster")