The R package MoleculeExperiment contains functions to create and work with objects from the new MoleculeExperiment class. We introduce this class for analysing molecule-based spatial transcriptomics data (e.g., Xenium by 10X, CosMx SMI by Nanostring, and Merscope by Vizgen, among others).
The goal of the MoleculeExperiment class is to:
1. Enable analysis of spatial transcriptomics (ST) data at the molecule level,
independent of aggregating to the cell or tissue level.
2. Standardise molecule-based ST data across vendors, to hopefully facilitate
comparison of different data sources and common analytical and visualisation
workflows.
3. Enable aggregation to a SpatialExperiment
object given combinations of
molecules and segmentation boundaries.
The latest release of MoleculeExperiment can be installed using:
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("MoleculeExperiment")
library(MoleculeExperiment)
library(ggplot2)
library(EBImage)
repoDir <- system.file("extdata", package = "MoleculeExperiment")
repoDir <- paste0(repoDir, "/xenium_V1_FF_Mouse_Brain")
me <- readXenium(repoDir, keepCols = "essential")
me
#> MoleculeExperiment class
#>
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (137): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> ---- molecules (962)
#> ---- location range: [4900,4919.98] x [6400.02,6420]
#> -- sample2:
#> ---- features (143): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> ---- molecules (777)
#> ---- location range: [4900.01,4919.98] x [6400.16,6419.97]
#>
#>
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (5): 67500 67512 67515 67521 67527
#> -- sample2:
#> ---- segments (9): 65043 65044 ... 65070 65071
ggplot_me() +
geom_polygon_me(me, assayName = "cell", fill = "grey") +
geom_point_me(me) +
# zoom in to selected patch area
coord_cartesian(
xlim = c(4900, 4919.98),
ylim = c(6400.02, 6420)
)
We can plot molecules only belonging to specific genes via the selectFeatures
argument.
ggplot_me() +
geom_polygon_me(me, assayName = "cell", fill = "grey") +
geom_point_me(me, byColour = "feature_id", selectFeatures = c("Nrn1", "Slc17a7")) +
# zoom in to selected patch area
coord_cartesian(
xlim = c(4900, 4919.98),
ylim = c(6400.02, 6420)
)
# transform ME to SPE object
spe <- countMolecules(me)
spe
#> class: SpatialExperiment
#> dim: 178 14
#> metadata(0):
#> assays(1): counts
#> rownames(178): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> rowData names(0):
#> colnames(14): sample1.67500 sample1.67512 ... sample2.65070
#> sample2.65071
#> colData names(4): sample_id cell_id x_location y_location
#> reducedDimNames(1): spatial
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x_location y_location
#> imgData names(0):
Here we demonstrate how to work with an ME object from toy data, representing a
scenario where both the detected transcripts information and the boundary
information have already been read into R. This requires the standardisation
of the data with the dataframeToMEList()
function.
The flexibility of the arguments in dataframeToMEList()
enable the creation of
a standard ME object across dataframes coming from different vendors of
molecule-based spatial transcriptomics technologies.
moleculesDf <- data.frame(
sample_id = rep(c("sample1", "sample2"), times = c(30, 20)),
features = rep(c("gene1", "gene2"), times = c(20, 30)),
x_coords = runif(50),
y_coords = runif(50)
)
head(moleculesDf)
#> sample_id features x_coords y_coords
#> 1 sample1 gene1 0.9014947 0.64662303
#> 2 sample1 gene1 0.1608592 0.79068051
#> 3 sample1 gene1 0.6828351 0.67090080
#> 4 sample1 gene1 0.9591983 0.72829616
#> 5 sample1 gene1 0.9345352 0.41894383
#> 6 sample1 gene1 0.5421612 0.02663181
boundariesDf <- data.frame(
sample_id = rep(c("sample1", "sample2"), times = c(16, 6)),
cell_id = rep(
c(
"cell1", "cell2", "cell3", "cell4",
"cell1", "cell2"
),
times = c(4, 4, 4, 4, 3, 3)
),
vertex_x = c(
0, 0.5, 0.5, 0,
0.5, 1, 1, 0.5,
0, 0.5, 0.5, 0,
0.5, 1, 1, 0.5,
0, 1, 0, 0, 1, 1
),
vertex_y = c(
0, 0, 0.5, 0.5,
0, 0, 0.5, 0.5,
0.5, 0.5, 1, 1,
0.5, 0.5, 1, 1,
0, 1, 1, 0, 0, 1
)
)
head(boundariesDf)
#> sample_id cell_id vertex_x vertex_y
#> 1 sample1 cell1 0.0 0.0
#> 2 sample1 cell1 0.5 0.0
#> 3 sample1 cell1 0.5 0.5
#> 4 sample1 cell1 0.0 0.5
#> 5 sample1 cell2 0.5 0.0
#> 6 sample1 cell2 1.0 0.0
moleculesMEList <- dataframeToMEList(moleculesDf,
dfType = "molecules",
assayName = "detected",
sampleCol = "sample_id",
factorCol = "features",
xCol = "x_coords",
yCol = "y_coords"
)
str(moleculesMEList, max.level = 3)
#> List of 1
#> $ detected:List of 2
#> ..$ sample1:List of 2
#> .. ..$ gene1: tibble [20 × 2] (S3: tbl_df/tbl/data.frame)
#> .. ..$ gene2: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ sample2:List of 1
#> .. ..$ gene2: tibble [20 × 2] (S3: tbl_df/tbl/data.frame)
boundariesMEList <- dataframeToMEList(boundariesDf,
dfType = "boundaries",
assayName = "cell",
sampleCol = "sample_id",
factorCol = "cell_id",
xCol = "vertex_x",
yCol = "vertex_y"
)
str(boundariesMEList, 3)
#> List of 1
#> $ cell:List of 2
#> ..$ sample1:List of 4
#> .. ..$ cell1: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> .. ..$ cell2: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> .. ..$ cell3: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> .. ..$ cell4: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ sample2:List of 2
#> .. ..$ cell1: tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
#> .. ..$ cell2: tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
toyME <- MoleculeExperiment(
molecules = moleculesMEList,
boundaries = boundariesMEList
)
toyME
#> MoleculeExperiment class
#>
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (2): gene1 gene2
#> ---- molecules (30)
#> ---- location range: [0.01,0.96] x [0.03,0.99]
#> -- sample2:
#> ---- features (1): gene2
#> ---- molecules (20)
#> ---- location range: [0,0.99] x [0.02,0.94]
#>
#>
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (4): cell1 cell2 cell3 cell4
#> -- sample2:
#> ---- segments (2): cell1 cell2
In this example, we use the extent of the molecules of generated for toyME
to
align the boundaries with the molecules. In general, the extent of the
segmentation is required for this alignment.
repoDir <- system.file("extdata", package = "MoleculeExperiment")
segMask <- paste0(repoDir, "/BIDcell_segmask.tif")
boundaries(toyME, "BIDcell_segmentation") <- readSegMask(
# use the molecule extent to define the boundary extent
extent(toyME, assayName = "detected"),
path = segMask, assayName = "BIDcell_segmentation",
sample_id = "sample1", background_value = 0
)
toyME
#> MoleculeExperiment class
#>
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (2): gene1 gene2
#> ---- molecules (30)
#> ---- location range: [0.01,0.96] x [0.03,0.99]
#> -- sample2:
#> ---- features (1): gene2
#> ---- molecules (20)
#> ---- location range: [0,0.99] x [0.02,0.94]
#>
#>
#> boundaries slot (2): cell BIDcell_segmentation
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (4): cell1 cell2 cell3 cell4
#> -- sample2:
#> ---- segments (2): cell1 cell2
#> - BIDcell_segmentation:
#> samples (1): sample1
#> -- sample1:
#> ---- segments (88): 54748 54771 ... 58184 58186
Displayed below is the BIDcell segmentation image added to toyME
as another boundaries
BIDcell_segmask_img <- EBImage::readImage(segMask)
EBImage::display(BIDcell_segmask_img, method = "raster")