BayesSpace supports three ways of loading a SingleCellExperiment for analysis.
Visium datasets processed with Space Ranger can be loaded directly via the readVisium() function. This function takes only the path to the Space Ranger output directory (containing the spatial/ and filtered_feature_bc_matrix/ subdirectories) and returns a SingleCellExperiment.
Second, all datasets analyzed for the BayesSpace manuscript are readily accessible via the getRDS() function. This function takes two arguments - the name of the dataset, and the name of the sample in the dataset.
Finally, SingleCellExperiment objects can be constructed manually from a counts matrix and tables of row and column data. BayesSpace only requires that spot array coordinates be provided as columns named array_row and array_col in colData. (Note that enhancement of Visium datasets additionally requires the pixel coordinates of each spot in the tissue image, but in this case the dataset should be loaded with readVisium(), which loads these data automatically.)
library(Matrix)
rowData <- read.csv("path/to/rowData.csv", stringsAsFactors=FALSE)
colData <- read.csv("path/to/colData.csv", stringsAsFactors=FALSE, row.names=1)
counts <- read.csv("path/to/counts.csv.gz",
row.names=1, check.names=F, stringsAsFactors=FALSE))
sce <- SingleCellExperiment(assays=list(counts=as(counts, "dgCMatrix")),
rowData=rowData,
colData=colData)We’ll continue with the melanoma sample from the 2018 Spatial Transcriptomics paper for the remaining examples in this vignette.
BayesSpace requires minimal data pre-processing, but we provide a helper function to automate it.
spatialPreprocess() log-normalizes the count matrix and performs PCA on the top n.HVGs highly variable genes, keeping the top n.PCs principal components. Additionally, the spatial sequencing platform is added as metadata in the SingleCellExperiment for downstream analyses. If you do not wish to rerun PCA, running spatialPreprocess() with the flag skip.PCA=TRUE will only add the metadata BayesSpace requires.
We can use the qTune() and qPlot() functions to help choose q, the number of clusters to use in our analysis.
qTune() runs the BayesSpace clustering algorithm for multiple specified values of q (by default, 3 through 7) and computes their average pseudo-log-likelihood. It accepts any arguments to spatialCluster().qPlot() plots the pseudo-log-likelihood as a function of q; we suggest choosing a q around the elbow of this plot.The spatialCluster() function clusters the spots, and adds the predicted cluster labels to the SingleCellExperiment. Typically, as we did for the analyses in the paper, we suggest running with at least 10,000 iterations (nrep=10000), but we use 2,000 iteration in this demonstration for the sake of runtime. (Note that a random seed must be set in order for the results to be reproducible.)
set.seed(149)
melanoma <- spatialCluster(melanoma, q=4, platform="ST", d=7,
init.method="mclust", model="t", gamma=2,
nrep=2000, burn.in=100,
save.chain=TRUE)Both the mclust initialization (cluster.init) and the BayesSpace cluster assignments (spatial.cluster) are now available in the SingleCellExperiment’s colData.
head(colData(melanoma))
#> DataFrame with 6 rows and 9 columns
#> barcode in_tissue array_row array_col sizeFactor spot.idx
#> <character> <logical> <numeric> <numeric> <numeric> <integer>
#> 7x15 7x15 TRUE 7 15 0.795588 1
#> 7x16 7x16 TRUE 7 16 0.307304 2
#> 7x17 7x17 TRUE 7 17 0.331247 3
#> 7x18 7x18 TRUE 7 18 0.420747 4
#> 8x13 8x13 TRUE 8 13 0.255453 5
#> 8x14 8x14 TRUE 8 14 1.473439 6
#> spot.neighbors cluster.init spatial.cluster
#> <character> <numeric> <numeric>
#> 7x15 2,7 1 1
#> 7x16 1,3,8 1 1
#> 7x17 2,4,9 2 2
#> 7x18 3,10 3 2
#> 8x13 6,12 1 1
#> 8x14 5,7,13 1 1We can plot the cluster assignments over the spatial locations of the spots with clusterPlot().
As clusterPlot() returns a ggplot object, it can be customized by composing with familiar ggplot2 functions. Additionally, the argument palette sets the colors used for each cluster, and clusterPlot() takes additional arguments to geom_polygon() such as size or color to control the aesthetics of the spot borders.
clusterPlot(melanoma, palette=c("purple", "red", "blue", "yellow"), color="black") +
theme_bw() +
xlab("Column") +
ylab("Row") +
labs(fill="BayesSpace\ncluster", title="Spatial clustering of ST_mel1_rep2")The spatialEnhance() function will enhance the resolution of the principal components, and add these PCs as well as predicted cluster labels at subspot resolution to a new SingleCellExperiment. As with our demonstration of spatialCluster() above, we are using fewer iterations for the purpose of this example (nrep=2000) than we recommend in practice (nrep=100000 or greater).
Note that the jitter_scale parameter should be tuned so that proposals for updating subspot-level expression are accepted around 30% of the time. This can be evaluated using mcmcChain(melanoma.enhanced, "Ychange"), where the chain should stabilize to 0.25-0.40. Typically 1000-2500 iterations are sufficient to evaluate if jitter_scale should be increased if acceptance is too high or decreased if acceptance is too low.
We have implemented another way to avoid manually tuning jitter_scale, which uses adaptive MCMC. To use this feature, simply set jitter.scale=0. BayesSpace will automatically tune this parameter to achieve the best acceptance rate. Note that such tuning by default only happens in the burn-in phase.
After tuning, proceed to a full run of spatialEnhance with more iterations.
melanoma.enhanced <- spatialEnhance(melanoma, q=4, platform="ST", d=7,
model="t", gamma=2,
jitter.prior=0.3, jitter.scale=0,
nrep=2000, burn.in=100,
save.chain=TRUE, cores=2)The enhanced SingleCellExperiment includes an index to the parent spot in the original sce (spot.idx), along with an index to the subspot. It adds the offsets to the original spot coordinates, and provides the enhanced cluster label (spatial.cluster).
head(colData(melanoma.enhanced))
#> DataFrame with 6 rows and 9 columns
#> spot.idx spot.neighbors subspot.idx subspot.neighbors spot.row
#> <numeric> <character> <integer> <character> <numeric>
#> subspot_1.1 1 2,7 1 294,880 7
#> subspot_2.1 2 1,3,8 1 295,587,881 7
#> subspot_3.1 3 2,4,9 1 296,588,882 7
#> subspot_4.1 4 3,10 1 297,589,883 7
#> subspot_5.1 5 6,12 1 298,884 8
#> subspot_6.1 6 5,7,13 1 299,591,885 8
#> spot.col array_row array_col spatial.cluster
#> <numeric> <numeric> <numeric> <numeric>
#> subspot_1.1 15 6.66667 14.6667 1
#> subspot_2.1 16 6.66667 15.6667 1
#> subspot_3.1 17 6.66667 16.6667 2
#> subspot_4.1 18 6.66667 17.6667 2
#> subspot_5.1 13 7.66667 12.6667 1
#> subspot_6.1 14 7.66667 13.6667 1We can plot the enhanced cluster assignments as above.
BayesSpace operates on the principal components of the gene expression matrix, and spatialEnhance() therefore computes enhanced resolution PC vectors. Enhanced gene expression is not computed directly, and is instead imputed using a regression algorithm. For each gene, a model using the PC vectors of each spot is trained to predict the spot-level gene expression, and the fitted model is used to predict subspot expression from the subspot PCs.
Gene expression enhancement is implemented in the enhanceFeatures() function. BayesSpace predicts expression with xgboost by default, but linear and Dirichlet regression are also available via the model argument. When using xgboost, we suggest automatically tuning the nrounds parameter by setting it to 0, although this comes at the cost of increased runtime (~4x slower than a pre-specified nrounds in practice).
enhanceFeatures() can be used to impute subspot-level expression for all genes, or for a subset of genes of interest. Here, we’ll demonstrate by enhancing the expression of four marker genes: PMEL (ENSG00000185664, melanoma), CD2 (ENSG00000116824, T-cells), CD19 (ENSG00000177455, B-cells), and COL1A1 (ENSG00000108821, fibroblasts).
markers <- c("ENSG00000185664", "ENSG00000116824", "ENSG00000177455", "ENSG00000108821")
melanoma.enhanced <- enhanceFeatures(melanoma.enhanced, melanoma,
feature_names=markers,
nrounds=0)By default, log-normalized expression (logcounts(sce)) is imputed, although other assays or arbitrary feature matrices can be specified.
logcounts(melanoma.enhanced)[markers, 1:5]
#> subspot_1.1 subspot_2.1 subspot_3.1 subspot_4.1 subspot_5.1
#> ENSG00000185664 2.6628363 1.9758757 2.5970280 2.5534587 2.6945219
#> ENSG00000116824 0.4270058 0.3931560 0.3871010 0.3545035 0.4673422
#> ENSG00000177455 0.5908179 0.3442686 0.4279284 0.4189567 0.4279284
#> ENSG00000108821 0.0000000 1.0713838 1.4132689 1.0773389 0.9019099Diagnostic measures from each predictive model, such as rmse when using xgboost, are added to the rowData of the enhanced dataset.
rowData(melanoma.enhanced)[markers, ]
#> DataFrame with 4 rows and 4 columns
#> gene_id gene_name is.HVG enhanceFeatures.rmse
#> <character> <character> <logical> <numeric>
#> ENSG00000185664 ENSG00000185664 PMEL TRUE 0.752284
#> ENSG00000116824 ENSG00000116824 CD2 TRUE 0.655391
#> ENSG00000177455 ENSG00000177455 CD19 TRUE 0.711007
#> ENSG00000108821 ENSG00000108821 COL1A1 TRUE 0.548235Spatial gene expression is visualized with featurePlot().
Here, we compare the spatial expression of the imputed marker genes.
enhanced.plots <- purrr::map(markers, function(x) featurePlot(melanoma.enhanced, x))
patchwork::wrap_plots(enhanced.plots, ncol=2)And we can compare to the spot-level expression.
spot.plots <- purrr::map(markers, function(x) featurePlot(melanoma, x))
patchwork::wrap_plots(c(enhanced.plots, spot.plots), ncol=4)If save.chain is set to TRUE in either spatialCluster() or spatialEnhance(), the chain associated with the respective MCMC run is preserved to disk as an HDF5 file. The path to this file is stored in the SingleCellExperiment’s metadata at metadata(sce)$h5.chain, and can be read directly using mcmcChain().
The chain is provided as a coda::mcmc object, which can be analyzed with TidyBayes or as a matrix. The object has one row per iteration, with the values of the parameters concatenated across the row. Columns are named with the parameter name and index (if any).
chain <- mcmcChain(melanoma)
chain[1:5, 1:5]
#> lambda[1,1] lambda[1,2] lambda[1,3] lambda[1,4] lambda[1,5]
#> [1,] 0.0100000 0.00000000 0.00000000 0.000000e+00 0.00000000
#> [2,] 0.1836711 0.01019059 0.08664012 3.994578e-02 -0.02443957
#> [3,] 0.1790958 -0.04576098 0.07102969 8.286855e-05 -0.10919746
#> [4,] 0.1730928 -0.02089801 0.08429716 2.803978e-02 -0.10322589
#> [5,] 0.1923552 -0.02112783 0.07563260 2.088451e-02 -0.04688843To remove the HDF5 file from disk and remove its path from the metadata, use removeChain().