1 Introduction

DecontPro assesses and decontaminates single-cell protein expression data, such as those generated by CITE-seq or TotalSeq. It decomposes the count matrix into three matrices, the native, the ambient, and the background, which represent the contributions from true protein expression on cells, ambient material, and other non-specific background contamination, respectively.
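
To make the decomposition concrete, the toy sketch below builds three small matrices and adds them to form an observed count matrix. All numbers and object names are invented for illustration, and the purely additive relationship is an assumption based on the description above. It is not the decontPro model itself, which estimates these components from the data.

```r
# Toy illustration: 3 ADTs (rows) x 4 droplets (columns).
# All values are invented; they are not output from decontPro.
adt <- c("CD3", "CD19", "CD14")
droplet <- paste0("droplet", 1:4)

# matrix() fills column by column, so each row of numbers below is one droplet.
native <- matrix(c(50, 0, 0,
                   45, 0, 0,
                   0, 60, 0,
                   0, 0, 70),
                 nrow = 3, dimnames = list(adt, droplet))
ambient <- matrix(c(2, 3, 1,
                    1, 2, 2,
                    3, 1, 1,
                    2, 2, 0),
                  nrow = 3, dimnames = list(adt, droplet))
background <- matrix(1, nrow = 3, ncol = 4, dimnames = list(adt, droplet))

# Assumed relationship: observed counts are the sum of the three components.
observed <- native + ambient + background

# Fraction of each droplet's counts that is not native signal.
contamination <- 1 - colSums(native) / colSums(observed)
round(contamination, 3)
```

In this picture, decontamination amounts to recovering `native` when only `observed` is available.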

2 Installation

The decontX package can be installed from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("decontX")

Then the package can be loaded in R using the following command:

library(decontX)

To see the latest updates and releases or to post a bug, see our GitHub page at https://github.com/campbio/decontX.

3 Importing data

Here we use an example dataset from the SingleCellMultiModal package.

library(SingleCellMultiModal)
dat <- CITEseq("cord_blood", dry.run = FALSE)
#> Warning: 'ExperimentList' contains 'data.frame' or 'DataFrame',
#>   potential for errors with mixed data types
counts <- experiments(dat)$scADT

For this tutorial, we sample only 1000 droplets from the dataset to demonstrate the use of the functions. When analyzing your own dataset, sub-sample with caution: decontPro approximates the contamination profile from the dataset itself, so a biased sample may bias the estimated contamination profile.

set.seed(42)
sample_id <- sample(dim(counts)[2], 1000, replace = FALSE)
counts_sample <- counts[, sample_id]
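
One informal way to check whether a sub-sample looks representative is to compare simple summaries between the full and the sub-sampled matrices. The sketch below does this on simulated Poisson counts so that it runs on its own. The objects `full_counts` and `sub_counts` are hypothetical stand-ins for `counts` and `counts_sample`, and the two summaries chosen are only a suggestion, not a check performed by decontPro.

```r
# Simulated stand-in for an ADT count matrix: 13 ADTs x 5000 droplets.
# (Hypothetical data; replace full_counts with your own matrix.)
set.seed(1)
full_counts <- matrix(rpois(13 * 5000, lambda = 20), nrow = 13,
                      dimnames = list(paste0("ADT", 1:13), NULL))

# Sub-sample droplets (columns) uniformly at random, without replacement.
idx <- sample(ncol(full_counts), 1000, replace = FALSE)
sub_counts <- full_counts[, idx]

# Compare the share of total counts contributed by each ADT.
prop_full <- rowSums(full_counts) / sum(full_counts)
prop_sub <- rowSums(sub_counts) / sum(sub_counts)
max_diff <- max(abs(prop_full - prop_sub))
max_diff

# Compare the distribution of total counts per droplet.
summary(colSums(full_counts))
summary(colSums(sub_counts))
```

Large differences in either summary would suggest that the sub-sample over- or under-represents some droplets.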

4 Generate cell clusters

decontPro requires a vector indicating the cell type of each droplet. Here we use Seurat to cluster the droplets and use the cluster labels as cell types.

library(Seurat)
library(dplyr)
adt_seurat <- CreateSeuratObject(counts_sample, assay = "ADT")
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
adt_seurat <- NormalizeData(adt_seurat, normalization.method = "CLR", margin = 2) %>%
  ScaleData(assay = "ADT") %>%
  RunPCA(assay = "ADT", features = rownames(adt_seurat), npcs = 10,
         reduction.name = "pca_adt") %>%
  FindNeighbors(dims = 1:10, assay = "ADT", reduction = "pca_adt") %>%
  FindClusters(resolution = 0.5)
#> Warning in irlba(A = t(x = object), nv = npcs, ...): You're computing too large
#> a percentage of total singular values, use a standard svd instead.
#> Warning: Requested number is larger than the number of available items (13).
#> Setting to 13.
#> Warning: Requested number is larger than the number of available items (13).
#> Setting to 13.
#> Warning: Requested number is larger than the number of available items (13).
#> Setting to 13.
#> Warning: Requested number is larger than the number of available items (13).
#> Setting to 13.
#> Warning: Requested number is larger than the number of available items (13).
#> Setting to 13.
#> Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
#> 
#> Number of nodes: 1000
#> Number of edges: 32498
#> 
#> Running Louvain algorithm...
#> Maximum modularity in 10 random starts: 0.8567
#> Number of communities: 9
#> Elapsed time: 0 seconds
adt_seurat <- RunUMAP(adt_seurat,
                      dims = 1:10,
                      assay = "ADT",
                      reduction = "pca_adt",
                      reduction.name = "adtUMAP",
                      verbose = FALSE)
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session
DimPlot(adt_seurat, reduction = "adtUMAP", label = TRUE)

FeaturePlot(adt_seurat, 
            features = c("CD3", "CD4", "CD8", "CD19", "CD14", "CD16", "CD56"))
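
The cluster labels then need to be turned into a vector with one entry per droplet. In Seurat the labels are available as a factor through `Idents()`. The sketch below uses a made-up factor in place of `Idents(adt_seurat)` so that it runs without Seurat, and shows one way to convert it to integers and check it against the count matrix. The names `toy_counts` and `toy_idents` are hypothetical, and the use of integer labels is an assumption about what the downstream function expects.

```r
# Made-up stand-ins: a 3 x 6 count matrix and a factor of cluster labels,
# playing the roles of counts_sample and Idents(adt_seurat).
toy_counts <- matrix(1:18, nrow = 3,
                     dimnames = list(c("CD3", "CD19", "CD14"),
                                     paste0("droplet", 1:6)))
toy_idents <- factor(c("0", "1", "0", "2", "1", "2"),
                     levels = c("0", "1", "2"))
names(toy_idents) <- colnames(toy_counts)

# as.integer() on a factor returns the level codes, so the zero-based
# labels "0", "1", "2" become 1, 2, 3.
clusters <- as.integer(toy_idents)

# One label per droplet, in the same order as the columns of the matrix.
stopifnot(length(clusters) == ncol(toy_counts))
stopifnot(identical(names(toy_idents), colnames(toy_counts)))

table(clusters)
```

Note that the conversion shifts the labels by one relative to the zero-based cluster names in the plots above, which matters when relating results back to the plotted clusters.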