This package performs correspondence analysis (CA) and lets users identify cluster-specific genes using Association Plots (AP). Additionally, APL computes cluster-specificity scores for all genes, which makes it possible to rank genes by their specificity for a selected cell cluster of interest.
APL 1.9.1
“APL” is a package for computing Association Plots, a method for the visualization and analysis of single-cell transcriptomics data. The main focus of “APL” is the identification of genes characteristic of individual clusters of cells in the input data.
When working with the APL package, please cite:
Gralinska, E., Kohl, C., Fadakar, B. S., & Vingron, M. (2022).
Visualizing Cluster-specific Genes from Single-cell Transcriptomics Data Using Association Plots.
Journal of Molecular Biology, 434(11), 167525.
A citation can also be obtained in R by running citation("APL").
For a mathematical description of the method, please refer to the manuscript.
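As a rough orientation only, the core of textbook correspondence analysis can be sketched in base R on a toy matrix. This is not the APL implementation; the count values and all variable names below are made up for illustration, and the manuscript remains the reference for how Association Plots are derived from the CA space.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns); values are made up.
counts <- matrix(c(10, 2, 1,
                    3, 8, 2,
                    1, 1, 9,
                    5, 5, 5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

P <- counts / sum(counts)   # correspondence matrix (relative frequencies)
row_mass <- rowSums(P)      # row masses
col_mass <- colSums(P)      # column masses

# Standardized (Pearson) residuals relative to the independence model
expected <- outer(row_mass, col_mass)
S <- (P - expected) / sqrt(expected)

# Singular value decomposition of the residual matrix
decomp <- svd(S)
keep <- decomp$d > 1e-12    # drop numerically zero dimensions

# Standard and principal coordinates of the columns (cells)
col_std <- sweep(decomp$v[, keep, drop = FALSE], 1, sqrt(col_mass), "/")
col_princ <- sweep(col_std, 2, decomp$d[keep], "*")

# Total inertia equals the chi-square statistic divided by the grand total
total_inertia <- sum(decomp$d^2)
```

The SVD step is the part whose run time grows with the size of the expression matrix.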
To install APL from Bioconductor, run:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("APL")
Alternatively, the package can be installed from GitHub:
library(devtools)
install_github("VingronLab/APL")
To additionally build the package vignette, run instead:
install_github("VingronLab/APL", build_vignettes = TRUE, dependencies = TRUE)
Building the vignette will, however, take considerable time.
In order to decrease the computation time of the singular value decomposition (SVD), we highly recommend the installation of pytorch.
More information on the pytorch installation is given below.
Instead of installing pytorch, users can also opt to use the R native SVD.
For this, please use the argument python = FALSE wherever applicable in this vignette.
library(reticulate)
install_miniconda()
use_condaenv(condaenv = file.path(miniconda_path(), "envs/r-reticulate"),
             required = TRUE)
conda_install(envname = "r-reticulate", packages = "numpy")
conda_install(envname = "r-reticulate", packages = "pytorch")
To install pytorch manually instead, please download the appropriate Miniconda installer for your system from the conda website.
Follow the installation instructions on their website and make sure the R package reticulate is also installed before proceeding.
Once installed, list all available conda environments via:
conda info --envs
One of the environments should have r-reticulate in its name.
Depending on where you installed it and your system, the exact path might be different.
Activate the environment and install pytorch into it.
conda activate ~/.local/share/r-miniconda/envs/r-reticulate # change path accordingly.
conda install numpy
conda install pytorch
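Whether the installation worked can be checked from within R before setting python = TRUE. The following is a minimal sketch: the helper torch_available() and the variable use_python are hypothetical names, not part of APL. It assumes that the conda package pytorch is importable in Python under the module name torch, and it relies on py_module_available() from reticulate.

```r
# Hypothetical helper: TRUE only if reticulate is installed and the Python
# module "torch" (provided by the conda package pytorch) can be found.
torch_available <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) return(FALSE)
  isTRUE(tryCatch(reticulate::py_module_available("torch"),
                  error = function(e) FALSE))
}

# Fall back to the R native SVD when torch cannot be found
use_python <- torch_available()
message("Suggested setting: python = ", use_python)
```

The resulting value can then be passed as the python argument wherever this vignette uses it.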
In this vignette we will use, as an example, a small data set published by Darmanis et al. (2015) consisting of 466 human adult cortical single cells sequenced on the Fluidigm platform. To obtain the data necessary to follow the vignette, we use the Bioconductor package scRNAseq.
Besides the APL package, we will use Bioconductor packages to preprocess the data, namely SingleCellExperiment, scater and scran. However, the preprocessing could equally be performed with the single-cell RNA-seq analysis suite Seurat.
The preprocessing steps are performed according to the recommendations published in Orchestrating Single-Cell Analysis with Bioconductor by Amezquita et al. (2022). For more information about the rationale behind them, please refer to the book.
library(reticulate)
use_condaenv(condaenv = file.path(miniconda_path(), "envs/r-reticulate"),
             required = TRUE)
library(APL)
library(scRNAseq)
library(SingleCellExperiment)
library(scran)
library(scater)
set.seed(1234)
We start with the loading and preprocessing of the Darmanis data.
darmanis <- DarmanisBrainData()
darmanis
#> class: SingleCellExperiment
#> dim: 22085 466
#> metadata(0):
#> assays(1): counts
#> rownames(22085): 1/2-SBSRNA4 A1BG ... ZZZ3 tAKR
#> rowData names(0):
#> colnames(466): GSM1657871 GSM1657872 ... GSM1658365 GSM1658366
#> colData names(6): metrics age ... experiment_sample_name tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
Association Plots from APL should be computed based on normalized expression data. Therefore, we first normalize the counts from the Darmanis data and compute both a PCA and a UMAP for later visualization.
For now, APL requires the data to be clustered beforehand. The Darmanis data comes already annotated, so we will use the cell types stored in the cell.type metadata column instead of performing a clustering.
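set.seed(100)
# Pre-cluster cells so that size factors are pooled within similar cells
clust <- quickCluster(darmanis)
darmanis <- computeSumFactors(darmanis, cluster=clust, min.mean=0.1)
# Log-transform the size-factor-normalized counts
darmanis <- logNormCounts(darmanis)
# Model per-gene variance and select the 5000 most highly variable genes
dec <- modelGeneVar(darmanis)
top_darmanis <- getTopHVGs(dec, n=5000)
# PCA on the selected genes, then UMAP on the PCA coordinates
darmanis <- fixedPCA(darmanis, subset.row=top_darmanis)
darmanis <- runUMAP(darmanis, dimred="PCA")
# Color the UMAP by the annotated cell type
plotReducedDim(darmanis, dimred="UMAP", colour_by="cell.type")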
The fastest way to compute the Association Plot for a selected cluster of cells from the input data is to use the wrapper function runAPL().
runAPL() automates most of the analysis steps for ease of use.
For example, to generate an Association Plot for the oligodendrocytes we can use the following command:
runAPL(darmanis,
       assay = "logcounts",
       top = 5000,
       group = which(darmanis$cell.type == "oligodendrocytes"),
       type = "ggplot",
       python = TRUE)
#> Warning in rm_zeros(obj): Matrix contains rows with only 0s. These rows were
#> removed. If undesired set rm_zeros = FALSE.
#> Warning in rm_zeros(mat): Matrix contains rows with only 0s. These rows were
#> removed. If undesired set rm_zeros = FALSE.
#>
#> Using 72 dimensions. Subsetting.
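In the call above, the group argument is given the column indices of the cells in the cluster of interest. The selection can be illustrated on a toy vector. The labels below are made up and merely stand in for darmanis$cell.type, and the final check is an optional safeguard, not something APL requires.

```r
# Hypothetical cell-type labels standing in for darmanis$cell.type
cell_type <- c("neurons", "oligodendrocytes", "astrocytes",
               "oligodendrocytes", "neurons")

# Indices of the cells that belong to the cluster of interest
group <- which(cell_type == "oligodendrocytes")
group

# Guard against an empty selection, e.g. caused by a misspelled label
stopifnot(length(group) > 0)
```

A misspelled label would give an empty index vector, which the final check catches before any plot is computed.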