Contents

1 Introduction

“APL” is a package developed for computation of Association Plots, a method for visualization and analysis of single cell transcriptomics data. The main focus of “APL” is the identification of genes characteristic for individual clusters of cells from input data.

When working with APL package please cite:

Gralinska, E., Kohl, C., Fadakar, B. S., & Vingron, M. (2022). 
Visualizing Cluster-specific Genes from Single-cell Transcriptomics Data Using Association Plots. 
Journal of Molecular Biology, 434(11), 167525.

A citation can also be obtained in R by running citation("APL"). For a mathematical description of the method, please refer to the manuscript.

2 Installation

To install the APL from Bioconductor, run:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("APL")

Alternatively the package can also be installed from GitHub:

library(devtools)
install_github("VingronLab/APL")

To additionally build the package vignette, run instead

install_github("VingronLab/APL", build_vignettes = TRUE, dependencies = TRUE)

Building the vignette will however take considerable time.

2.1 pytorch installation

In order to decrease the computation time of the singular value decomposition (SVD), we highly recommend the installation of pytorch. More information on the pytorch installation is given below. Instead of installing pytorch, users can also opt to use the R native SVD. For this, please use the argument python = FALSE wherever applicable in this vignette.

2.1.1 Install pytorch with reticulate

library(reticulate)
install_miniconda()
use_condaenv(condaenv = file.path(miniconda_path(),"envs/r-reticulate"),
             required=TRUE)
      
conda_install(envname = "r-reticulate", packages = "numpy")
conda_install(envname = "r-reticulate", packages = "pytorch")

2.1.2 Manually install pytorch with conda

To install pytorch please download the appropriate Miniconda installer for your system from the conda website. Follow the installation instructions on their website and make sure the R package reticulate is also installed before proceeding. Once installed, list all available conda environments via
conda info --envs
One of the environments should have r-reticulate in its name. Depending on where you installed it and your system, the exact path might be different. Activate the environment and install pytorch into it.

conda activate ~/.local/share/r-miniconda/envs/r-reticulate # change path accordingly.
conda install numpy
conda install pytorch

3 Preprocessing

3.1 Setup

In this vignette we will use a small data set published by Darmanis et al. (2015) consisting of 466 human adult cortical single cells sequenced on the Fluidigm platform as an example. To obtain the data necessary to follow the vignette we use the Bioconductor package scRNAseq.

Besides the package APL we will use Bioconductor packages to preprocess the data. Namely we will use SingleCellExperiment, scater and scran. However, the preprocessing could equally be performed with the single-cell RNA-seq analysis suite Seurat.

The preprocessing steps are performed according to the recommendations published in Orchestrating Single-Cell Analysis with Bioconductor by Amezquita et al. (2022). For more information about the rational behind them please refer to the book.

library(APL)
library(scRNAseq)
library(SingleCellExperiment)
library(scran)
library(scater)
set.seed(1234)

3.2 Loading the data

We start with the loading and preprocessing of the Darmanis data.

darmanis <- DarmanisBrainData()
darmanis
#> class: SingleCellExperiment 
#> dim: 22085 466 
#> metadata(0):
#> assays(1): counts
#> rownames(22085): 1/2-SBSRNA4 A1BG ... ZZZ3 tAKR
#> rowData names(0):
#> colnames(466): GSM1657871 GSM1657872 ... GSM1658365 GSM1658366
#> colData names(6): metrics age ... experiment_sample_name tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

3.3 Normalization, PCA & Clustering

Association Plots from APL should be computed based on the normalized expression data. Therefore, we first normalize the counts from the Darmanis data and calculate both PCA and UMAP for visualizations later.

For now, APL requires the data to be clustered beforehand. The darmanis data comes already annotated, so we will use the cell types stored in the cell.type metadata column instead of performing a clustering.


set.seed(100)
clust <- quickCluster(darmanis) 
darmanis <- computeSumFactors(darmanis, cluster=clust, min.mean=0.1)
darmanis <- logNormCounts(darmanis)

dec <- modelGeneVar(darmanis)
top_darmanis <- getTopHVGs(dec, n=5000)
darmanis <- fixedPCA(darmanis, subset.row=top_darmanis) 
darmanis <- runUMAP(darmanis, dimred="PCA")
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by 'spam'
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by 'spam'

plotReducedDim(darmanis, dimred="UMAP", colour_by="cell.type")

4 Quick start

The fastest way to compute the Association Plot for a selected cluster of cells from the input data is by using a wrapper function runAPL(). runAPL() automates most of the analysis steps for ease of use.

For example, to generate an Association Plot for the oligodendrocytes we can use the following command:

runAPL(darmanis,
       assay = "logcounts",
       top = 5000,
       group = which(darmanis$cell.type == "oligodendrocytes"),
       type = "ggplot",
       python = TRUE)
#> Warning in rm_zeros(obj): Matrix contains rows with only 0s. These rows were
#> removed. If undesired set rm_zeros = FALSE.
#> 
#>  Using 74 dimensions. Subsetting.