## Loading required package: knitr

Introduction to BASiCS

Package: BASiCS
Authors: Catalina Vallejos (cnvallej@uc.cl) and Nils Eling (eling@ebi.ac.uk)
Compilation date: 2017-10-30


Introduction

Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels within seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the group of cells under study.

BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model that propagates statistical uncertainty by simultaneously performing data normalisation (global scaling), technical noise quantification and two types of supervised downstream analyses:

In both cases, a probabilistic output is provided, with posterior probability thresholds calibrated through the expected false discovery rate (EFDR) [4].

Currently, BASiCS relies on the use of spike-in genes — that are artificially introduced to each cell's lysate — to perform these analyses.

A brief description for the statistical model implemented in BASiCS is provided in the “Methodology” section of this document.

Important: BASiCS has been designed in the context of supervised experiments where the groups of cells (e.g. experimental conditions, cell types) under study are known a priori (e.g. case-control studies). Therefore, we DO NOT advise the use of BASiCS in unsupervised settings where the aim is to uncover sub-populations of cells through clustering.


Quick start

The input dataset

The input dataset for BASiCS must be stored as an SingleCellExperiment object (see SingleCellExperiment package).

The newBASiCS_Data function can be used to create the input data object based on the following information:

For example, the following code simulates a dataset with 50 genes (40 biological and 10 spike-in) and 40 cells.

set.seed(1)
Counts = Counts = matrix(rpois(50*40, 2), ncol = 40)
rownames(Counts) <- c(paste0("Gene", 1:40), paste0("Spike", 1:10))
Tech = c(rep(FALSE,40),rep(TRUE,10))
set.seed(2)
SpikeInput = rgamma(10,1,1)
SpikeInfo <- data.frame("SpikeID" = paste0("Spike", 1:10), 
                        "SpikeInput" = SpikeInput)

# No batch structure
DataExample = newBASiCS_Data(Counts, Tech, SpikeInfo)

# With batch structure
DataExample = newBASiCS_Data(Counts, Tech, SpikeInfo, 
                             BatchInfo = rep(c(1,2), each = 20)) 

Note: scRNA-seq datasets typically require quality control filtering before performing the analysis. This is in order to remove cells and/or transcripts with very low expression counts. The function BASiCS_Filter can be used to perform this task. For examples, refer to help(BASiCS_Filter).

Note: the scater package provides enhanced functionality for the pre-processing of scRNA-seq datasets.

To convert an existing SingleCellExperiment object (Data) into one that can be used within BASiCS, meta-information must be stored in the object.

Running the MCMC sampler

Parameter estimation is performed using the BASiCS_MCMC function. Essential parameters for running this algorithm are:

If the optional parameter PrintProgress is set to TRUE, the R console will display the progress of the MCMC algorithm. For other optional parameters refer to help(BASiCS_MCMC).

Here, we illustrate the usage of BASiCS_MCMC using a built-in synthetic dataset.

Data <- makeExampleBASiCS_Data()
Chain <- BASiCS_MCMC(Data = Data, N = 1000, Thin = 10, Burn = 500, 
                     PrintProgress = FALSE)
## -------------------------------------------------------------
## MCMC sampler has been started: 1000 iterations to go.
## -------------------------------------------------------------
## -------------------------------------------------------------
## End of Burn-in period.
## -------------------------------------------------------------
##  
## -------------------------------------------------------------
## -------------------------------------------------------------
## All 1000 MCMC iterations have been completed.
## -------------------------------------------------------------
## -------------------------------------------------------------
##  
## -------------------------------------------------------------
## Please see below a summary of the overall acceptance rates.
## -------------------------------------------------------------
##  
## Minimum acceptance rate among mu[i]'s: 0.536
## Average acceptance rate among mu[i]'s: 0.70456
## Maximum acceptance rate among mu[i]'s: 0.858
##  
## Minimum acceptance rate among delta[i]'s: 0.5
## Average acceptance rate among delta[i]'s: 0.58328
## Maximum acceptance rate among delta[i]'s: 0.682
##  
## Acceptance rate for phi (joint): 0.916
##  
## Minimum acceptance rate among nu[j]'s: 0.438
## Average acceptance rate among nu[j]'s: 0.5387
## Maximum acceptance rate among nu[j]'s: 0.73
##  
## Minimum acceptance rate among theta[k]'s: 0.754
## Average acceptance rate among theta[k]'s: 0.754
## Maximum acceptance rate among theta[k]'s: 0.754
##  
## -------------------------------------------------------------
## 

Important remarks:

Typically, setting N=20000, Thin=20 and Burn=10000 leads to stable results.

Analysis for a single group of cells

We illustrate this analysis using a small extract from the MCMC chain obtained in [2] when analysing the single cell samples provided in [5]. This is included within BASiCS as the ChainSC dataset.

data(ChainSC)

The following code is used to identify highly variable genes (HVG) and lowly variable genes (LVG) within these cells. The VarThreshold parameter sets a lower threshold for the proportion of variability that is assigned to the biological component (Sigma). In the examples below:

For each gene, these functions return posterior probabilities as a measure of HVG/LVG evidence. A cut-off value for these posterior probabilities is set by controlling EFDR (defaul option: EviThreshold defined such that EFDR = 0.10).

par(mfrow = c(2,2))
HVG <- BASiCS_DetectHVG(ChainSC, VarThreshold = 0.6, Plot = TRUE)
LVG <- BASiCS_DetectLVG(ChainSC, VarThreshold = 0.2, Plot = TRUE)

plot of chunk quick-start-HVGdetection

To access the results of these tests, please use.

head(HVG$Table)
##     GeneIndex GeneName        Mu    Delta     Sigma Prob  HVG
## 21         21  Map3k11 10.044337 2.344024 0.7599326 1.00 TRUE
## 71         71   Lefty1  3.574049 2.972835 0.7657068 0.96 TRUE
## 48         48     Ctgf  4.296390 2.455164 0.7370870 0.91 TRUE
## 88         88    Naa11  4.997926 2.503697 0.7438627 0.91 TRUE
## 166       166    Pnma2  2.773063 2.781834 0.7270004 0.89 TRUE
## 185       185     Alg8  3.931154 2.489690 0.7379904 0.88 TRUE
head(LVG$Table)
##    GeneIndex GeneName        Mu      Delta      Sigma Prob  LVG
## 16        16  Gm10653 1165.3430 0.04023528 0.05845448    1 TRUE
## 63        63   Luc7l2 1942.1656 0.07219708 0.10128552    1 TRUE
## 65        65   Atp5g2  338.0637 0.06562415 0.09368431    1 TRUE
## 89        89    Rpl14 1348.3208 0.02466278 0.03720835    1 TRUE
## 90        90    Rpl11 1256.5145 0.02305986 0.03439021    1 TRUE
## 92        92     Rcc2  294.2660 0.06411164 0.09191962    1 TRUE
SummarySC <- Summary(ChainSC)
plot(SummarySC, Param = "mu", Param2 = "delta", log = "xy")
with(HVG$Table[HVG$Table$HVG == TRUE,], points(Mu, Delta))
with(LVG$Table[LVG$Table$LVG == TRUE,], points(Mu, Delta))

plot of chunk quick-start-HVGdetectionPlot

Note: this criteria for threshold has changed with respect to the original release of BASiCS (where EviThreshold was defined such that EFDR = EFNR). However, the new choice is more stable (sometimes, it was not posible to find a threshold such that EFDR = EFNR).

Analysis for two groups of cells

To illustrate the use of the differential mean expression and differential over-dispersion tests between two cell populations, we use extracts from the MCMC chains obtained in [2] when analysing the [5] dataset (single cells vs pool-and-split samples). These were obtained by independently running the BASiCS_MCMC function for each group of cells.

data(ChainSC)
data(ChainRNA)
Test <- BASiCS_TestDE(Chain1 = ChainSC, Chain2 = ChainRNA,
                      GroupLabel1 = "SC", GroupLabel2 = "PaS",
                      EpsilonM = log2(1.5), EpsilonD = log2(1.5),
                      EFDR_M = 0.10, EFDR_D = 0.10,
                      Offset = TRUE, OffsetPlot = TRUE, Plot = TRUE)

plot of chunk quick-start-TestDEplot of chunk quick-start-TestDEplot of chunk quick-start-TestDEplot of chunk quick-start-TestDEplot of chunk quick-start-TestDEplot of chunk quick-start-TestDE

In BASiCS_TestDE, EpsilonM sets the log2 fold change (log2FC) in expression (\(\mu\)) and EpsilonD the log2FC in over-dispersion (\(\delta\)). As a default option: EpsilonM = EpsilonD = log2(1.5) (i.e. 50\% increase). To adjust for differences in overall RNA content, an internal offset correction is performed when OffSet=TRUE. This is the recommended default.

The resulting output list can be displayed using

head(Test$TableMean)
##      GeneName MeanOverall    Mean1   Mean2 MeanFC MeanLog2FC ProbDiffMean
## 47   BC018473    1502.198 2948.429  94.024 31.251      4.966            1
## 71     Lefty1      10.756    5.170  16.195  0.328     -1.607            1
## 329       Erh     488.756  354.929 619.062  0.573     -0.802            1
## 418    Zfp937     149.071  221.552  78.498  2.808      1.490            1
## 437   Snora33       6.428    1.943  10.795  0.176     -2.510            1
## 445 Zfp71-rs1     126.059   62.323 188.118  0.329     -1.605            1
##     ResultDiffMean
## 47             SC+
## 71            PaS+
## 329           PaS+
## 418            SC+
## 437           PaS+
## 445           PaS+
head(Test$TableDisp)
##      GeneName MeanOverall DispOverall Disp1 Disp2 DispFC DispLog2FC
## 49     Gm5643    1821.217       0.062 0.105 0.020  5.171      2.370
## 58     Luc7l2    2809.375       0.045 0.072 0.018  4.076      2.027
## 87  Hist2h2ab      70.985       0.346 0.591 0.107  5.626      2.492
## 94    Gm16287     658.777       0.081 0.141 0.022  6.669      2.737
## 102    Zyg11b    1657.344       0.063 0.108 0.019  5.772      2.529
## 117    Stxbp2     615.573       0.064 0.105 0.024  4.349      2.121
##     ProbDiffDisp ResultDiffDisp
## 49             1            SC+
## 58             1            SC+
## 87             1            SC+
## 94             1            SC+
## 102            1            SC+
## 117            1            SC+

Note: due to the confounding between mean and over-dispersion that is typically observed in scRNA-seq datasets, we only assess changes in over-dispersion for those genes in which the mean does not change between the groups. Use EpsilonM = 0 as a conservative option:

Test <- BASiCS_TestDE(Chain1 = ChainSC, Chain2 = ChainRNA,
                      GroupLabel1 = "SC", GroupLabel2 = "PaS",
                      EpsilonM = 0, EpsilonD = log2(1.5),
                      EFDR_M = 0.10, EFDR_D = 0.10,
                      Offset = TRUE, OffsetPlot = TRUE, Plot = TRUE)

plot of chunk quick-start-TestDE-2plot of chunk quick-start-TestDE-2plot of chunk quick-start-TestDE-2plot of chunk quick-start-TestDE-2plot of chunk quick-start-TestDE-2plot of chunk quick-start-TestDE-2

Additional details

Storing and loading MCMC chains

To externally store the output of BASiCS_MCMC (recommended), additional parameters StoreChains, StoreDir and RunName are required. For example:

Data <- makeExampleBASiCS_Data()
Chain <- BASiCS_MCMC(Data, N = 1000, Thin = 10, Burn = 500, 
                     PrintProgress = FALSE, StoreChains = TRUE, 
                     StoreDir = tempdir(), RunName = "Example")
## -------------------------------------------------------------
## MCMC sampler has been started: 1000 iterations to go.
## -------------------------------------------------------------
## -------------------------------------------------------------
## End of Burn-in period.
## -------------------------------------------------------------
##  
## -------------------------------------------------------------
## -------------------------------------------------------------
## All 1000 MCMC iterations have been completed.
## -------------------------------------------------------------
## -------------------------------------------------------------
##  
## -------------------------------------------------------------
## Please see below a summary of the overall acceptance rates.
## -------------------------------------------------------------
##  
## Minimum acceptance rate among mu[i]'s: 0.536
## Average acceptance rate among mu[i]'s: 0.70456
## Maximum acceptance rate among mu[i]'s: 0.858
##  
## Minimum acceptance rate among delta[i]'s: 0.5
## Average acceptance rate among delta[i]'s: 0.58328
## Maximum acceptance rate among delta[i]'s: 0.682
##  
## Acceptance rate for phi (joint): 0.916
##  
## Minimum acceptance rate among nu[j]'s: 0.438
## Average acceptance rate among nu[j]'s: 0.5387
## Maximum acceptance rate among nu[j]'s: 0.73
##  
## Minimum acceptance rate among theta[k]'s: 0.754
## Average acceptance rate among theta[k]'s: 0.754
## Maximum acceptance rate among theta[k]'s: 0.754
##  
## -------------------------------------------------------------
## 

In this example, the output of BASiCS_MCMC will be stored as a BASiCS_Chain object in the file “chain_Example.Rds”, within the tempdir() directory.

To load pre-computed MCMC chains,

Chain <- BASiCS_LoadChain("Example", StoreDir = tempdir()) 

Convergence assessment

To assess convergence of the chain, the convergence diagnostics provided by the package coda can be used. Additionally, the chains can be visually inspected. For example:

plot(Chain, Param = "mu", Gene = 1, log = "y")

plot of chunk Traceplots

plot(Chain, Param = "phi", Cell = 1)

plot of chunk Traceplots

In the figures above:

Summarising the posterior distribution

To access the MCMC chains associated to individual parameter use the function displayChainBASiCS. For example,

displayChainBASiCS(Chain, Param = "mu")[1:5,1:5]
##          Gene1    Gene2    Gene3    Gene4    Gene5
## [1,]  7.296594 6.467043 2.966891 4.170876 13.18930
## [2,] 11.918342 3.424790 3.519730 5.672648 18.75135
## [3,] 10.606028 2.300362 4.912133 4.694006 18.11018
## [4,]  8.682155 5.594955 4.602469 7.993613 20.58960
## [5,]  5.968966 3.493966 8.612633 6.250717 18.64799

As a summary of the posterior distribution, the function Summary calculates posterior medians and the High Posterior Density (HPD) intervals for each model parameter. As a default option, HPD intervals contain 0.95 probability.

ChainSummary <- Summary(Chain)

The function displaySummaryBASiCS extract posterior summaries for individual parameters. For example

head(displaySummaryBASiCS(ChainSummary, Param = "mu"))
##              Mu     lower     upper
## Gene1  7.803682  4.146507 13.373858
## Gene2  5.328882  2.088817 11.432661
## Gene3  4.187075  2.548418  9.328065
## Gene4  5.146918  2.670231 12.964300
## Gene5 18.268763 13.189304 28.372033
## Gene6  8.234371  5.224591 11.842912

The following figures display posterior medians and the corresponding HPD 95% intervals for gene-specific parameters \(\mu_i\) (mean) and \(\delta_i\) (over-dispersion)

par(mfrow = c(2,2))
plot(ChainSummary, Param = "mu", main = "All genes", log = "y")
plot(ChainSummary, Param = "mu", Genes = 1:10, main = "First 10 genes")
plot(ChainSummary, Param = "delta", main = "All genes")
plot(ChainSummary, Param = "delta", Genes = c(2,5,10,50), main = "5 customized genes")

plot of chunk OtherHPD

It is also possible to obtain similar summaries for the normalising constants \(\phi_j\) and \(s_j\).

par(mfrow = c(1,2))
plot(ChainSummary, Param = "phi")
plot(ChainSummary, Param = "s", Cells = 1:5)

plot of chunk Normalisation

Finally, it is also possible to create a scatterplot of posterior estimates for gene-specific parameters. Typically, this plot will exhibit the confounding effect that is observed between mean and over-dispersion.

par(mfrow = c(1,2))
plot(ChainSummary, Param = "mu", Param2 = "delta", log = "x", SmoothPlot = FALSE)
plot(ChainSummary, Param = "mu", Param2 = "delta", log = "x", SmoothPlot = TRUE)

plot of chunk DispVsExp

The option SmoothPlot = TRUE is generally recommended as this plot will contain thousands of genes when analysing real datasets.

Normalisation and removal of technical variation

It is also possible to produce a matrix of normalised and denoised expression counts for which the effect of technical variation is removed. For this purpose, we implemented the function BASiCS_DenoisedCounts. For each gene \(i\) and cell \(j\) this function returns

\[ x^*_{ij} = \frac{ x_{ij} } {\hat{\phi}_j \hat{\nu}_j}, \]

where \(x_{ij}\) is the observed expression count of gene \(i\) in cell \(j\), \(\hat{\phi}_j\) denotes the posterior median of \(\phi_j\) and \(\hat{\nu}_j\) is the posterior median of \(\nu_j\).

DenoisedCounts = BASiCS_DenoisedCounts(Data = Data, Chain = Chain)
DenoisedCounts[1:5, 1:5]
##           [,1]      [,2]     [,3]      [,4]     [,5]
## Gene1 0.000000 28.521971 39.13472  7.827306 39.67871
## Gene2 0.000000  0.000000  0.00000 15.654613  0.00000
## Gene3 0.000000  3.802929  0.00000  7.827306  0.00000
## Gene4 4.355742  1.901465 19.56736  0.000000 29.09772
## Gene5 4.355742  5.704394  0.00000 70.445756 10.58099

Alternativelly, the user can compute the normalised and denoised expression rates underlying the expression of all genes across cells using BASiCS_DenoisedRates. The output of this function is given by

\[ \Lambda_{ij} = \hat{\mu_i} \hat{\rho}_{ij}, \]

where \(\hat{\mu_i}\) represents the posterior median of \(\mu_j\) and \(\hat{\rho}_{ij}\) is given by its posterior mean (Monte Carlo estimate based on the MCMC sample of all model parameters).

DenoisedRates <- BASiCS_DenoisedRates(Data = Data, Chain = Chain, 
                                      Propensities = FALSE)
DenoisedRates[1:5, 1:5]
##          [,1]       [,2]      [,3]      [,4]      [,5]
## [1,] 2.565810 26.5407462 18.438498  8.042210 34.136011
## [2,] 1.482502  0.7891635  3.280819 11.893612  1.018034
## [3,] 2.252266  3.9125580  3.470748  5.287011  1.756470
## [4,] 4.592368  2.5639070  9.157448  2.524652 22.707970
## [5,] 8.144922  7.6225842 11.060046 51.076954 12.156195

Alternative, denoised expression propensities \(\hat{\rho}_{ij}\) can also be extracted

DenoisedProp = BASiCS_DenoisedRates(Data = Data, Chain = Chain, 
                                    Propensities = TRUE)
DenoisedProp[1:5, 1:5]
##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.3287947 3.4010541 2.3627946 1.0305660 4.3743466
## [2,] 0.2782012 0.1480917 0.6156674 2.2319149 0.1910409
## [3,] 0.5379092 0.9344371 0.8289196 1.2626979 0.4194980
## [4,] 0.8922560 0.4981442 1.7792101 0.4905173 4.4119551
## [5,] 0.4458387 0.4172469 0.6054075 2.7958627 0.6654088

Methodology

We first describe the model introduced in [1], which relates to a single group of cells.

Throughout, we consider the expression counts of \(q\) genes, where \(q_0\) are expressed in the population of cells under study (biological genes) and the remaining \(q-q_0\) are extrinsic spike-in (technical) genes. Let \(X_{ij}\) be a random variable representing the expression count of a gene \(i\) in cell \(j\) (\(i=1,\ldots,q\); \(j=1,\ldots,n\)). BASiCS is based on the following hierarchical model: \[X_{ij} \big| \mu_i, \phi_j, \nu_j, \rho_{ij} \sim \left\{ \begin{array}{ll} \mbox{Poisson}(\phi_j \nu_j \mu_i \rho_{ij}), \mbox{ for }i=1,\ldots,q_0, j=1,\ldots,n \ \mbox{Poisson}(\nu_j \mu_i), \mbox{ for }i=q_0+1,\ldots,q, j=1,\ldots,n, \end{array} \right.\]

where \(\nu_j\) and \(\rho_{ij}\) are mutually independent random effects such that \(\nu_j|s_j,\theta \sim \mbox{Gamma}(1/\theta,1/ (s_j \theta))\) and \(\rho_{ij} | \delta_i \sim \mbox{Gamma} (1/\delta_i,1/\delta_i)\)[footnoteGamma].

A graphical representation of this model is displayed below. This is based on the expression counts of 2 genes (\(i\): biological and \(i'\): technical) at 2 cells (\(j\) and \(j'\)). Squared and circular nodes denote known observed quantities (observed expression counts and added number of spike-in mRNA molecules) and unknown elements, respectively. Whereas black circular nodes represent the random effects that play an intermediate role in our hierarchical structure, red circular nodes relate to unknown model parameters in the top layer of hierarchy in our model. Blue, green and grey areas highlight elements that are shared within a biological gene, technical gene or cell, respectively.

\centerline{\includegraphics[height=4in]{./BASiCS_DAG.jpg}}

In this setting, the key parameters to be used for downstream analyses are:

Additional (nuisance) parameters are interpreted as follows:

When cells from the same group are processed in multiple sequencing batches, this model is extended so that the technical over-dispersion parameter \(\theta\) is batch-specific. This extension allows a different strenght of technical noise to be inferred for each batch of cells.

[footnoteGamma]: We parametrize the Gamma distribution such that if \(X \sim \mbox{Gamma}(a,b)\), then \(\mbox{E}(X)=a/b\) and \(\mbox{var}(X)=a/b^2\).

In [2], this model has been extended to cases where multiple groups of cells are under study. This is achieved by assuming gene-specific parameters to be also group-specific. Based on this setup, evidence of differential expression is quantified through log2-fold changes of gene-specific parameters (mean and over-dispersion) between the groups.

More details regarding the model setup, prior specification and implementation are described in [1] and [2].


Acknowledgements

We thank several members of the Marioni laboratory (EMBL-EBI; CRUK-CI) for support and discussions throughout the development of this R library. In particular, we are grateful to Aaron Lun (@LTLA, CRUK-CI) for advise and support during the preparation the Bioconductor submission.

We also acknowledge feedback and contributions from (Github aliases provided within parenthesis): Ben Dulken (@bdulken), Chang Xu (@xuchang116), Danilo Horta (@Horta), Dmitriy Zhukov (@dvzhukov), Jens Preußner (@jenzopr), Joanna Dreux (@Joannacodes), Kevin Rue-Albrecht (@kevinrue), Luke Zappia (@lazappi), Simon Anders (@s-andrews), Yongchao Ge and Yuan Cao (@yuancao90), among others.

This work has been funded by the MRC Biostatistics Unit (MRC grant no. MRC_MC_UP_0801/1; Catalina Vallejos and Sylvia Richardson), EMBL European Bioinformatics Institute (core European Molecular Biology Laboratory funding; Catalina Vallejos, Nils Eling and John Marioni), CRUK Cambridge Institute (core CRUK funding; John Marioni) and The Alan Turing Institute (EPSRC grant no. EP/N510129/1; Catalina Vallejos).


References

[1] Vallejos CA, Marioni JCM and Richardson S (2015) BASiCS: Bayesian analysis of single-cell sequencing data. PLoS Computational Biology 11 (6), e1004333.

[2] Vallejos CA, Richardson S and Marioni JCM (2016) Beyond comparisons of means: understanding changes in gene expression at the single-cell level. Genome Biology 17 (1), 1-14.

[3] Martinez-Jimenez CP, Eling N, Chen H, Vallejos CA, Kolodziejczyk AA, Connor F, Stojic L, Rayner TF, Stubbington MJT, Teichmann SA, de la Roche M, Marioni JC and Odom DT (2017) Aging increases cell-to-cell transcriptional variability upon immune stimulation. Science 355 (6332), 1433-1436.

[4] Newton MA, Noueiry A, Sarkar D, Ahlquist P (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5 (2), 155-76.

[5] Grün D, Kester L, van Oudenaarden A (2014) Validation of noise models for single-cell transcriptomics. Nature Methods 11 (6), 637-40.

[6] Vallejos CA, Risso D, Scialdone A, Dudoit S and Marioni JCM (2017) Normalizing single-cell RNA-sequencing data: challenges and opportunities. Nature Methods 14, 565-571.

[7] Roberts GO and Rosenthal JS (2009). Examples of adaptive MCMC. Journal of Computational and Graphical Statistics 18: 349-367.

Session information

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] BASiCS_1.0.0               SingleCellExperiment_1.0.0
##  [3] SummarizedExperiment_1.8.0 DelayedArray_0.4.0        
##  [5] matrixStats_0.52.2         Biobase_2.38.0            
##  [7] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0       
##  [9] IRanges_2.12.0             S4Vectors_0.16.0          
## [11] BiocGenerics_0.24.0        BiocStyle_2.6.0           
## [13] knitr_1.17                
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6            bit64_0.9-7            
##  [3] progress_1.1.2          rprojroot_1.2          
##  [5] dynamicTreeCut_1.63-1   tools_3.4.2            
##  [7] backports_1.1.1         DT_0.2                 
##  [9] R6_2.2.2                KernSmooth_2.23-15     
## [11] vipor_0.4.5             DBI_0.7                
## [13] lazyeval_0.2.1          colorspace_1.3-2       
## [15] gridExtra_2.3           prettyunits_1.0.2      
## [17] bit_1.1-12              compiler_3.4.2         
## [19] scales_0.5.0            stringr_1.2.0          
## [21] digest_0.6.12           rmarkdown_1.6          
## [23] XVector_0.18.0          scater_1.6.0           
## [25] pkgconfig_2.0.1         htmltools_0.3.6        
## [27] highr_0.6               limma_3.34.0           
## [29] htmlwidgets_0.9         rlang_0.1.2            
## [31] RSQLite_2.0             FNN_1.1                
## [33] shiny_1.0.5             bindr_0.1              
## [35] zoo_1.8-0               BiocParallel_1.12.0    
## [37] dplyr_0.7.4             RCurl_1.95-4.8         
## [39] magrittr_1.5            GenomeInfoDbData_0.99.1
## [41] Matrix_1.2-11           Rcpp_0.12.13           
## [43] ggbeeswarm_0.6.0        munsell_0.4.3          
## [45] viridis_0.4.0           stringi_1.1.5          
## [47] yaml_2.1.14             edgeR_3.20.0           
## [49] zlibbioc_1.24.0         rhdf5_2.22.0           
## [51] plyr_1.8.4              grid_3.4.2             
## [53] blob_1.1.0              shinydashboard_0.6.1   
## [55] crayon_1.3.4            lattice_0.20-35        
## [57] splines_3.4.2           locfit_1.5-9.1         
## [59] igraph_1.1.2            rjson_0.2.15           
## [61] reshape2_1.4.2          biomaRt_2.34.0         
## [63] XML_3.98-1.9            glue_1.2.0             
## [65] evaluate_0.10.1         scran_1.6.0            
## [67] data.table_1.10.4-3     httpuv_1.3.5           
## [69] testthat_1.0.2          gtable_0.2.0           
## [71] assertthat_0.2.0        ggplot2_2.2.1          
## [73] mime_0.5                xtable_1.8-2           
## [75] coda_0.19-1             viridisLite_0.2.0      
## [77] tibble_1.3.4            AnnotationDbi_1.40.0   
## [79] beeswarm_0.2.3          memoise_1.1.0          
## [81] tximport_1.6.0          bindrcpp_0.2           
## [83] statmod_1.4.30