1 Install and Load the Package

To install the cytofkit package, start R and run the following code in the R console:

source("https://bioconductor.org/biocLite.R")
biocLite("cytofkit")

Note: On Mac (OS X > 10.7), the cytofkit GUI depends on the XQuartz windowing system (X Window System). Install XQuartz from http://xquartz.macosforge.org.

Load the Package:

library("cytofkit") 

Read the package description:

?"cytofkit-package"

2 Options for Using cytofkit Package

cytofkit can be used in three ways:

2.1 Run with GUI

The easiest way to use the cytofkit package is through the GUI, which exposes all the main options of cytofkit in a visual interface. To launch the GUI, load the package and type the following command:

cytofkit_GUI()  

The interface appears as shown below. Click the information button (!) next to an entry to read its explanation, then customize your own analysis.

[Figure: cytofkit GUI]

Starting an analysis takes only a few steps:

  • Choose the input FCS files from the directory where you store your FCS data;
  • Select the markers from the auto-generated marker list;
  • Choose the directory in which to save your output;
  • Give a project name, which is used as a prefix for the names of the result files;
  • Select a data merging method if you have multiple FCS files;
  • Select your clustering method(s), visualization method(s), and progression estimation method(s).

Then submit the analysis. That's all.

Depending on the size of your data, the analysis will take some time to run. Once it is done, a window pops up showing the path where the results have been stored and asking whether to open the Shiny web app. If you choose YES, the Shiny app is deployed locally and opened in your default web browser. Among the saved results is an R data object with the suffix .RData, which is used to load the results into the Shiny app. Choose the .RData file in the Shiny app and submit it to start exploring the results.
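
If you want to look inside a saved .RData file from the R console rather than through the Shiny app, base R's load() can read it into a separate environment. The following is only a minimal sketch on a toy object: the name toy_results and its structure are made up for illustration and do not reflect the actual contents of a cytofkit result file.

```r
## Toy stand-in for a saved result (hypothetical name and structure)
toy_results <- list(projectName = "demo", clusterRes = list(A = c(1, 2, 2)))
rdata_path <- file.path(tempdir(), "demo.RData")
save(toy_results, file = rdata_path)

## Load into a separate environment so nothing in the workspace is overwritten
res_env <- new.env()
loaded_names <- load(rdata_path, envir = res_env)   # returns the object names
loaded_names

## Inspect the top-level structure of the first loaded object
str(res_env[[loaded_names[1]]], max.level = 1)
```

Loading into a dedicated environment avoids silently replacing objects of the same name in your current session.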

2.2 Run with the Core Function

cytofkit provides a core function, cytofkit(), that drives the analysis pipeline for mass cytometry data. You only need to define a few key parameters, and the analysis then runs automatically. A simple example of running cytofkit through the core function looks like this:

set.seed(100)
dir <- system.file('extdata', package = 'cytofkit')
## escape the dot so the pattern matches the literal file extension
file <- list.files(dir, pattern = '\\.fcs$', full.names = TRUE)
parameters <- list.files(dir, pattern = '\\.txt$', full.names = TRUE)
res <- cytofkit(fcsFiles = file, 
                markers = parameters, 
                projectName = 'cytofkit_test',
                transformMethod = "cytofAsinh", 
                mergeMethod = "ceil",
                fixedNum = 500,                                    ## set at 500 for faster run
                dimReductionMethod = "tsne",
                clusterMethods = c("Rphenograph", "ClusterX"),    ## accept multiple methods
                visualizationMethods = c("tsne", "pca"),          ## accept multiple methods
                progressionMethod = "isomap",
                clusterSampleSize = 500,
                resultDir = getwd(),
                saveResults = TRUE, 
                saveObject = TRUE)

You can customize the parameters to suit your own needs; run ?cytofkit for information on all the parameters of cytofkit(). As when running with the GUI, once the analysis is done the results are saved under resultDir automatically.
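
To make the role of mergeMethod and fixedNum more concrete, here is a minimal base R sketch of the sampling idea on toy matrices. The helper merge_sketch() is hypothetical and not part of cytofkit, and the behaviour of each option reflects our reading of the package documentation ("ceil" samples up to fixedNum cells per file, "all" keeps every cell, "min" samples the smallest file size from each file, "fixed" samples exactly fixedNum cells per file, with replacement if a file is too small). Check ?cytofkit for the authoritative description.

```r
## Hypothetical helper, NOT part of cytofkit: illustrates the sampling idea only
merge_sketch <- function(mats, mergeMethod = c("ceil", "all", "min", "fixed"),
                         fixedNum = 500) {
  mergeMethod <- match.arg(mergeMethod)
  n_min <- min(vapply(mats, nrow, integer(1)))   # size of the smallest file
  sampled <- lapply(mats, function(m) {
    n <- nrow(m)
    idx <- switch(mergeMethod,
      all   = seq_len(n),
      ceil  = sample.int(n, min(n, fixedNum)),
      min   = sample.int(n, n_min),
      fixed = sample.int(n, fixedNum, replace = n < fixedNum))
    m[idx, , drop = FALSE]
  })
  do.call(rbind, sampled)
}

set.seed(1)
toy <- list(a = matrix(rnorm(2000), ncol = 2),   # toy "file" with 1000 cells
            b = matrix(rnorm(600),  ncol = 2))   # toy "file" with 300 cells

## Number of cells kept by each strategy with fixedNum = 500
sizes <- vapply(c("ceil", "all", "min", "fixed"),
                function(m) nrow(merge_sketch(toy, m, fixedNum = 500)),
                integer(1))
sizes
```

With these toy sizes, "ceil" keeps 500 + 300 cells, while "fixed" keeps 500 from each file and therefore has to resample the smaller one.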

2.3 Run with Commands (Step-by-Step)

You can also call the functions exported by cytofkit directly, which makes your analysis more flexible and easier to adapt to your own needs. Here we use a sample data set for demonstration:

2.3.1 Pre-processing

## Loading the FCS data:  
dir <- system.file('extdata', package = 'cytofkit')
## escape the dot so the pattern matches the literal file extension
file <- list.files(dir, pattern = '\\.fcs$', full.names = TRUE)
paraFile <- list.files(dir, pattern = '\\.txt$', full.names = TRUE)
parameters <- as.character(read.table(paraFile, header = TRUE)[,1])

## File name
file
## [1] "/tmp/RtmpGmHEQA/Rinst31c8717cdcd9/cytofkit/extdata/130515_C2_stim_CD19-.fcs"
## parameters
parameters
##  [1] "(Sm152)Di<Vd2>"    "(Eu153)Di<CD107a>" "(Sm154)Di<CD3>"   
##  [4] "(Gd155)Di<CD152>"  "(Gd156)Di<CD19>"   "(Gd157)Di<TIM3>"  
##  [7] "(Gd158)Di<CD56>"   "(Tb159)Di<IL10>"   "(Gd160)Di<CD28>"  
## [10] "(Dy161)Di<CD38>"   "(Dy162)Di<IL4>"    "(Dy163)Di<CD127>"
## Extract the expression matrix with transformation
data_transformed <- cytof_exprsExtract(fcsFile = file, 
                                       comp = FALSE, 
                                       transformMethod = "cytofAsinh")
## If analysing flow cytometry data, you can set comp to TRUE or 
## provide a compensation matrix to apply compensation

## If you have multiple FCS files, expression can be extracted and combined
combined_data_transformed <- cytof_exprsMerge(fcsFiles = file, comp=FALSE,
                                              transformMethod = "cytofAsinh",
                                              mergeMethod = "all")
## change mergeMethod to apply a different merging strategy

## Take a look at the extracted expression matrix
head(data_transformed[ ,1:3])
##                        Cell_length<NA> (Rh103)Di<BC103> (Pd104)Di<BC104>
## 130515_C2_stim_CD19-_1        2.824903    -0.0018738715         4.072829
## 130515_C2_stim_CD19-_2        2.725596     0.0011091183         4.125760
## 130515_C2_stim_CD19-_3        2.672016    -0.0004428311         3.705151
## 130515_C2_stim_CD19-_4        2.555494    -0.0008412519         3.393448
## 130515_C2_stim_CD19-_5        2.644121     0.0040425242         3.554248
## 130515_C2_stim_CD19-_6        2.800979     0.0032699275         3.974557
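
To give a feel for what an arcsinh-type transformation such as "cytofAsinh" does to raw intensities, here is a minimal sketch of a generic arcsinh transform in base R. The function asinh_sketch() is hypothetical, and the cofactor of 5 is an assumed value that is commonly used for mass cytometry data. The package's own cytofAsinh may differ in detail (for example in how it treats values near zero), so use this only as an illustration of the scaling.

```r
## Generic arcsinh transform; cofactor = 5 is an assumed, commonly used value
asinh_sketch <- function(x, cofactor = 5) asinh(x / cofactor)

## Raw intensities spanning several orders of magnitude
raw <- c(0, 1, 10, 100, 1000, 10000)
transformed <- asinh_sketch(raw)

## Roughly linear near zero, roughly logarithmic for large values
round(transformed, 3)
```

The transform compresses the high end of the intensity range while keeping values around zero well behaved, which is why it is preferred over a plain logarithm.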

2.3.2 Cell Subset Detection

## use clustering algorithms to detect cell subsets
## to speed up the test here, we use only the first 100 cells
## (despite the "_1k" in the variable name)
data_transformed_1k <- data_transformed[1:100, ]

## run PhenoGraph
cluster_PhenoGraph <- cytof_cluster(xdata = data_transformed_1k, method = "Rphenograph")
##   Running PhenoGraph...  Finding nearest neighbors...DONE ~ 0.002 s
##   Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 0.033 s
##   Build undirected graph from the weighted links...DONE ~ 0.015 s
##   Run louvain clustering on the graph ...DONE ~ 0.006 s
##   Return a community class
##   -Modularity value: 0.4696663 
##   -Number of clusters: 4 DONE!
## run ClusterX
data_transformed_1k_tsne <- cytof_dimReduction(data=data_transformed_1k, method = "tsne")
##   Running t-SNE...with seed 42  DONE
cluster_ClusterX <- cytof_cluster(ydata = data_transformed_1k_tsne,  method="ClusterX")
##   Running ClusterX...    Calculate cutoff distance...0.52  
##     Calculate local Density...DONE!
##     Detect nearest neighbour with higher density...DONE!
##     Peak detection...DONE!
##     Cluster assigning...DONE!
##  DONE!
## run DensVM (takes a long time, so the call is commented out and skipped here)
# cluster_DensVM <- cytof_cluster(xdata = data_transformed_1k, 
#                                 ydata = data_transformed_1k_tsne, method = "DensVM")
## run FlowSOM with cluster number 12
cluster_FlowSOM <- cytof_cluster(xdata = data_transformed_1k, method = "FlowSOM", FlowSOM_k = 12)
##   Running FlowSOM...    Building SOM...
##     Meta clustering to 12 clusters...
##  DONE!
## combine data
data_1k_all <- cbind(data_transformed_1k, data_transformed_1k_tsne, 
                     PhenoGraph = cluster_PhenoGraph, ClusterX=cluster_ClusterX, 
                     FlowSOM=cluster_FlowSOM)
data_1k_all <- as.data.frame(data_1k_all)
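
Once several clustering results sit side by side as columns, a quick way to compare them is a cross-tabulation of the cluster labels. The following minimal sketch uses made-up label vectors in place of the real PhenoGraph and ClusterX columns; only base R's table() is involved.

```r
## Toy cluster labels for ten cells (made up for illustration)
phenograph_toy <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
clusterx_toy   <- c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3)

## Cluster sizes for one method
table(phenograph_toy)

## Rows are clusters from one method, columns from the other;
## off-diagonal counts show where the two methods disagree
agreement <- table(PhenoGraph = phenograph_toy, ClusterX = clusterx_toy)
agreement
```

The same call on the real data would be table(data_1k_all$PhenoGraph, data_1k_all$ClusterX).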

2.3.3 Cell Subset Visualization and Interpretation

## PhenoGraph plot on tsne
cytof_clusterPlot(data=data_1k_all, xlab="tsne_1", ylab="tsne_2", 
                  cluster="PhenoGraph", sampleLabel = FALSE)

## PhenoGraph cluster heatmap
PhenoGraph_cluster_median <- aggregate(. ~ PhenoGraph, data = data_1k_all, median)
cytof_heatmap(PhenoGraph_cluster_median[, 2:37], baseName = "PhenoGraph Cluster Median")

## ClusterX plot on tsne
cytof_clusterPlot(data=data_1k_all, xlab="tsne_1", ylab="tsne_2", cluster="ClusterX", sampleLabel = FALSE)

## ClusterX cluster heatmap
ClusterX_cluster_median <- aggregate(. ~ ClusterX, data = data_1k_all, median)
cytof_heatmap(ClusterX_cluster_median[, 2:37], baseName = "ClusterX Cluster Median")
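
The aggregate() calls above compute the median of every column within each cluster. Here is a minimal sketch on a toy data frame (the column names markerA, markerB and cluster are made up) showing the shape of the result and why the first column is dropped before plotting the heatmap.

```r
set.seed(42)
## Six toy cells, two markers, three clusters (all names are hypothetical)
toy <- data.frame(markerA = rnorm(6), markerB = rnorm(6),
                  cluster = c(1, 1, 2, 2, 3, 3))

## One row per cluster; the grouping variable becomes the first column
toy_median <- aggregate(. ~ cluster, data = toy, median)
toy_median

## Drop the cluster label to keep only the marker medians for a heatmap
median_matrix <- as.matrix(toy_median[, -1])
rownames(median_matrix) <- toy_median$cluster
median_matrix
```

Note that with the formula `. ~ cluster`, every other column in the data frame is aggregated, so any non-marker columns should be removed or excluded by index, as the examples above do with `[, 2:37]`.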