Welcome to the OpenCyto workshop!

Data

The tbdata GatingSet is part of a data set of intracellular cytokine staining (ICS) samples from TB-infected and non-infected subjects, stimulated with TB-specific and non-specific peptides. The samples were assayed for Ag-specific response to stimulation. The data were analyzed in Lin et al. (2015), using results from manual gating.

The data structures and objects underlying OpenCyto and the BioConductor Flow Cytometry Infrastructure

  1. flowFrame is an R object that corresponds to a single FCS file.
  2. flowSet is a collection of flowFrames, all with the same markers and channels.
  3. GatingHierarchy is a flowFrame with its associated data transformation, compensation, and gating information defining cell populations.
  4. GatingSet is a collection of GatingHierarchy objects, or equivalently a flowSet, with all transformation, compensation, and gating information.

Gating information stored in a GatingSet

The GatingSet associates each sample with a hierarchy of gates that define cell popualtions. The gates can be one, two, three or high-dimensional. Any gates supported by the flowCore infrastructure can be attached to a a hierarchy of gates in a GatingSet.

The GatingSet lets us visualize FCM data analysis at the individual event-level (single-cells). Data are stored on disk using HDF5 and are read-in as needed. This has a very efficient memory footprint and allows us to analyze large data sets on modest hardware.

It provides a conveninent and unified way to interact with FCM data for performing hierarchical or non-hierarchical automated gating by managing cell population definitions and their relationships.

If your aren’t yet using this infrastructure, you should be.

OpenCyto

There are at least 50 different algorithms in the literature for performing automated gating of FCM data. 75% of those are available for BioConductor.

There is no “one best algorithm” for FCM gating. Different algorithms have different strengths (Aghaeepour2013a).

OpenCyto provides a mechansim to use multiple algorithms to identify different cell populations. Complex high-dimensional methods are rarely necessary, and the focus is on simple methods that work robustly rather than fancy approaches that work on one data set.

Gating Templates

The general idea is that for a given FCM assay utilizing a set staining panel generated by one lab, a general strategy to identify cell populations of interest should be applicable to any set of data produced by the lab.

OpenCyto formalizes this in the form of a gatingTemplate. This template defines cell populations in terms of their phenotypic markers, parent populations, and gating methods / algorithms used to define them. The template is in the form of a csv file that is easier to create than an R script. openCyto will take the template and a data set and produce data-driven gates for each cell population and sample.

Let’s get started!

Our goal here will be to gate the tbdata data set. We have multiple samples per subject. One sample is unstimulated, the other is stimulated with ESAT-6 peptide. We’ll focus on identifying antigen-specific CD4+ T cells.

To this end, we need to compare the stimulated and non-stimulated samples within each subject.

Cleaning up the data.

The data were imported from a manual gating analysis. We’ll begin by deleting the manual gates, as we’re not interested in them for this demo. We’ll also rename some phenotypic data columns so that we can use faceting more easily.

The general gating scheme we’ll apply here is root/Singlets/CD14-/nonDebris/Lymphocytes/Live/CD3/CD4/Functional markers.

#Delete existing gates
Rm("Singlets",tbdata)

Next, we’ll use the new API add_pop in openCyto to build our new automated gating strategy.

We’ll grab a subset of the data for testing.

tbdata_subset = subset(tbdata,`PID`%in%unique(pData(tbdata)$`PID`)[1:2])

Add a singlet population

We’ll build up our gating template incrementally.

template = add_pop(
  tbdata_subset, alias = "Singlets", pop = "Singlets", parent = "root", dims = "FSC-A,FSC-H",gating_method = "singletGate",gating_args =
  "wider_gate=TRUE,subsample_pct=0.1"
  )

Plotting it, we see it looks reasonable.

ggcyto(tbdata_subset,mapping = aes(x = `FSC-A`,y = `FSC-H`)) + geom_hex(bins =
                                                                          50) + geom_gate("Singlets") + xlim(c(0,2e5))

CD14- Cells

Next we add CD14- cells. Lets’ see what this dimension looks like and decide on a good gating method.

ggcyto(tbdata_subset,mapping = aes(x = "CD14"),subset = "Singlets") + geom_histogram(binwidth =
                                                                                       100) + xlim(c(-100,4096)) + facet_wrap( ~ PID,scales = "free")

Mindensity should be able to handle this easily.

mindensity method on CD14-

template = rbind(
  template,add_pop(
  tbdata_subset,alias = "CD14n", "CD14-", parent = "Singlets", dims = "CD14", gating_method =
  "mindensity",groupBy = "PID",collapseDataForGating = TRUE
  )
  )
ggcyto(tbdata_subset,mapping = aes(x = "CD14"),subset = "Singlets") + geom_histogram(binwidth =
                                                                                       100) + xlim(c(-100,4096)) + facet_wrap( ~ PID,scales = "free") + geom_gate("CD14n")

Next we gate out debris

template = template[1:2,]
  template = rbind(
  template,add_pop(
  tbdata_subset,alias = "nonDebris",pop = "nonDebris+",parent = "CD14n",dims = "FSC-A",gating_method =
  "boundary",collapseDataForGating = FALSE,gating_args = "min=40000,max=2.5e5"
  )
  )
  ggcyto(tbdata_subset,mapping = aes(x = "FSC-A",y = "SSC-A"),subset = "CD14n") + geom_hex(bins = 100) +
    geom_gate()

And then lymphocytes

  template = rbind(
    template, add_pop(
    tbdata_subset,alias = "Lymphocytes",pop = "Lymphocytes+",parent = "nonDebris",dims = "FSC-A,SSC-A",gating_method =
    "flowClust.2d",preprocessing_method = "prior_flowClust",preprocessing_args="K=3",collapseDataForGating = FALSE,gating_args =
    "quantile=0.95,target=c(80000,50000),K=3"
    )
    )
  ggcyto(tbdata_subset,mapping = aes(x = "FSC-A",y = "SSC-A"),subset = "nonDebris") +
    geom_hex(bins = 50) + geom_gate("Lymphocytes")

Next we gate live cells.

  template = rbind(
    template, add_pop(
    tbdata_subset,alias = "Live",pop = "Live+",parent = "Lymphocytes",dims = "<Am Cyan-A>",gating_method = "boundary",gating_args = "min=0,max=2000",collapseDataForGating = FALSE
    )
    )
    
    ggcyto(tbdata_subset,mapping = aes(x = "AViD",y="SSC-A"),subset = "Lymphocytes") +
    geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") +
    geom_gate("Live")

Then CD3+ cells

    template = rbind(
      template, add_pop(
      tbdata_subset,alias = "CD3",pop = "CD3+",parent = "Live",dims = "CD3",gating_method = "mindensity"
      )
      )
      ggcyto(tbdata_subset,mapping = aes(x = "CD3",y = "FSC-A"),subset = "Live") +
      geom_hex(bins=100) + ggcyto_par_set(limits = "instrument") + geom_gate("CD3")

And finally CD4+/CD8- cells

      template = rbind(
        template, add_pop(
        tbdata_subset,alias = "*",pop = "CD4+/-CD8+/-",dims = "CD4,CD8",gating_method =
        "mindensity2",gating_args="gate_range=c(500,4000)",parent = "CD3")
        )
        ggcyto(tbdata_subset,mapping = aes(x = "CD4",y = "CD8"),subset = "CD3") +
        geom_hex(bins=100) + ggcyto_par_set(limits = "data") + geom_gate()+xlim(0,3000)+ylim(0,4000)

Next we will gate the cytokine-positive cells

In order to make inferences on the proportions of cytokine positive cells in the stimulated and non-stimulated samples within each subject, we should, ideally, apply the same gate to both samples within each subject. We can do this using the groupBy column in the template, to specify the Patient ID (PID) as the grouping variable, and set collapseDataForGating to TRUE in order to use both samples together to derive the gate.

The cytokines of interest are:

kable(pData(parameters(getData(tbdata_subset[[1]])))[,1:2],row.names=FALSE)
name desc
Time NA
FSC-A NA
FSC-H NA
SSC-A NA
IFNg
AViD
CD14
IL2
CD3
CD154
PE Cy55-A NA
IL22
IL4
IL17a
CD4
TNFa
CD8

AViD, CD14, CD3, CD4 and CD8 are phenotypic markers. We want to gate IL2, IFNg, CD154, IL22, IL4, IL17a and TNFa marginally for CD4+/CD8- T cells.

TNFa

        template = rbind(
          template, add_pop(
          tbdata_subset,alias = "TNF",pop = "TNF+",dims = "TNFa",gating_method = "tailgate",gating_args = "auto_tol=TRUE",parent =
          "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
          )
          )
          ggcyto(tbdata_subset,mapping = aes(x = "TNFa",y = "SSC-A"),subset = "CD4+CD8-") +
          geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
          facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

IL2

          template = rbind(
            template, add_pop(
            tbdata_subset,alias = "IL2",pop = "IL2+",dims = "IL2",gating_method = "tailgate",gating_args = "auto_tol=TRUE",parent =
            "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
            )
            )
          ggcyto(tbdata_subset,mapping = aes(x = "IL2",y = "SSC-A"),subset = "CD4+CD8-") +
            geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
            facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

IL22

          template = rbind(
            template, add_pop(
            tbdata_subset,alias = "IL22",pop = "IL22+",dims = "IL22",gating_method =
            "tailgate",gating_args = "auto_tol=TRUE,adjust=2",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
            )
            )
          ggcyto(tbdata_subset,mapping = aes(x = "IL22",y = "SSC-A"),subset = "CD4+CD8-") +
            geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
            facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

IL4

          template = rbind(
            template, add_pop(
            tbdata_subset,alias = "IL4",pop = "IL4+",dims = "IL4",gating_method = "tailgate",gating_args = "auto_tol=TRUE",parent =
            "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
            )
            )
            ggcyto(tbdata_subset,mapping = aes(x = "IL4",y = "SSC-A"),subset = "CD4+CD8-") +
            geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
            facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

CD154

            template = rbind(
              template, add_pop(
              tbdata_subset,alias = "CD154",pop = "CD154+",dims = "CD154",gating_method =
              "tailgate",gating_args = "auto_tol=TRUE",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
              )
              )
              ggcyto(tbdata_subset,mapping = aes(x = "CD154",y = "SSC-A"),subset = "CD4+CD8-") +
              geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
              facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

IFNg

              template = rbind(
                template, add_pop(
                tbdata_subset,alias = "IFNg",pop = "IFNg+",dims = "IFNg",gating_method =
                "tailgate",gating_args = "auto_tol=TRUE",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
                )
                )
                ggcyto(tbdata_subset,mapping = aes(x = "IFNg",y = "SSC-A"),subset = "CD4+CD8-") +
                geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
                facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

IL17a

                template = rbind(
                  template, add_pop(
                  tbdata_subset,alias = "IL17a",pop = "IL17a+",dims = "IL17a",gating_method =
                  "tailgate",gating_args = "auto_tol=TRUE",parent = "CD4+CD8-",collapseDataForGating = TRUE,groupBy = "PID"
                  )
                  )
                  ggcyto(tbdata_subset,mapping = aes(x = "IL17a",y = "SSC-A"),subset = "CD4+CD8-") +
                  geom_hex(bins = 100) + ggcyto_par_set(limits = "instrument") + geom_gate() +
                  facet_grid(PID ~ Peptide,scale = "free") + geom_gate()

Gating of the complete data set.

Now that we’ve built up our gating template we can apply it to the remainder of our data set.

We write out our gating template.

tmp = tempfile(fileext=".csv")
write.csv(template,tmp,row.names=FALSE)
gt = gatingTemplate(tmp)
data(tbdata)
Rm("Singlets",tbdata) #Remove any gates attached to the GatingSet

We do the gating in parallel.

library(parallel)
#clean up our gating set
gating(gt,tbdata,parallel_type="multicore",mc.cores=7)

Some QC

We can examine the population statistics for the different cell populations. These should be relatively stable across samples. If there are some outliers, that is cause for concern and suggests we need to go back and look at those samples.

Barplots of the coefficient of variation of each cell population, faceted by its parent population, and the peptide stimulation.

stats = getPopStats(tbdata) #extract stats
stats[,prop := Count/ParentCount] #compute the cell proportions
##            name  Population      Parent  Count ParentCount        prop
##   1: 353385.fcs    Singlets        root 197002      216918 0.908186504
##   2: 353385.fcs       CD14n    Singlets 191215      197002 0.970624664
##   3: 353385.fcs   nonDebris       CD14n  90583      191215 0.473723296
##   4: 353385.fcs Lymphocytes   nonDebris  78449       90583 0.866045505
##   5: 353385.fcs        Live Lymphocytes  76475       78449 0.974837155
##  ---                                                                  
## 604: 557788.fcs        IL22    CD4+CD8-  12319       25232 0.488229233
## 605: 557788.fcs         IL2    CD4+CD8-     61       25232 0.002417565
## 606: 557788.fcs         TNF    CD4+CD8-   1115       25232 0.044189918
## 607: 557788.fcs    CD4-CD8+         CD3  34955       80621 0.433571898
## 608: 557788.fcs    CD4+CD8+         CD3     95       80621 0.001178353
stats = merge(stats,pData(tbdata),by="name")
data.table::setDT(stats)
ggplot(stats[,.(CV = mad(prop) / median(prop)),.(Parent,Peptide,Population)]) +
  geom_bar(position = "dodge",stat = "identity") + aes(y = CV,x = Population,fill =
  Peptide) + theme(axis.text.x = element_text(angle = 90,hjust = 1)) + facet_wrap( ~
  Parent,scales = "free_x")

Proportions of each cell population.

ggplot(stats) + geom_boxplot() + aes(y = prop,x = Population) + geom_jitter() +
  theme(axis.text.x = element_text(angle = 90,hjust = 1)) + facet_wrap( ~
  Parent,scales = "free")

There are some clear outliers in the CD8 population. We can have a closer look at those samples.

nm  = stats[,.(name,rZ = (prop - median(prop)) / mad(prop)),.(Population)][,.(outlier = abs(rZ) >
                                                                                3),.(name,Population)][outlier == TRUE][order(name)][Population %in% "CD8+"]$name
                                                                                
                                                                                ggcyto(tbdata[as.character(nm)],mapping = aes(x = "CD8"),subset = "CD3") +
                                                                                geom_histogram(binwidth = 25) + geom_gate("CD8+")

We see the issue is one sample where the automated gate completely failed to find the CD8+ cells. For some reason they are really poorly resolved in this sample, even in the original manual gating.

Manual intervention to adjust the gates on the outlier

We can decide to set a hard threshold at 1750 for the CD8+ cells. We adjust all the gates for the CD8 containing cell populations

for(n in nm){
  mygate = getGate(tbdata[[n]],"CD8+") #extract a gates
  mygate@min["<PerCP Cy55 Blue-A>"] = 1750 #adjust the min value for CD8
  setGate(tbdata[[n]],"CD8+",mygate) #set the gate for the cell population
  recompute(tbdata[[n]]) # recompute the statistics.

  #again for the others
  mygate = getGate(tbdata[[n]],"CD4-CD8+")
  mygate@min["<PerCP Cy55 Blue-A>"]=1750
  mygate@max["<PerCP Cy55 Blue-A>"]=Inf
  setGate(tbdata[[n]],"CD4-CD8+",mygate)


  mygate = getGate(tbdata[[n]],"CD4+CD8-")
  mygate@max["<PerCP Cy55 Blue-A>"]=1750
  setGate(tbdata[[n]],"CD4+CD8-",mygate)

  mygate = getGate(tbdata[[n]],"CD4-CD8-")
  mygate@max["<PerCP Cy55 Blue-A>"]=1750
  setGate(tbdata[[n]],"CD4-CD8-",mygate)

  mygate = getGate(tbdata[[n]],"CD4+CD8+")
  mygate@min["<PerCP Cy55 Blue-A>"]=1750
  setGate(tbdata[[n]],"CD4+CD8+",mygate)

  recompute(tbdata[[n]],"CD4-CD8+")
  recompute(tbdata[[n]],"CD4+CD8-")
  recompute(tbdata[[n]],"CD4-CD8-")
  recompute(tbdata[[n]],"CD4+CD8+")
}
plotGate(tbdata[as.character(nm)],c("CD4+CD8-","CD4-CD8+","CD4-CD8-","CD4+CD8+"),
         xbin=256,margin=FALSE)

The above at least reflects what we expect from the biology, few double positive T-cells.

Empirical comparison of responders and non-responders

We’ll have a look at the proportion of cytokine positive cells in ESAT-6 and DMSO stimulated samples for each subject.

library(scales)
#custom hyperbolic arcsine transformation.
asinh_trans = function (c=800) 
{
    trans_new(name = "asinh", transform = function(x) asinh(x * 
        c), inverse = function(x) sinh(x)/c)
}
stats = merge(pData(tbdata),getPopStats(tbdata),by="name")
data.table::setDT(stats)
stats = stats[Parent=="CD4+CD8-"][,prop:=Count/ParentCount]
stats=reshape2::dcast(stats,PID+Population+Parent~Peptide)


ggplot(stats) + geom_boxplot(outlier.colour = NA) + geom_jitter(aes(fill =
      Population),position = position_jitterdodge()) + aes(x = Population,y =
      pmax(`ESAT-6` - DMSO,0),col = Population) + facet_wrap( ~ Parent,scales =
      "free") + theme(axis.text.x = element_text(angle = 90,hjust = 1)) +
      scale_y_continuous("Background Corrected Proportion",trans =
      "asinh") + 
      ggtitle("Marginal Background Corrected Proportions\nof Cytokine Producing CD4 Cells") +
                                                                      scale_x_discrete("Marginal Cytokine")

There are some clear marginal responses in IFNg, IL2, IL17a. We can look at combinatorial subsets of cells using the COMPASS framework, which models antigen-specific responses jointly in all cell subsets.

Construct a COMPASS Container object from single-cell ICS data

COMPASS will test whether the ESAT-6 stimulation is greater than the DMSO stimulation for each subject and each cell subset.

It returns a probability of response.

library(COMPASS)
data = getSingleCellExpression(tbdata,nodes = c("IL2","IFNg","CD154","IL17a","IL4","TNF","IL22")) #single-cell expression of seven markers
counts = as.vector(getPopStats(tbdata,subpopulations = "CD4+CD8-",statistic =
"count")[,.(name,Count)])  #parent population cell counts
counts = as.vector(as.matrix(counts[,2,with = FALSE]))
names(counts) = names(data)
meta = pData(tbdata) #metadata

CC = COMPASSContainer(
data = data,counts = counts,meta = meta,individual_id = "PID",sample_id = "name"
)
set.seed(100)
compass_result = COMPASS(CC,treatment = Peptide == "ESAT-6",control = Peptide ==
"DMSO")

Heatmap of posterior probability of response

p = plot(compass_result,threshold=0.01)

Functionality score

ggplot(cbind(compass_result$data$meta,PFS = FunctionalityScore(compass_result))) +
  geom_boxplot(outlier.colour = NA) + geom_jitter(aes(col = known_response)) +
  aes(x = known_response,y = PFS) + theme_gray()

The differences between the known_response variable and the calculated polyfunctionality scores are due to

  1. Different numbers of subjects entering the model. The complete study had 20 subjects, which means some cell subsets may have been filtered inappropriately with fewer subjects.
  2. Differences between manual and automated gating. The automated gating scheme here is very rough and meant to give a quick overview of the data.

That’s all!

Hopefully this provides you with a sufficient introduction to some of the new features and new API of openCyto.

References

Lin, Lin, Greg Finak, Kevin Ushey, Chetan Seshadri, Thomas R Hawn, Nicole Frahm, Thomas J Scriba, et al. 2015. “COMPASS identifies T-cell subsets correlated with clinical outcomes.” Nature Biotechnology, May.