July, 2014

What is OpenCyto?

Not an algorithm, but a framework for automated gating.

Goals

  • Easily build reproducible gating pipelines.
  • Use any gating algorithm
    • interchange any algorithm at any step (support gating plugins)
  • Simplify data handling and data management.
    • Easily pass subsets of the data (cell subsets) to different gating algorithms.
  • Simple(r) pipeline template definitions
    • Pipeline defined via text file (csv)
    • Templates and code are re-usable for standardized assays and data.
  • Facilitate comparative analysis
    • Import manually gated data from FlowJo workspaces
  • Scale to large data sets
    • HDF5 support - data sets limited by disk space not RAM.
www.bioconductor.org

Overview

Raw data ➙ Preprocessing ➙ Annotation ➙ Gating ➙ Statistical analysis ➙ Output

The OpenCyto Gating Framework is a collection of R/Bioconductor packages for easily building reproducible flow cytometry data analysis pipelines.


Getting Started

Installation

Requirements: R + Bioconductor

  • Install release version of R from CRAN.
  • Install release version of Bioconductor from bioconductor.org/install
  • Install OpenCyto and its dependencies
    • Within R type the following:
require(BiocInstaller)  
biocLite("openCyto")
This installs all the required packages.

Still have problems? Bioconductor mailing list
Email: Mike Jiang or Greg Finak
Twitter: @OpenCyto


Getting Started II

Alternatively, if you are brave and want the latest bug fixes and features, install from github.com/RGLab

require(devtools)
packages<-c("RGLab/flowStats","RGLab/flowCore","RGLab/flowViz","RGLab/ncdfFlow","RGLab/flowWorkspace","RGLab/openCyto")
install_github(packages,quick=TRUE)

You may use the devtools package to install the latest stable versions directly from github.

A Worked Example

Intracellular Cytokine Staining of Antigen-stimulated T-cells

  • Full data set at Flowrepository.org FR-FCM-ZZ7U
  • Batch 0882, 76 sample files, 13 compensation controls.
ws<-openWorkspace("data/workspace/080 batch 0882.xml")
FlowJo Workspace Version  2.0 
File location:  data/workspace 
File name:  080 batch 0882.xml 
Workspace is open. 
Groups in Workspace
          Name Num.Samples
1  All Samples         158
2   0882-L-080         157
3        Comps          13
4 0882 Samples          76

Import Manual Gating (parseWorkspace)

Create a gating set of manual gates.

gating_set<-parseWorkspace(ws,name="0882 Samples",path="data/FCS/",isNcdf=TRUE)
## loading R object...
## loading tree object...
## Done
Parsing 76 samples
calling c++ parser...
...

We now have gated, compensated and transformed data in an HDF5 file represented in a GatingSet object. We can save it for later use.

save_gs(gating_set,path="data/manual_gating")
saving ncdf...
saving tree object...
saving R object...
Done
To reload it, use 'load_gs' function

The archived gating set contains all the information on transformation, compensation, single-cell events, and gates and can be shared with collaborators.


Visualizing the Gating Layout (plotGate)

plotGate(gating_set[[1]],xbin=16,gpar=list(ncol=5)) # Binning for faster plotting

Layout of manual gates


Visualizing the Gating Tree (plot)

Calling plot on the gating set gives us a view of the gating tree.

Annotation

We annotate our gating set using the FCS keywords and the annotations from FlowRepository. We'll keep only the GAG and negative control stimulations.

keyword_vars<-c("$FIL","Stim","Sample Order","EXPERIMENT NAME") #relevant keywords
pd<-data.table(getKeywords(gating_set,keyword_vars)) #extract relevant keywords to a data table
annotations<-data.table::fread("data/workspace/pd_submit.csv") # read the annotations from FlowRepository
pd<-data.frame(annotations[pd]) #data.table style merge (joins on the tables' keys)
setnames(pd,c("Timepoint","Individual"),c("VISITNO","PTID"))
pData(gating_set)<-pd #annotate
name        Condition   VISITNO  PTID    Sample.Description
769121.fcs  negctrl     5        080-17  PBMCs from healthy subjects
769122.fcs  negctrl     5        080-17  PBMCs from healthy subjects
769193.fcs  GAG-1-PTEG  5        080-17  PBMCs from healthy subjects
769225.fcs  POL-1-PTEG  5        080-17  PBMCs from healthy subjects
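
For readers less used to the data.table idiom, annotations[pd] looks up, for every row of pd, the matching row of annotations. A rough base R equivalent is sketched below with merge(). The two small tables are toy stand-ins with made-up keyword values, and the join column name `name` is an assumption, not something taken from the real files.

```r
# Toy stand-ins for the keyword table and the annotation file
# (illustrative values only; the join column `name` is assumed).
pd <- data.frame(name = c("769121.fcs", "769193.fcs"),
                 Stim = c("negctrl", "GAG-1-PTEG"),
                 stringsAsFactors = FALSE)
annotations <- data.frame(name = c("769193.fcs", "769121.fcs", "769225.fcs"),
                          Timepoint = c(5, 5, 5),
                          Individual = c("080-17", "080-17", "080-17"),
                          stringsAsFactors = FALSE)

# Keep every row of pd and attach the matching annotation columns.
merged <- merge(pd, annotations, by = "name", all.x = TRUE, sort = FALSE)

# Rename to the column names used in the rest of the analysis.
names(merged)[names(merged) == "Timepoint"]  <- "VISITNO"
names(merged)[names(merged) == "Individual"] <- "PTID"
print(merged)
```

Annotation rows without a matching sample in pd (here 769225.fcs) are dropped, which is the behaviour we want when annotating an existing gating set.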

Clone and save for automated gating

We want to perform automated gating of this data.

  • We'll clone the gating set, delete existing nodes and re-save the data as a new gating set.
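# gating_subset: the annotated gating set restricted to the stimulations we keep (subsetting step not shown)
auto_gating<-clone(gating_subset)
Rm("S",auto_gating) # remove the top-level manual gate "S" and everything below it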
save_gs(auto_gating,path="data/autogating",overwrite=TRUE)
list.files("data/autogating")
## [1] "NHxz3bpHGl.dat"     "NHxz3bpHGl.rds"     "file9e8620253f4.nc"
  • .nc file is the HDF5 file of event-level data.
  • .dat file contains the gating set representation from the C data structure.
  • .rds file is an R data file that contains the R-object information.

Send it to a friend: load_gs() will read everything in and the data will be available.


Constructing a Template - I

alias       pop          parent     dims         gating_method  gating_args                                               groupBy  preprocessing_method
boundary    boundary     root       FSC-A,SSC-A  boundary       max=c(2.5e5,2.5e5)
singlet     singlet      boundary   FSC-A,FSC-H  singletGate    prediction_level=0.999,wider_gate=TRUE,subsample_pct=0.2
viable      viable-      singlet    AViD         mindensity     gate_range=c(500,1000)
nonNeutro   nonNeutro-   viable     SSC-A        mindensity     gate_range=c(5e4,1.5e5)
DebrisGate  DebrisGate+  nonNeutro  FSC-A        mindensity     gate_range=c(0,1e+05)
nonDebris   nonDebris+   viable     FSC-A        refGate        DebrisGate
lymph       lymph        nonDebris  FSC-A,SSC-A  flowClust      K=2,quantile=0.99                                                  prior_flowClust
cd3         cd3+         lymph      cd3          mindensity

Constructing a Template - II

Each row defines a cell population

  • alias: how we refer to the population / shorthand
  • pop: The population definition i.e. do we keep the positive (+) or negative (-) cells for a marker / pair of markers after gating.
  • parent: The alias of the parent population on which the current population is defined
  • dims: The dimensions / markers used to define this cell population.
  • gating_method: Which gating algorithm to use.
  • gating_args: additional arguments passed to the gating method to tweak various parameters
  • collapseDataForGating: TRUE or FALSE. Together with groupBy, gates multiple samples with a common gate.
  • groupBy: Specify metadata variables for combining samples (e.g. PTID)
  • preprocessing_method: advanced use for some gating methods
  • preprocessing_args: additional arguments
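
Since a template is plain CSV, it can be inspected with ordinary R tools before it is handed to openCyto. The sketch below uses base R only (no openCyto calls) and two rows from the template on the previous slide, typed inline rather than read from a file. One practical point it illustrates: fields that themselves contain commas (dims, gating_args) need to be quoted.

```r
# Two rows of the template, written as CSV lines.
# Fields containing commas must be wrapped in double quotes.
template_lines <- c(
  "alias,pop,parent,dims,gating_method,gating_args,collapseDataForGating,groupBy,preprocessing_method,preprocessing_args",
  'boundary,boundary,root,"FSC-A,SSC-A",boundary,"max=c(2.5e5,2.5e5)",,,,',
  'viable,viable-,singlet,AViD,mindensity,"gate_range=c(500,1000)",,,,'
)

# Read everything as character so that empty cells stay empty strings.
tmpl <- read.csv(text = template_lines, colClasses = "character",
                 check.names = FALSE)

# One element per row: the channels / markers each gate is defined on.
dims_list <- strsplit(tmpl$dims, ",", fixed = TRUE)

print(tmpl[, c("alias", "pop", "parent", "dims", "gating_method")])
```

Each row names its parent by alias, so the rows together define the gating tree: here viable is gated on the singlet population using the AViD channel.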

Constructing a Template - III

We read in the template and visualize it

gt<-gatingTemplate("data/template/gt_080.csv")
plot(gt)


Automated Gating

openCyto walks through the template and gates each population in each sample using the algorithm named in the template.

gating(x = gt, y =  auto_gating)
## Some output..
plot(auto_gating)


Automated Gating - II

p1<-plotGate(auto_gating[[1]],c("cd8","cd4"),arrange=FALSE,
             projections=list("cd4"=c(x="CD8",y="CD4"),"cd8"=c(x="CD8",y="CD4")),
             main="Automated Gate",path=2)[[1]]
p2<-plotGate(gating_subset[[1]],c("8+","4+"),
             projections=list("4+"=c(x="CD8",y="CD4"),"8+"=c(x="CD8",y="CD4")),
             arrange=FALSE,main="Manual Gate",path=2)[[1]]

grid.arrange(arrangeGrob(p1,p2,ncol=2))

Automated and Manual Gates for CD4/CD8

Stats and gates are comparable and could be tweaked if necessary. Importantly, the automated gating is reproducible: it always generates the same result.


Extract Stats and Compare Manual to Automated

Comparison of Manual vs Automated Gating Cell Subset Counts

#Extract stats
auto_stats<-getPopStats(auto_gating,statistic="count")
manual_stats<-getPopStats(gating_subset,statistic="count")

Note: Perforin is incorrectly gated in the manual analysis. There are minor differences at the low end, but the automated results are reproducible and objective.
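
Once both sets of counts are extracted, the comparison itself is ordinary R. The sketch below shows one way to line the two up, assuming the counts arrive as matrices with populations in rows and samples in columns. All numbers, sample names and population names are made up for illustration, and in practice the manual and automated population names differ (e.g. "4+" vs "cd4") and must be mapped first.

```r
# Made-up counts for illustration: rows = populations, columns = samples.
pops    <- c("cd3", "cd4", "cd8", "TNFa")
samples <- c("a.fcs", "b.fcs")
auto_counts <- matrix(c(50000, 30000, 15000, 120,
                        48000, 29000, 14000,  90),
                      nrow = 4, dimnames = list(pops, samples))
manual_counts <- matrix(c(51000, 30500, 14800, 150,
                          47500, 29400, 13900,  70),
                        nrow = 4, dimnames = list(pops, samples))

# Long format: one row per population/sample pair (matrices are column-major).
long <- data.frame(population = rep(pops, times = length(samples)),
                   sample     = rep(samples, each = length(pops)),
                   auto       = as.vector(auto_counts),
                   manual     = as.vector(manual_counts))

# Agreement on the log scale, where small populations are not swamped.
long$log2_ratio <- log2(long$auto / long$manual)
concordance <- cor(log10(long$auto), log10(long$manual))
print(long)
print(concordance)
```

Plotting long$manual against long$auto on log axes gives the kind of comparison shown on this slide; populations with a large log2_ratio are the ones worth inspecting by eye.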


Perforin - Cytokine gate and reference gate

Automated and Manual Gating of Perforin/CD8

Automated gate set on CD57- (reference). Perforin-negative cells are included in the manual gate.


TNFa

Automated and manual gating of TNFa


Some Useful Functions

Return a flowSet containing event-level data for the named cell population.

cd3_population<-getData(auto_gating,"cd3")

Plot a named cell population.

plotGate(auto_gating,"cd3")

Subset by FCS file(s).

first_ten_fcs_files<-auto_gating[1:10]

List supported gating methods (that can be used in a template).

listgtMethods()

Register a new gating or preprocessing plugin.

registerPlugins(myfunction,methodName,dependencies, "preprocessing"|"gating")

Some More Useful Functions

Generate a Basic GatingTemplate from a Manual Gating Hierarchy.

templateGen(gating_subset[[1]])
alias      pop        parent         dims         gating_method  gating_args  collapseDataForGating  groupBy  preprocessing_method  preprocessing_args
S          S          root           FSC-H,FSC-A
Lv         Lv         /S             ,FSC-A
L          L          /S/Lv          SSC-A,FSC-A
3+         3+         /S/Lv/L        ,
8+         8+         /S/Lv/L/3+     ,
TNFa+      TNFa+      /S/Lv/L/3+/8+  ,
Perforin+  Perforin+  /S/Lv/L/3+/8+  ,

Just fill in the gating_method and dims to get started.


Some Typical Gating Methods Use Cases

  • mindensity: Finds the minimum density cut point between two primary populations. Can be restricted to a range of the data.
  • cytokine / tailgate: Identifies rare populations that are in the tails of a large primary population. Estimates the 2nd derivative of the density. Smoothing and tolerance can be adjusted.
  • flowClust: 1D, 2D, or n-Dimensional clustering. Generally useful for lymphocytes. Can infer a data-driven empirical-Bayes prior across samples.
  • singletGate: Fits a model that approximates a typical singlet gate on scatter area vs height or width.
  • boundary: Filters out boundary events.
  • refGate: A reference gate. Used to refer to a gate defined elsewhere in the hierarchy, so that its data-driven threshold can be reused. Similar to "back-gating".
  • flowDensity: Supported via plugin, density-based gating from our good friends at the BC Cancer Agency.
  • Other methods: rangeGate, quadrantGate, quantileGate
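
To make the cytokine / tailgate idea concrete, here is a minimal base R sketch of a cut point derived from the second derivative of a kernel density estimate. It is one simple reading of the description above, not openCyto's implementation: the helper name second_deriv_cut, the simulated data and the rule "cut where the second derivative peaks to the right of the main mode" are all assumptions, and the tolerance parameter is not modelled.

```r
set.seed(42)
# Simulated marker: one large negative population (illustrative data only).
x <- rnorm(5000)

# Hypothetical helper, not part of openCyto.
second_deriv_cut <- function(x, adjust = 3, n_grid = 400) {
  h    <- bw.nrd0(x) * adjust                  # larger adjust = more smoothing
  grid <- seq(min(x), max(x), length.out = n_grid)
  u    <- outer(grid, x, "-") / h              # standardized distances
  k    <- dnorm(u)                             # Gaussian kernel values
  f    <- rowSums(k) / (length(x) * h)         # kernel density estimate
  f2   <- rowSums((u^2 - 1) * k) / (length(x) * h^3)  # its second derivative
  right <- seq(which.max(f), n_grid)           # grid points right of the mode
  # Cut where the second derivative is largest to the right of the main mode,
  # i.e. where the upper shoulder of the population flattens out.
  grid[right][which.max(f2[right])]
}

tail_cut <- second_deriv_cut(x)
print(tail_cut)
print(mean(x > tail_cut))  # fraction of events called positive
```

The second derivative is computed analytically from the Gaussian kernel rather than by differencing the density curve, which keeps the estimate smooth. Lowering adjust makes the cut more sensitive to small bumps in the tail.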

Your turn

Under /data/OpenCyto you'll find the data and some R code to reproduce the analysis in these slides. Begin there.

  • Copy the code to your home directory and work from there.
  • Paths are set up to write to your home directory.
  • Load the workspace and reproduce the manual gating.
  • Clone the gating set and clear the manual gates.
  • Load the template and run the automated gating.
  • Open up the OpenCytoPracticalComponent document and work through that.
  • Play with the gating parameters to see what effect they have.

Gating Method Examples - mindensity

Infers a gate threshold based on the minimum density separating two major populations in a 1-D density estimate.

  • Open the csv template in a new window. The definition for the live/dead gate is:

viable, viable-, singlet, AViD, mindensity, gate_range=c(500,1000)
gate_range restricts the data region where the method searches for a cutpoint.
adjust alters the smoothing (larger value = more smoothing, less bumpy)

  • Play with these parameters in the gating template.
    • What happens if you remove the gate_range parameter?
    • What happens if you set adjust=0.75?
# Work with a subset of the data
auto_subset<-auto_gating[1:20] # first 20 samples
# Make changes in the template. SAVE it, then re-load it in R.
gt<-gatingTemplate("data/template/gt_080.csv")
# Remove the old gate from auto_gating
Rm("viable",auto_subset)
# Gate with the new template, stopping once viable is gated (stop.at names the next population, nonNeutro)
gating(gt,auto_subset,stop.at="nonNeutro")
# View the new result
plotGate(auto_subset,"viable")
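
If you want to see what gate_range and adjust do without re-running the pipeline, the idea behind mindensity can be mimicked in a few lines of base R. This is a sketch under stated assumptions, not openCyto's code: the data are simulated, min_density_cut is a hypothetical helper, and the fallback used when no gate_range is given (search between the two tallest peaks) is our own choice.

```r
set.seed(1)
# Simulated 1-D intensities (illustrative only): a large negative and a
# smaller positive population.
x <- c(rnorm(3000, mean = 200, sd = 80), rnorm(1500, mean = 1500, sd = 200))

# Hypothetical helper, not part of openCyto.
min_density_cut <- function(x, gate_range = NULL, adjust = 1) {
  d <- density(x, adjust = adjust)  # larger adjust = smoother estimate
  if (is.null(gate_range)) {
    # No window given: search between the two tallest local maxima.
    peaks   <- which(diff(sign(diff(d$y))) == -2) + 1
    tallest <- peaks[order(d$y[peaks], decreasing = TRUE)][1:2]
    gate_range <- range(d$x[tallest])
  }
  inside <- d$x >= gate_range[1] & d$x <= gate_range[2]
  # The cut point is where the estimated density is lowest in the window.
  d$x[inside][which.min(d$y[inside])]
}

cut_windowed <- min_density_cut(x, gate_range = c(500, 1000))
cut_smooth   <- min_density_cut(x, gate_range = c(500, 1000), adjust = 2)
cut_free     <- min_density_cut(x)
print(c(windowed = cut_windowed, smooth = cut_smooth, free = cut_free))
```

Restricting the window guarantees the cut lands in a plausible region even when the density has extra dips elsewhere; without it, the result depends entirely on where the peaks are found.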

Where we've used OpenCyto

Studies with 100s of GB of data.
None take more than a couple of hours to run.

  • Gating of Lyoplate standardized staining panels (FlowCAP III)
  • Many large clinical trial ICS data sets at the HVTN
  • ICS data - MTB infected vs. healthy subjects.
  • CyTOF combinatorial cytokine data
    • Exhaustive gating of polyfunctional T-cell subsets.

Event-level data and cell population memberships can easily be shared with collaborators.

Results can be quickly pushed to downstream analysis:
  • MIMOSA (PMC3862207)
  • COMPASS (http://rglab.github.org/COMPASS/)

We're usually pretty friendly and can help you get started.

Summary

  • A framework for standardizing analysis pipelines.
  • Flexible support of different gating approaches via plugins
    • e.g. flowDensity is supported.
  • Re-usable templates and code
    • Work is done up front for set up.
    • Fully reproducible and objective, data-driven gating.
  • OpenCyto simplifies
    • Data import (raw data and/or manual gating from FlowJo workspace)
    • Preprocessing
    • Data manipulation (interacting with cell subsets)
    • Plotting and visualization
    • Extracting statistics for reports
    • Downstream analysis with the full power of R's tools.

Online Resources

Acknowledgements

RGLab @ Fred Hutchinson Cancer Research Center
Raphael Gottardo
Mike Jiang
Jacob Frelinger
John Ramey

Collaborators
Steve De Rosa @ HIV Vaccine Trials Network
Adam Asare @ Immune Tolerance Network
Evan Newell @ Singapore Immunology Network
Mark Davis @ Stanford
Adam Triester @ TreeStar Inc.
Jay Almarode @ TreeStar Inc.

Funding
National Institutes of Health
Human Immune Project Consortium (HIPC)
