Cytometry data with ggCyto
ggCyto package.
The ggCyto package was developed at the RGLab by Mike Jiang. It is new and under active development.
Here we aim to demonstrate some of its functionality.
By overloading ggplot’s fortify method, we make cytometry data fully compatible with ggplot.
It usually doesn’t make sense to visualize hundreds of FCS samples, so we’ll subset the data for visualization, restricting ourselves to the first two subjects. We can use subset on the GatingSet object to subset by pData variables. This is standard Bioconductor behavior.
# Subset the data for a demo of visualization.
ptids <- unique(pData(tbdata)[["PID"]])[1:2]
tbdata <- subset(tbdata, `PID` %in% ptids)
Rm("CD4",tbdata)
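To make the subsetting idiom concrete, here is a minimal base R sketch of the same filter applied to a plain data frame. The data frame `pd`, its file names, and its PID values are hypothetical stand-ins for pData(tbdata); only the `unique(...)[1:2]` and `%in%` pattern is taken from the code above.

```r
# Hypothetical sample annotations standing in for pData(tbdata).
# File names and PID values are made up for illustration.
pd <- data.frame(
  name = c("a.fcs", "b.fcs", "c.fcs", "d.fcs", "e.fcs", "f.fcs"),
  PID  = c("P1", "P1", "P2", "P2", "P3", "P3"),
  Stim = rep(c("DMSO", "ESAT-6"), times = 3),
  stringsAsFactors = FALSE
)

# Take the first two distinct subject IDs ...
ptids <- unique(pd[["PID"]])[1:2]

# ... and keep only the samples that belong to them.
pd_sub <- subset(pd, PID %in% ptids)
pd_sub
```

Four of the six hypothetical samples remain, two per retained subject. On a GatingSet, the same condition is evaluated against the pData columns.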
Furthermore, we’ll focus on the CD3+ cell subsets for this demonstration. These are extracted into a flowSet.
# extract the CD3 population
fs <- getData(tbdata, "CD3")
The simplest way to visualize FCM data is via one-dimensional histograms or density plots. This is supported using the standard ggplot2 geom_* interface.
Here we specify that we want a histogram, and we map the aesthetic x to the variable CD4, which corresponds to the dimension/marker we want to plot.
ggCyto automatically facets by the name variable, which usually represents individual FCS files.
p <- ggcyto(fs, aes(x = CD4))
p1 <- p + geom_histogram(bins = 60)
p1
By default, ggCyto shows the full range of the data, which is often wider than the instrument range. We can restrict the axes to the instrument range using ggcyto_par_set.
The limits parameter accepts "data" (the default) or "instrument".
myPars <- ggcyto_par_set(limits = "instrument")
p1 + myPars
We can print the default parameter settings using ggcyto_par_default.
# print the default settings
ggcyto_par_default()
## $limits
## [1] "data"
##
## $facet
## facet_wrap(name)
##
## $hex_fill
## continuous_scale(aesthetics = "fill", scale_name = "gradientn",
## palette = gradient_n_pal(colours, values, space), na.value = na.value,
## trans = "sqrt", guide = guide)
##
## $lab
## $labels
## [1] "both"
##
## attr(,"class")
## [1] "labs_cyto"
##
## attr(,"class")
## [1] "ggcyto_par"
Of course, other geometries are supported. geom_density will generate a density plot rather than a histogram.
# a single filled density layer is enough
p <- p + geom_density(fill = "black") + myPars
p
As you saw, the default faceting uses the name variable, but any variable defined in the pData slot of the flowSet is valid.
kable(pData(fs))
| | Peptide | Stim | PID | EXPERIMENT NAME | name | known_response |
|---|---|---|---|---|---|---|
| 353385.fcs | DMSO | General | 01-0917 | 130517_TB-ICS_ACS_GP | 353385.fcs | non-responder |
| 353387.fcs | ESAT-6 | General | 01-0917 | 130517_TB-ICS_ACS_GP | 353387.fcs | non-responder |
| 353421.fcs | DMSO | General | 01-0996 | 130517_TB-ICS_ACS_GP | 353421.fcs | responder |
| 353423.fcs | ESAT-6 | General | 01-0996 | 130517_TB-ICS_ACS_GP | 353423.fcs | responder |
Here we facet by Peptide stimulation and known_response (which comes from previous analysis).
# change faceting (default is facet_wrap(~name))
p + facet_grid(known_response ~ Peptide)
The typical view of FCM data is the two-dimensional dot plot. Hexagonal binning is a popular and rapid way to view the data.
Again, axis limits need to be specified, since by default ggcyto presents all the data, including outliers that have unusually large positive or negative values.
# 2d hexbin
p <- ggcyto(fs, aes(x = CD4, y = CD8)) + geom_hex(bins = 60) + ylim(c(-100,4e3)) + xlim(c(-100,3e3))
p
The default colour scale can be changed using scale_fill_gradientn or scale_fill_gradient.
For example, here is the ColorBrewer PiYG palette with a square root transform of the counts.
# brewer.pal() comes from the RColorBrewer package
p + scale_fill_gradientn(colours = brewer.pal(n = 8, name = "PiYG"), trans = "sqrt")
Or grayscale.
p + scale_fill_gradient(trans = "sqrt", low = "gray", high = "black")
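The square root transform matters because bin counts in a hexbin plot often span several orders of magnitude. The base R sketch below uses hypothetical counts and a hypothetical helper, rescale01. It assumes that a continuous colour scale places each transformed value along its gradient by rescaling to [0, 1], and compares those positions with and without the transform.

```r
# Hypothetical per-bin event counts spanning several orders of magnitude.
counts <- c(1, 4, 25, 100, 400, 2500, 10000)

# Rescale to [0, 1]: 0 maps to the low end of the gradient, 1 to the high end.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear_pos <- rescale01(counts)        # untransformed counts
sqrt_pos   <- rescale01(sqrt(counts))  # square root transformed counts

round(data.frame(counts, linear_pos, sqrt_pos), 3)
```

Without the transform, every bin except the densest few lands near the low end of the gradient. With it, sparse and moderately dense bins are spread further apart and become easier to tell apart.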
geom_density2d behaves as expected.
ggcyto(fs, aes(x = CD4, y = CD8)) + geom_hex(bins = 60) + geom_density2d(colour = "black") + ylim(c(-100,4e3)) + xlim(c(-100,3e3))
It’s possible to plot gates on top of the data.
One way to do so is to extract the gate from the GatingSet and add it explicitly.
# add geom_gate layer
p <- ggcyto(fs, aes(x = CD4, y = CD8)) + geom_hex(bins = 60) + ylim(c(-100,4e3)) + xlim(c(-100,3e3))
g <- getGate(tbdata, "CD4+")
p <- p + geom_gate(g)
p
Overlay statistics for cell populations.
# add geom_stats
p + geom_stats()
We can do the same as before with a GatingSet object rather than a flowSet. Again, dimensions are mapped to aesthetics using marker names.
# use a customized range to override the default data limits
myPars <- ggcyto_par_set(limits = list(y = c(-100,4e3), x = c(-100,3e3)))
p <- ggcyto(tbdata, aes(x = CD4, y = CD8), subset = "CD3")
p <- p + geom_hex(bins = 64) + myPars
p
If we want to use marker names on the axes rather than channel and marker names, that is possible.
#only display marker on axis
p <- p + labs_cyto("marker")
p
When plotting gates, we don’t need to extract them explicitly as they’re part of the object.
One gate.
# add gate
p + geom_gate("CD4+CD8-")
Two gates.
# add two gates
p <- p + geom_gate(c("CD4+CD8-","CD4-CD8-"))
p
Overlay population statistics.
p + geom_stats()
Overlay population statistics for just one population.
# add stats just for one specific gate
p + geom_stats("CD4+CD8-")
Change the background color, style, and report the count rather than the percentage.
# change stats type, background color and position
p + geom_stats("CD4+CD8-", type = "count", size = 6, color = "white", fill = "black", adjust = 0.3)
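For intuition about what the two statistic types report, here is a minimal base R sketch. The events and the rectangular gate boundaries are hypothetical, and it assumes the percentage is taken relative to the parent population being plotted. It is not how ggCyto computes the numbers internally.

```r
# Hypothetical parent population: 10 events measured on two markers.
events <- data.frame(
  CD4 = c(2100, 1900, 2300, 1800, 2000, 2200,  400,  600,  300,  500),
  CD8 = c( 300,  500,  200,  600,  400,  700, 2500, 2700, 2400, 2600)
)

# Hypothetical rectangular gate (made-up boundaries).
gate <- list(CD4 = c(1000, 3000), CD8 = c(-100, 1000))

# An event is inside the gate when it falls within both ranges.
in_gate <- events$CD4 >= gate$CD4[1] & events$CD4 <= gate$CD4[2] &
           events$CD8 >= gate$CD8[1] & events$CD8 <= gate$CD8[2]

n_count   <- sum(in_gate)                  # the "count" statistic
n_percent <- 100 * n_count / nrow(events)  # share of the parent population
c(count = n_count, percent = n_percent)
```

Six of the ten hypothetical events fall inside the gate, so the count is 6 and the percentage is 60.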
As you can see, there is a great deal of flexibility in using the ggplot2 interface to interact with FCM plots.
Say you want to plot the CD4 and CD8 cell populations, but don’t necessarily know the parent population.
To do this with ggCyto would look like:
# 'subset' is omitted
p <- ggcyto(tbdata, aes(x = CD4, y = CD8)) + geom_hex(bins = 64) + myPars + geom_gate(c("CD4+CD8-", "CD4-CD8-"))
p
We define the dimensions and the gates. There’s no need to specify the parent population. ggCyto infers the parent population from the gates and plots the relevant events.
The subset argument allows us to explicitly subset a parent population. When subset is specified and geom_gate is called without arguments, ggCyto plots the gates of all child populations defined on the chosen dimensions.
Rm("CD8+",tbdata)
Rm("CD4+",tbdata)
p <- ggcyto(tbdata, aes(x = CD4, y = CD8), subset = "CD3") + geom_hex(bins = 64) + geom_gate() + geom_stats() + myPars
p
By default, ggCyto plots the data in the transformed space (if it’s been transformed). For FCM data processed by FlowJo, this is [0, 4096], the so-called channel space.
Because we store the data transformation, we can transform the axes and show the raw fluorescence intensities on the x and y axes using axis_x_inverse_trans and axis_y_inverse_trans.
p + axis_x_inverse_trans() + axis_y_inverse_trans()
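The idea behind inverse-transformed axes can be sketched in base R: the data stay where they are in transformed space, and only the tick labels are mapped back through the inverse function. The arcsinh pair below (fwd and inv, with made-up constants) is a stand-in chosen for illustration. The transformation actually stored in a GatingSet depends on how the data were processed and may differ.

```r
# Stand-in transformation pair (an assumption, not the stored transform).
fwd <- function(x) 1000 * asinh(x / 150)  # raw intensity -> transformed value
inv <- function(y) 150 * sinh(y / 1000)   # transformed value -> raw intensity

# Tick positions on the transformed axis ...
breaks <- c(0, 1000, 2000, 3000, 4000)

# ... and the raw-intensity labels those ticks would carry.
labels <- signif(inv(breaks), 2)
data.frame(breaks, labels)
```

The tick positions are evenly spaced, while the labels grow nonlinearly. That is why an inverse-transformed axis can show raw fluorescence intensities without moving any data.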
ggcyto object
We have defined a ggcyto object that delays transformation of the data until it is plotted. This makes things a little faster, as we don’t have to do any melting or reshaping of the underlying data until we need it.
The ggcyto object is entirely ggplot2 compatible, in terms of adding layers and parameters.
class(p)
## [1] "ggcyto_GatingSet" "ggcyto_flowSet" "ggcyto"
## [4] "gg" "ggplot"
class(p$data)
## [1] "GatingSet"
## attr(,"package")
## [1] "flowWorkspace"
You can use as.ggplot to return a regular ggplot object.
# To return a regular ggplot object
p <- as.ggplot(p)
class(p)
class(p$data) # it is fortified now
ggplot directly on flowSet objects:
Please open issues to report bugs or unexpected behaviour on our GitHub page: http://github.com/RGLab/ggcyto