1 Introduction

1.1 Motivation

Massively-Parallel Cytometry (MPC) experiments allow cost-effective quantification of more than 200 surface proteins at single-cell resolution. The Inflow protocol (Becht et al. 2021) pioneered the pipeline for analysing MPC data, and the Bioconductor package infinityFlow was developed for comprehensive analyses. However, the methods for background correction and removal of unwanted variation implemented in that package can be improved. We developed the MAPFX package as an alternative that takes a more careful approach to cleaning up the raw protein intensities. Compared with the infinityFlow pipeline, our package performs background correction prior to imputation and removes unwanted variation from the data at the cell level, while explicitly accounting for the potential association between biology and unwanted factors. We benchmarked our pipeline against the infinityFlow pipeline and demonstrated that our approach is better at preserving biological signals, removing unwanted variation, and imputing unmeasured infinity markers (Liao et al. 2024). The MAPFX package includes two user-friendly functions, MapfxMPC and MapfxFFC, designed for data from MPC and fluorescence flow cytometry (FFC) experiments, respectively (see the sections below for details).

1.2 Experimental and Computational Pipeline of the Data from the Massively-Parallel Cytometry (MPC) Experiments

The experimental and computational pipeline of the Inflow protocol (Becht et al. 2021): (A) Experimental pipeline. The single-cell samples are stained with backbone markers (backbone panel staining), and the stained samples are then allocated to wells, each containing one particular infinity marker (infinity panel staining). Finally, data are acquired from the flow cytometry assay for each well. (B) Computational pipeline. In the matrix of normalised data, the backbone matrix (gray) contains values for every single cell (row), but only the block-diagonal entries of the infinity matrix (yellow) have measurements. The unmeasured infinity markers are imputed by using the backbone markers as predictors in regression models, which yields the completed data matrix. The above figure is extracted from Figure 1 of the paper by Liao et al. (2024).

1.3 Analysing Data from MPC Experiments

This package implements an end-to-end toolbox for analysing raw data from MPC experiments. More details on the methodology can be found in Liao et al. (2024). The MapfxMPC function runs the whole pipeline. The pipeline starts with background correction of the raw intensities, which removes noise from electronic baseline restoration and fluorescence compensation by adapting a normal-exponential convolution model. Unwanted technical variation, from sources such as well effects, is then removed using a log-normal model with plate, column, and row factors, after which infinity markers are imputed with machine learning models that use the informative backbone markers as predictors. Cluster analysis and visualisation with two-dimensional UMAP representations can then be carried out if desired. Users can set MapfxMPC(..., impute=FALSE) if imputation is not needed.
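
To make the background-correction idea concrete, here is a minimal sketch of the generic normal-exponential convolution model, in which an observed intensity X = S + B is the sum of an exponentially distributed signal S (mean alpha) and normally distributed background noise B (mean mu, standard deviation sigma). The helper name normexp_signal and all parameter values are assumptions made for illustration. MAPFX adapts this model and estimates its parameters from the data, so this sketch is not the package's implementation.

```r
# Conditional expectation E[S | X = x] under the normal-exponential
# convolution model: X = S + B, S ~ Exp(mean = alpha), B ~ N(mu, sigma^2).
# NOTE: `normexp_signal` is a hypothetical helper, not a MAPFX function.
normexp_signal <- function(x, mu, sigma, alpha) {
    m <- x - mu - sigma^2 / alpha   # mean of S given X before truncation at 0
    # mean of N(m, sigma^2) truncated to [0, Inf); ratio computed on the log scale
    m + sigma * exp(dnorm(m / sigma, log = TRUE) - pnorm(m / sigma, log.p = TRUE))
}

set.seed(1)
mu <- 0; sigma <- 50; alpha <- 500  # assumed values, not estimated from data
raw <- rexp(1000, rate = 1 / alpha) + rnorm(1000, mean = mu, sd = sigma)
corrected <- normexp_signal(raw, mu, sigma, alpha)

# Raw values can be non-positive; corrected values are positive, and
# large intensities are shifted only by about mu + sigma^2 / alpha.
range(raw)
range(corrected)
```

The correction mainly changes values near or below the background level and leaves large intensities almost untouched, which is the behaviour a calibration step of this kind aims for.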

1.4 Analysing Data from the Fluorescence Flow Cytometry (FFC) Experiments

For the protein intensities from FFC experiments, the function MapfxFFC is used to carry out normalisation steps which include background correction and removal of unwanted variation, and the function can further perform cluster analysis and visualisation with UMAP two-dimensional representations if specified.

1.5 Preparing Data for the Analysis - the Folder Diagram

# FCSpath
└───FCSpath
│   └───fcs
│       │   Plate1_A01.fcs
│       │   Plate1_A02.fcs
│       │   ...
│   └───meta
│       │   filename_meta.csv

# Outpath
└───Outpath
│   └───intermediary
│   └───downstream
│   └───graph

## Note: the sub-folders `intermediary`, `downstream`, and `graph` will 
## be generated automatically by MAPFX.
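
The input layout above can be prepared with base R. The sketch below builds it under a temporary directory; the locations FCSpath and Outpath are throwaway paths chosen for illustration, and a real analysis would point them at the folders holding the FCS files and results.

```r
# Build the expected input layout under a temporary directory.
# `FCSpath` and `Outpath` are illustrative locations only.
FCSpath <- file.path(tempdir(), "FCSpath")
dir.create(file.path(FCSpath, "fcs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(FCSpath, "meta"), recursive = TRUE, showWarnings = FALSE)

# Only the top-level output folder is created here; its sub-folders
# are generated by MAPFX (see the note above).
Outpath <- file.path(tempdir(), "Outpath")
dir.create(Outpath, showWarnings = FALSE)

# Check the two input sub-folders
list.dirs(FCSpath, full.names = FALSE, recursive = FALSE)
```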

1.5.1 Notes on Metadata

1.5.1.1 For MPC (the plate-based) Experiments

When file_meta = "auto" is set for MapfxMPC, the file identifier keyword (GUID) of each FCS file MUST contain the following information in the specified format:
Plate information: Plate1, Plate2, …, Plate9
Well information: A1, A2, …, A12, B1, …, H1, …, H12
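
Before running the pipeline, it can help to check that file names follow these rules. The sketch below uses hypothetical helpers (first_match and parse_guid) that pull the plate and well labels out of a file name with regular expressions. They only mirror the format listed above and are not the parser used inside MAPFX; file names containing other capital letters A-H followed by digits would need a stricter pattern.

```r
# Hypothetical helpers mirroring the naming rules listed above.
# They are NOT the parser used inside MAPFX.
first_match <- function(pattern, x) {
    hit <- regexpr(pattern, x, perl = TRUE)
    out <- rep(NA_character_, length(x))
    out[hit > 0] <- regmatches(x, hit)   # fill only the elements that matched
    out
}

parse_guid <- function(guid) {
    plate <- first_match("Plate[1-9]", guid)
    # row letter A-H followed by column 1-12 (with or without a leading zero)
    well <- first_match("[A-H](1[0-2]|0?[1-9])(?![0-9])", guid)
    # zero-pad the column so that A1 and A01 give the same label
    padded <- sprintf("%s%02d", substr(well, 1, 1), as.integer(substring(well, 2)))
    padded[is.na(well)] <- NA_character_
    data.frame(GUID = guid, Plate = plate, Well = padded, stringsAsFactors = FALSE)
}

res <- parse_guid(c("Plate1_A01.fcs", "Plate2_H12.fcs", "Plate3_B7.fcs", "sample.fcs"))
res
```

Rows with NA in Plate or Well flag files whose names do not carry the required information.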

When file_meta = "usr" is set, prepare filename_meta.csv in the following format and save the CSV file under FCSpath/meta/.
An example:

Filenam     Plate   Well  Column  Row     Well.lab
p1_a12.fcs  Plate1  A12   Col.12  Row.01  P1_A12
p2_d08.fcs  Plate2  D08   Col.08  Row.04  P2_D08
p3_g1.fcs   Plate3  G01   Col.01  Row.07  P3_G01

Note that the “Filenam” column refers to the GUID (file name) of each FCS file in FCSpath/fcs/.
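
The example table above can be assembled and written with base R. In this sketch the column names are copied from the example, while the choice of a comma-separated file with a header row and no row names is an assumption about what the package expects. The file is written to a temporary folder for illustration.

```r
# Assemble the metadata table shown above; the values are illustrative.
wells <- c("A12", "D08", "G01")
meta <- data.frame(
    Filenam  = c("p1_a12.fcs", "p2_d08.fcs", "p3_g1.fcs"),
    Plate    = c("Plate1", "Plate2", "Plate3"),
    Well     = wells,
    Column   = paste0("Col.", substring(wells, 2)),
    # row letter converted to a zero-padded row number (A = 01, ..., H = 08)
    Row      = sprintf("Row.%02d", match(substr(wells, 1, 1), LETTERS)),
    Well.lab = paste0("P", 1:3, "_", wells),
    stringsAsFactors = FALSE
)

# Written to a temporary folder here; for a real analysis the file
# belongs in FCSpath/meta/ as described above.
meta_dir <- file.path(tempdir(), "FCSpath", "meta")
dir.create(meta_dir, recursive = TRUE, showWarnings = FALSE)
meta_file <- file.path(meta_dir, "filename_meta.csv")
write.csv(meta, meta_file, row.names = FALSE)
meta
```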

1.5.1.2 For FFC Experiments from Different Batches

Prepare filename_meta.csv in the following format and save the CSV file in FCSpath/meta/.
An example:

Filenam     Batch
090122.fcs  Batch1
070122.fcs  Batch2
010122.fcs  Batch3

2 Analysing Data with the MAPFX Package

2.1 Installation

The MAPFX package can be installed using the code below.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MAPFX")

Along with the MAPFX package, we also load the following packages required for running functions in MAPFX.

library(MAPFX)

## specify the package names
suppressPackageStartupMessages({
library(flowCore)
library(Biobase)
library(stringr)
library(uwot)
library(iCellR)
library(igraph)
library(ggplot2)
library(RColorBrewer)
library(Rfast)
library(ComplexHeatmap)
library(circlize)
library(glmnetUtils)
library(e1071)
library(xgboost)
library(parallel)
library(pbapply)
library(reshape2)
library(gtools)
library(utils)
library(stats)
library(cowplot)
})

2.2 Using the Example Datasets in MAPFX Package for this Vignette

2.2.1 MPC

This dataset is a subset of the single-cell murine lung data at steady state downloaded from FlowRepository provided by Etienne Becht (Nov 2020). The raw protein intensities and the corresponding metadata were saved in the objects ord.fcs.raw.mt_mpc and ord.fcs.raw.meta.df.out_mpc which were generated from 266 .FCS files from 266 wells with 50 cells in each file.

2.2.2 FFC

This mouse splenocyte dataset contains 50 cells (sorted CD4+ and CD8+ T cells) in each .FCS file. It was down-sampled from the data provided by Jalal Alshaweesh (Oct 2023) on FlowRepository. The raw protein intensities and the corresponding metadata were saved in the objects ord.fcs.raw.mt_ffc and ord.fcs.raw.meta.df.out_ffc.

2.3 MapfxMPC(..., impute=TRUE) - analysing data from MPC experiments

This section is for users who would like to perform all of the following steps: background correction, removal of unwanted variation (well effects), imputation, and cluster analysis.

# import built-in data
data(ord.fcs.raw.meta.df.out_mpc)
data(ord.fcs.raw.mt_mpc)

# create an output directory under tempdir() for the argument 'Outpath' of the MapfxMPC function
dir.create(file.path(tempdir(), "MPC_impu_Output"))

# usage
# when impute = TRUE, randomly selecting 50% of the cells in each well for model training
set.seed(123) 
MapfxMPC_impu_obj <- MapfxMPC(
    runVignette = TRUE, #set FALSE if not running this Vignette
    runVignette_meta = ord.fcs.raw.meta.df.out_mpc, #set NULL if not running this Vignette
    runVignette_rawInten = ord.fcs.raw.mt_mpc, #set NULL if not running this Vignette
    FCSpath = NULL, # users specify their own input path
    Outpath = file.path(tempdir(), "MPC_impu_Output"), # or users specify their own output path
    file_meta = "auto",
    bkb.v = c(
    "FSC-H", "FSC-W", "SSC-H", "SSC-W", "CD69-CD301b", "MHCII", 
    "CD4", "CD44", "CD8", "CD11c", "CD11b", "F480", 
    "Ly6C", "Lineage", "CD45a488", "CD24", "CD103"),
    yvar = "Legend", 
    control.wells = c(
    "P1_A01", "P2_A01", "P3_A01",
    "P3_F04", "P3_F05", "P3_F06", "P3_F07", "P3_F08", 
    "P3_F09", "P3_F10", "P3_F11", "P3_F12",
    "P3_G01", "P3_G02"),
    bkb.upper.quantile = 0.9, 
    bkb.lower.quantile = 0.1, 
    bkb.min.quantile = 0.01,
    inf.lower.quantile = 0.1, 
    inf.min.quantile = 0.01, 
    plots.bkc.bkb = TRUE, plots.bkc.inf = TRUE, 
    plots.initM = TRUE,
    plots.rmWellEffect = TRUE,
    impute = TRUE,
    models.use = c("XGBoost"),
    extra_args_regression_params = list(list(nrounds = 1500, eta = 0.03)),
    prediction_events_downsampling = NULL,
    impu.training = FALSE,
    plots.imputation = TRUE,
    cluster.analysis.bkb = TRUE, plots.cluster.analysis.bkb = TRUE,
    cluster.analysis.all = TRUE, plots.cluster.analysis.all = TRUE,
    cores = 4L)
## 
## 
## 
## Creating directories for output...
## 
## 
## 
## Background correcting backbone markers...
##  Estimating parameters for calibration...
## backbone: 1
## backbone: 2
## backbone: 3
## backbone: 4
## backbone: 5
## backbone: 6
## backbone: 7
## backbone: 8
## backbone: 9
## backbone: 10
## backbone: 11
## backbone: 12
## backbone: 13
## backbone: 14
## backbone: 15
## backbone: 16
## backbone: 17
##  Estimation of parameters... Completed!
##  Calibrating backbone markers (except for physical measurements)...
##  Calibration of backbone markers... Completed!
## 
## 
## 
## Background correcting infinity markers...
##  Estimating parameters for calibration AND calibrating infinity markers...
## Could not find enough cells (>=10) when used "mle.mean+3*mle.sd", so estimated alpha with the top 10 cells with "the largest values":
## 25 wells applied this strategy
## See Wellname_largest10.csv in the intermediary directory for details.
##  Calibration of infinity markers... Completed!
## 
## 
## 
## Forming a matrix of biology (M) for removal of well effect...
##  Forming logicle functions...
##  Logicle transforming raw intensity...
##  Centring logicle transformed intensities...
##  Centred logicle backbone data... Obtained!
##  Deriving initial clusters with PhenoGraph (forming the M matrix)...
## Run Rphenograph starts:
##   -Input data of 13300 rows and 17 columns
##   -k is set to 50
##   Finding nearest neighbors...
## DONE ~4.272s
##  Compute jaccard coefficient between nearest-neighbor sets...
## DONE ~12.22s
##  Build undirected graph from the weighted links...
## DONE ~3.122s
##  Run louvain clustering on the graph ...
## DONE ~2.553s
## Run Rphenograph DONE, totally takes 22.167s.
##   Return a community class
##   -Modularity value:0.877501268487132
##   -Number of clusters:17
## 24.0883145332336
##  UMAP with backbones (MPC)/proteins (FFC)...
## 57.9767603874207
##  Visualising clusters...
##  Completed!
## 
## 
## 
## Removal of well effect...
##  Estimating coefficients for removing well effect (Rfast - pre.adj)...
## Processing backbone: 1
## Processing backbone: 2
## Processing backbone: 3
## Processing backbone: 4
## Processing backbone: 5
## Processing backbone: 6
## Processing backbone: 7
## Processing backbone: 8
## Processing backbone: 9
## Processing backbone: 10
## Processing backbone: 11
## Processing backbone: 12
## Processing backbone: 13
## Processing backbone: 14
## Processing backbone: 15
## Processing backbone: 16
## Processing backbone: 17
##  Estimation completed!
##  Removing well effect for backbone markers...
##  Adjustment completed!
##  Examining the existence of well effect in the adjusted data (Rfast - post.adj)...
## Processing backbone: 1
## Processing backbone: 2
## Processing backbone: 3
## Processing backbone: 4
## Processing backbone: 5
## Processing backbone: 6
## Processing backbone: 7
## Processing backbone: 8
## Processing backbone: 9
## Processing backbone: 10
## Processing backbone: 11
## Processing backbone: 12
## Processing backbone: 13
## Processing backbone: 14
## Processing backbone: 15
## Processing backbone: 16
## Processing backbone: 17
## 
## 
## 
## Imputation got started...
##  Fitting regression models...
##  Randomly selecting 50% of the cells in each well for model training...
##      Fitting...
##      XGBoost
##  32.1548521518707 seconds
##  Imputing infinity (unmeasured well-specific) markers...
##      Randomly drawing events to predict from the test set (if it's been asked)
##      Imputing...
##      XGBoost
##  11.5692937374115 seconds
##      Concatenating predictions...
##      Writing to disk...
##      Visualising the accuracy of the predictions... (using testing set)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## 
## 
## 
## Cluster analysis with adjusted backbone markers and completed dataset for cells in the testing set...
## 
## Clustering with normalised backbones
## 
## Running UMAP...
## 
## Running Phenograph...
## 
## Run Rphenograph starts:
##   -Input data of 6650 rows and 17 columns
##   -k is set to 50
## 
##   Finding nearest neighbors...
## 
## DONE ~1.13799999999998s
##  Compute jaccard coefficient between nearest-neighbor sets...
## 
## DONE ~6.24599999999998s
##  Build undirected graph from the weighted links...
## 
## DONE ~1.476s
##  Run louvain clustering on the graph ...
## 
## DONE ~0.766999999999996s
## 
## 
## Run Rphenograph DONE, totally takes 9.62699999999995s.
## 
##   Return a community class
##   -Modularity value:0.888907118397234
## 
## 
##   -Number of clusters:18
## 
## Clustering with normalised backbones + imputed infinity markers (XGBoost)
## 
## Running UMAP...
## 
## Running Phenograph...
## 
## Run Rphenograph starts:
##   -Input data of 6650 rows and 269 columns
##   -k is set to 50
## 
##   Finding nearest neighbors...
## 
## DONE ~7.57999999999998s
##  Compute jaccard coefficient between nearest-neighbor sets...
## 
## DONE ~6.33700000000005s
##  Build undirected graph from the weighted links...
## 
## DONE ~1.46299999999997s
##  Run louvain clustering on the graph ...
## 
## DONE ~0.682000000000016s
## 
## 
## Run Rphenograph DONE, totally takes 16.062s.
## 
##   Return a community class
##   -Modularity value:0.920143588207911
## 
## 
##   -Number of clusters:26
## 
##  Visualising clusters...
## 
##  Completed!
## 
## 
## Cell group labels are saved in "GP.denoised.bkb" and "GP.denoised.bkb.impuInf*" columns...
## 
## 
## 
## 
## Cluster analysis with adjusted backbone markers for ALL cells...
## 
## Cluster analysis for normalised backbone measurements...
## 
## Clustering with normalised backbones
## 
## Running UMAP...
## 
## Running Phenograph...
## 
## Run Rphenograph starts:
##   -Input data of 13300 rows and 17 columns
##   -k is set to 50
## 
##   Finding nearest neighbors...
## 
## DONE ~3.58799999999997s
##  Compute jaccard coefficient between nearest-neighbor sets...
## 
## DONE ~12.326s
##  Build undirected graph from the weighted links...
## 
## DONE ~3.14499999999998s
##  Run louvain clustering on the graph ...
## 
## DONE ~2.79199999999997s
## 
## 
## Run Rphenograph DONE, totally takes 21.8509999999999s.
## 
##   Return a community class
##   -Modularity value:0.897663361184424
## 
## 
##   -Number of clusters:19
## 
##  Visualising clusters...
## 
##  Completed!
## 
## 
## Cell group labels are saved in "GP.denoised.bkb.allCells" column...
## 
##  Completed!
# check the details
help(MapfxMPC, package = "MAPFX")

All the output will be stored in file.path(tempdir(), "MPC_impu_Output") (users can specify their own path: /Outpath/).

2.4 MapfxMPC(..., impute=FALSE) - normalising data from MPC experiments

This section is for users who would like to perform the following steps: background correction, removal of unwanted variation (well effects), and cluster analysis using backbones only.

# import built-in data
data(ord.fcs.raw.meta.df.out_mpc)
data(ord.fcs.raw.mt_mpc)

# create an output directory under tempdir() for the argument 'Outpath' of the MapfxMPC function
dir.create(file.path(tempdir(), "MPC_NOimpu_Output"))

# usage
MapfxMPC_NOimpu_obj <- MapfxMPC(
    runVignette = TRUE, #set FALSE if not running this Vignette
    runVignette_meta = ord.fcs.raw.meta.df.out_mpc, #set NULL if not running this Vignette
    runVignette_rawInten = ord.fcs.raw.mt_mpc, #set NULL if not running this Vignette
    FCSpath = NULL, # users specify their own input path
    Outpath = file.path(tempdir(), "MPC_NOimpu_Output"), # or users specify their own output path
    file_meta="auto",
    bkb.v = c(
    "FSC-H", "FSC-W", "SSC-H", "SSC-W", "CD69-CD301b", "MHCII", 
    "CD4", "CD44", "CD8", "CD11c", "CD11b", "F480", 
    "Ly6C", "Lineage", "CD45a488", "CD24", "CD103"),
    yvar="Legend", 
    control.wells = c(
    "P1_A01", "P2_A01", "P3_A01",
    "P3_F04", "P3_F05", "P3_F06", "P3_F07", "P3_F08", 
    "P3_F09", "P3_F10", "P3_F11", "P3_F12",
    "P3_G01", "P3_G02"),
    bkb.upper.quantile = 0.9, 
    bkb.lower.quantile = 0.1, 
    bkb.min.quantile = 0.01,
    inf.lower.quantile = 0.1, 
    inf.min.quantile = 0.01, 
    plots.bkc.bkb = TRUE, plots.bkc.inf = TRUE, 
    plots.initM = TRUE,
    plots.rmWellEffect = TRUE,
    impute = FALSE,
    cluster.analysis.bkb = TRUE, plots.cluster.analysis.bkb = TRUE,
    cores = 4L)

# check the details
help(MapfxMPC, package = "MAPFX")

All the output will be stored in file.path(tempdir(), "MPC_NOimpu_Output") (users can specify their own path: /Outpath/).

2.5 MapfxFFC - normalising data from FFC experiments

This section is for users who would like to perform the following steps: background correction, removal of unwanted variation (batch effects), and cluster analysis.

# import built-in data
data(ord.fcs.raw.meta.df.out_ffc)
data(ord.fcs.raw.mt_ffc)

# create an output directory under tempdir() for the argument 'Outpath' of the MapfxFFC function
dir.create(file.path(tempdir(), "FFCnorm_Output"))

MapfxFFC_obj <- MapfxFFC(
    runVignette = TRUE, #set FALSE if not running this Vignette
    runVignette_meta = ord.fcs.raw.meta.df.out_ffc, #set NULL if not running this Vignette
    runVignette_rawInten = ord.fcs.raw.mt_ffc, #set NULL if not running this Vignette
    FCSpath = NULL, # users specify their own input path
    Outpath = file.path(tempdir(), "FFCnorm_Output"), # or users specify their own output path
    protein.v = c("CD3","CD4","CD8","CD45"),
    protein.upper.quantile = 0.9, 
    protein.lower.quantile = 0.1, 
    protein.min.quantile = 0.01,
    plots.bkc.protein = TRUE,
    plots.initM = TRUE,
    plots.rmBatchEffect = TRUE,
    cluster.analysis.protein = TRUE, plots.cluster.analysis.protein = TRUE)
## 
## 
## 
## Creating directories for output...
## 
## 
## 
## Background correcting proteins...
##  Estimating parameters for calibration...
## backbone: 1
## backbone: 2
## backbone: 3
## backbone: 4
##  Estimation of parameters... Completed!
##  Calibrating backbone markers (except for physical measurements)...
##  Calibration of backbone markers... Completed!
## 
## 
## 
## Forming a matrix of biology (M) for removal of batch effect...
##  Forming logicle functions...
##  Logicle transforming raw intensity...
##  Centring logicle transformed intensities...
##  Centred logicle backbone data... Obtained!
##  Deriving initial clusters with PhenoGraph (forming the M matrix)...
## Run Rphenograph starts:
##   -Input data of 250 rows and 4 columns
##   -k is set to 50
##   Finding nearest neighbors...
## DONE ~0.0040000000000191s
##  Compute jaccard coefficient between nearest-neighbor sets...
## DONE ~0.239000000000033s
##  Build undirected graph from the weighted links...
## DONE ~0.0539999999999736s
##  Run louvain clustering on the graph ...
## DONE ~0.0149999999999864s
## Run Rphenograph DONE, totally takes 0.312000000000012s.
##   Return a community class
##   -Modularity value:0.509492508513125
##   -Number of clusters:4
## 2.5158531665802
##  UMAP with backbones (MPC)/proteins (FFC)...
## 2.99619174003601
##  Visualising clusters...
##  Completed!
## 
## 
## 
## Removal of batch effect...
##  Estimating coefficients for removing batch effect (Rfast - pre.adj)...
## Processing protein: 1
## Processing protein: 2
## Processing protein: 3
## Processing protein: 4
##  Estimation completed!
##  Removing batch effect for protein markers...
##  Adjustment completed!
##  Examining the existence of batch effect in the adjusted data (Rfast - post.adj)...
## Processing protein: 1
## Processing protein: 2
## Processing protein: 3
## Processing protein: 4
## 
## 
## 
## Cluster analysis with adjusted protein markers for ALL cells...
## Cluster analysis for normalised backbone measurements...
## Clustering with normalised backbones
## Running UMAP...
## Running Phenograph...
## Run Rphenograph starts:
##   -Input data of 250 rows and 4 columns
##   -k is set to 50
##   Finding nearest neighbors...
## DONE ~0.0029999999999859s
##  Compute jaccard coefficient between nearest-neighbor sets...
## DONE ~0.236999999999966s
##  Build undirected graph from the weighted links...
## DONE ~0.0529999999999973s
##  Run louvain clustering on the graph ...
## DONE ~0.0160000000000196s
## Run Rphenograph DONE, totally takes 0.308999999999969s.
##   Return a community class
##   -Modularity value:0.511696350731553
##   -Number of clusters:6
##  Visualising clusters...
##  Completed!
## 
## Cell group labels are saved in "GP.denoised.bkb.allCells" column...
##  Completed!
# check the details
help(MapfxFFC, package = "MAPFX")

All the output will be stored in file.path(tempdir(), "FFCnorm_Output") (users can specify their own path: /Outpath/).

2.6 Description of the output

Three folders will be automatically generated in the output folder.
1. intermediary:
Intermediary results will be saved in the .rds or .RData formats and will be stored here.
2. downstream:
Final results will be saved in the .rds format and will be stored here. The results include normalised backbone measurements (on both linear and log scale: bkc.adj.bkb_linearScale_mt.rds and bkc.adj.bkb_logScale_mt.rds), the completed dataset with imputed infinity (exploratory, PE) markers (predictions.Rds), UMAP coordinates derived from both normalised backbones (ClusterAnalysis_umap_#bkb.rds) and the completed dataset (ClusterAnalysis_ImpuMtd_umap_#bkb.#impuPE.rds), and metadata (fcs_metadata_df.rds) for cells including cluster labels derived from both normalised backbones and the completed data matrix.
3. graph:
Figures will be stored here, including scatter plots for comparing background corrected and raw protein intensities for each protein marker, heatmaps for presenting the biological and unwanted effects in the data before and after removal of unwanted variation with mapfx.norm, boxplots (for imputations from multiple models) and a boxplot and a histogram (for imputations from a single model) of R-sq values for visualising the accuracy of imputed infinity (exploratory, PE) markers, and UMAP plots for showing the cluster structure.
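
Once a run has finished, the saved results can be loaded back into R with readRDS. The sketch below uses the file names from the description above and the output folder of the MPC example with imputation. The helper read_if_present is hypothetical and simply returns NULL when a file is missing, so the code also runs if the pipeline has not been executed.

```r
# Locate the final results of the MPC run with imputation shown earlier.
out_dir <- file.path(tempdir(), "MPC_impu_Output", "downstream")

# Hypothetical helper: read an .rds file only if it exists.
read_if_present <- function(dir, file) {
    path <- file.path(dir, file)
    if (file.exists(path)) readRDS(path) else NULL
}

# File names are taken from the description of the downstream folder.
bkb_log   <- read_if_present(out_dir, "bkc.adj.bkb_logScale_mt.rds")
cell_meta <- read_if_present(out_dir, "fcs_metadata_df.rds")

# Inspect what was found
if (!is.null(bkb_log)) dim(bkb_log)
list.files(out_dir)
```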

2.7 Examples of the output figures

The MapfxData package (available soon) contains two example datasets that can be used for demonstration.

  1. MPC dataset:
    It is a subset of the single-cell murine lung data at steady state downloaded from FlowRepository (Becht et al. 2021). The raw data contains 266 .FCS files from 266 wells with 1000 cells in each file.

  2. FFC dataset:
    It contains 316,779 cells (sorted CD4+ and CD8+ T cells) from mouse splenocytes. The data were downloaded from FlowRepository, where they were provided by Jalal Alshaweesh (Oct 2023).

2.7.1 Background Correction

This works on both MPC and FFC data.

Figure 1

Comparison of background-corrected values (y-axis) and raw intensities (x-axis) for a backbone marker (left, blue) and an infinity marker (right, gold), with a 45-degree line representing x=y. Our approach aims to “calibrate” the raw protein intensities, especially the non-positive values, without distorting large values too much.

2.7.2 Removal of unwanted (well/batch) variation

This works on both MPC and FFC data.

Figure 2

A. Maximum likelihood estimates of the unwanted (left) and the biological (right) effects estimated from the pre-adjusted data. B. Maximum likelihood estimates of the unwanted (left) and the biological (right) effects estimated from the post-adjusted data using mapfx.norm. Orange represents positive effects, whereas blue indicates negative effects. The heatmaps show the existence of unwanted (well) effects and biological effects in the pre-adjusted data, and mapfx.norm managed to remove the unwanted (well) effects from the data while preserving biological variation. We can also use mapfx.norm to remove batch effect from FFC data.
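
The idea of estimating biological and unwanted effects jointly, and then removing only the unwanted part, can be illustrated on simulated data. The sketch below is a simplified stand-in, not mapfx.norm itself: it fits an ordinary least-squares model to simulated log-scale intensities with one cluster factor (biology) and a single well factor (unwanted), whereas MAPFX uses a log-normal model with plate, column, and row factors. All names and effect sizes are assumptions.

```r
set.seed(42)
n <- 600
cluster <- factor(sample(c("c1", "c2", "c3"), n, replace = TRUE))        # biology
well    <- factor(sample(c("w1", "w2", "w3", "w4"), n, replace = TRUE))  # unwanted
bio_eff  <- c(c1 = 0, c2 = 2, c3 = 4)          # assumed biological effects
well_eff <- c(w1 = 0, w2 = 0.5, w3 = -0.5, w4 = 1)  # assumed well effects
y <- unname(5 + bio_eff[as.character(cluster)] +
            well_eff[as.character(well)] + rnorm(n, sd = 0.2))

# Fit biology and well effects together so that neither absorbs the other
fit <- lm(y ~ cluster + well)

# Subtract only the estimated well contribution from each cell
X <- model.matrix(fit)
well_cols <- grep("^well", colnames(X))
y_adj <- y - drop(X[, well_cols, drop = FALSE] %*% coef(fit)[well_cols])

# Refit on the adjusted data: well effects vanish, cluster effects remain
fit_adj <- lm(y_adj ~ cluster + well)
round(coef(fit), 2)
round(coef(fit_adj), 2)
```

Including the cluster term in the model is what lets the adjustment keep the biological differences while removing the well differences, which is the pattern the heatmaps in Figure 2 are meant to show.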

2.7.3 Performance of imputation

This works on MPC data.

Figure 3

The histogram and boxplot of the R-sq values of infinity markers. Higher R-sq values represent better performance of imputation.
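
As a reminder of what an R-sq value measures, the sketch below computes two common definitions for one marker from measured and imputed intensities. The data are toy values and the helper r_squared is hypothetical; this section does not state which definition MAPFX reports, so both are shown for illustration only.

```r
# Two common definitions of R-squared for one marker.
# NOTE: `r_squared` is a hypothetical helper, not a MAPFX function.
r_squared <- function(observed, predicted) {
    sse <- sum((observed - predicted)^2)        # residual sum of squares
    sst <- sum((observed - mean(observed))^2)   # total sum of squares
    c(one_minus_sse_sst   = 1 - sse / sst,
      squared_correlation = cor(observed, predicted)^2)
}

observed  <- c(1, 2, 3, 4, 5)            # toy measured intensities
predicted <- c(1.1, 1.9, 3.2, 3.8, 5.0)  # toy imputed intensities
rsq <- r_squared(observed, predicted)
rsq
```

Both quantities equal 1 for a perfect prediction. The first can become negative when predictions are worse than the marker mean, while the second always lies between 0 and 1.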

2.7.4 Cluster analysis

This works on both MPC and FFC data.

Figure 4

In this example, we show the results from MPC data: the two-dimensional UMAP representation of cells, with clusters derived from the PhenoGraph algorithm using the mapfx.norm normalised backbone data only (left) and using both the normalised backbone and the imputed infinity markers (right). Clusters are better refined when they are derived from the completed data matrix (right).

3 Session Information

sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## Random number generation:
##  RNG:     L'Ecuyer-CMRG 
##  Normal:  Inversion 
##  Sample:  Rejection 
##  
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] cowplot_1.1.3         gtools_3.9.5          reshape2_1.4.4       
##  [4] pbapply_1.7-2         xgboost_1.7.7.1       e1071_1.7-14         
##  [7] glmnetUtils_1.1.9     circlize_0.4.16       ComplexHeatmap_2.21.0
## [10] Rfast_2.1.0           RcppParallel_5.1.7    RcppZiggurat_0.1.6   
## [13] Rcpp_1.0.12           RColorBrewer_1.1-3    igraph_2.0.3         
## [16] iCellR_1.6.7          plotly_4.10.4         ggplot2_3.5.1        
## [19] uwot_0.2.2            Matrix_1.7-0          stringr_1.5.1        
## [22] Biobase_2.65.0        BiocGenerics_0.51.0   flowCore_2.17.0      
## [25] MAPFX_1.1.0           knitr_1.46            BiocStyle_2.33.0     
## 
## loaded via a namespace (and not attached):
##   [1] ggdendro_0.2.0       rstudioapi_0.16.0    jsonlite_1.8.8      
##   [4] shape_1.4.6.1        magrittr_2.0.3       magick_2.8.3        
##   [7] farver_2.1.1         rmarkdown_2.26       GlobalOptions_0.1.2 
##  [10] vctrs_0.6.5          Cairo_1.6-2          base64enc_0.1-3     
##  [13] rstatix_0.7.2        htmltools_0.5.8.1    progress_1.2.3      
##  [16] broom_1.0.5          Formula_1.2-5        sass_0.4.9          
##  [19] bslib_0.7.0          htmlwidgets_1.6.4    plyr_1.8.9          
##  [22] cachem_1.0.8         mime_0.12            lifecycle_1.0.4     
##  [25] iterators_1.0.14     pkgconfig_2.0.3      R6_2.5.1            
##  [28] fastmap_1.1.1        shiny_1.8.1.1        clue_0.3-65         
##  [31] digest_0.6.35        reshape_0.8.9        colorspace_2.1-0    
##  [34] S4Vectors_0.43.0     irlba_2.3.5.1        Hmisc_5.1-2         
##  [37] ggpubr_0.6.0         labeling_0.4.3       cytolib_2.17.0      
##  [40] fansi_1.0.6          httr_1.4.7           abind_1.4-5         
##  [43] compiler_4.4.0       proxy_0.4-27         withr_3.0.0         
##  [46] bit64_4.0.5          doParallel_1.0.17    htmlTable_2.4.2     
##  [49] backports_1.4.1      carData_3.0-5        ggsignif_0.6.4      
##  [52] MASS_7.3-60.2        rjson_0.2.21         scatterplot3d_0.3-44
##  [55] tools_4.4.0          foreign_0.8-86       ape_5.8             
##  [58] httpuv_1.6.15        nnet_7.3-19          glue_1.7.0          
##  [61] nlme_3.1-164         promises_1.3.0       checkmate_2.3.1     
##  [64] Rtsne_0.17           cluster_2.1.6        generics_0.1.3      
##  [67] hdf5r_1.3.10         gtable_0.3.5         class_7.3-22        
##  [70] tidyr_1.3.1          data.table_1.15.4    hms_1.1.3           
##  [73] car_3.1-2            utf8_1.2.4           RcppAnnoy_0.0.22    
##  [76] ggrepel_0.9.5        RANN_2.6.1           foreach_1.5.2       
##  [79] pillar_1.9.0         later_1.3.2          splines_4.4.0       
##  [82] dplyr_1.1.4          lattice_0.22-6       FNN_1.1.4           
##  [85] survival_3.6-4       bit_4.0.5            RProtoBufLib_2.17.0 
##  [88] tidyselect_1.2.1     gridExtra_2.3        bookdown_0.39       
##  [91] IRanges_2.39.0       stats4_4.4.0         xfun_0.43           
##  [94] matrixStats_1.3.0    pheatmap_1.0.12      stringi_1.8.3       
##  [97] lazyeval_0.2.2       yaml_2.3.8           evaluate_0.23       
## [100] codetools_0.2-20     NbClust_3.0.1        tibble_3.2.1        
## [103] BiocManager_1.30.22  cli_3.6.2            rpart_4.1.23        
## [106] xtable_1.8-4         munsell_0.5.1        jquerylib_0.1.4     
## [109] png_0.1-8            prettyunits_1.2.0    glmnet_4.1-8        
## [112] viridisLite_0.4.2    scales_1.3.0         purrr_1.0.2         
## [115] crayon_1.5.2         GetoptLong_1.0.5     rlang_1.1.3

References

Becht, Etienne, Daniel Tolstrup, Charles-Antoine Dutertre, Peter A. Morawski, Daniel J. Campbell, Florent Ginhoux, Evan W. Newell, Raphael Gottardo, and Mark B. Headley. 2021. “High-Throughput Single-Cell Quantification of Hundreds of Proteins Using Conventional Flow Cytometry and Machine Learning.” Science Advances 7 (39): eabg0505. https://doi.org/10.1126/sciadv.abg0505.

Liao, Hsiao-Chi, Terence P. Speed, Davis J. McCarthy, and Agus Salim. 2024. “MAssively-Parallel Flow Cytometry Xplorer (Mapfx): A Toolbox for Analysing Data from the Massively-Parallel Cytometry Experiments.” bioRxiv. https://doi.org/10.1101/2024.02.28.582452.