# flowAI

## Introduction to flowAI

The flowAI package allows to perform quality control on flow cytometry data in order to warrant superior results for both manual and automated downstream analysis. The package is built on the functions: 1) flow_auto_qc, for the automatic analysis and 2) flow_iQC(), for the interactive analysis. The principle behind these two methods is complementary and takes into account three different properties, i.e. 1) flow rate, 2) signal acquisition and 3) dynamic range. The evaluation of these properties makes it possible to remove the technical variability derived from surges or deviations in the flow rate, defective laser-detection system, data range limitations and other technical issues.

After the installation of the package in your local system you can load the package.

require(flowAI)

#### Data

For this documentation or for other testing purposes we use a small built-in dataset. The dataset was manually created extracting a subsample of cells and channels from three FCS files that were part of an aging study of a Singaporean cohort. The dataset is stored as a flowSet object.

data(Bcells)
Bcells
## A flowSet with 3 experiments.
##
##   column names:
##   FSC-A FSC-H SSC-A SSC-H APC-Cy7-A Pacific Blue-A AmCyan-A Qdot 605-A Qdot 655-A PE-A PE-Texas Red-A PE-Cy7-A Time

To select FCS files from your working directory you can create a character vector of the files you want to analyze calling the function dir with “*fcs$" as regular expression for the pattern argument. setwd(...) fcsfiles <- dir(".", pattern="*fcs$")

#### Automatic method

The automatic method is implemented in the function flow_auto_qc. The following calls show how to perform the quality control with default settings on the FCS files in your folder and in the toy dataset that comes with the FlowAI packages. The flowAI package depends on the flowCore package for the handling of the FCS files in the R environment. The flowCore package provides two main classes, flowFrame and flowSet. The Bcells object is an instance of the flowSet class and contains a set of three FCS files that taken singuarly are instances of the flowFrame object. The flow_auto_qc function can be called on either one of the flowCore objects, flowSet and flowFrame, and on a character vector of the fcs files:

resQC <- flow_auto_qc(Bcells)  # using a flowSet
resQC <- flow_auto_qc(Bcells[[1]]) # using a flowFrame
resQC <- flow_auto_qc(fcsfiles) # using a character vector

When a character vector is used to call the flow_auto_qc function, a flowSet object is automatically generated since the creation of the histogram for the cell number comparison depends on it. Therefore, to avoid memory saturation, we suggest to split large datasets in batches that are compatible with the hardware specifications of your computer system. For example, if you want batches of maximum 2 gigabytes you can use:

GbLimit <- 2    # decide the limit in gigabyte for your batches of FCS files
size_fcs <- file.size(fcsfiles)/1024/1024/1024    # it calculates the size in gigabytes for each FCS file
groups <- ceiling(sum(size_fcs)/GbLimit)
cums <- cumsum(size_fcs)
batches <- cut(cums, groups) 

Then you can run your analysis on the batches using a for-loop:

for(i in 1:groups){
flow_auto_qc(fcsfiles[which(batches == levels(batches)[i])], output = 0)
}

When setting the output argument to 0 (or any other value apart 1 and 2), no R objects are returned.

#### Interactive method

The interactive method is implemented as a Shiny app and is executed through the flow_iQC() command on the R environment. For performance and clearness reasons, it allows to analyze one file at a time only. Once you open the Shiny app on your web browser, you can upload the FCS file from the top part of the left hand side panel.

## Quality control

The full pipeline of our quality control procedure includes the removal of events having anomalous values when looking at three aspect of a flow cytometry analysis:

1. flow rate
2. signal acquisition
3. dynamic range

However, it is possible to perform partial quality control on only one or two of the above mentioned properties. For the automatic method, this can be set with the argument remove_from. After the quality control, the automatic method generates by default a new FCS file containing an additional parameter where the low quality events have a value higher than 10,000, similarly to the flowClean flagging method. Alternatively, a new FCS containing only the high quality events can be generated. Moreover, flowAI can be implemented in automatic pipelines of analysis through the returned objects of the flowFrame or flowSet class

## Result evaluation

The function flow_auto_qc generates a report for each FCS file, in both a graphic and tabular format, to evaluate the performance of the algorithms in the detection of the anomalies.
We suggest to run the automatic method first with default settings. If the results are not satisfying you can either modify the settings or use the interactive method flow_iQC.

## Case study: B cells from elderly individuals

Here, we give an example of the results obtained after performing the quality control on the first FCS file of the Bcells dataset.

#### File description and quality control summary

The summary information of the FCS file analyzed is reported in the first section of the automatically generated report or on the left hand side panel of the flow_iQC Shiny app. The summary information contains the name of the file, the number of events and the total percentage of anomalies detected and removed.

The following information were obtained from the automatically generated report of our example:

Input File Name: Bcells1
Number of Events: 64562
The anomalies were removed from: Flow Rate, Flow Signal and Flow Margin Anomalies Detected in total: 23% Number of high quality events: 49535

##### Comparison of the number of events among the FCS files of the dataset

If the dataset has more than three FCS files, the automatic method will produce a histogram containing the number of events for each file. The bar in blue correspond to the FCS file whose quality control analysis is described in the remaining part of the report.

#### Flow rate check

The flow rate is reconstructed using the keyword \$TIMESTEP contained in FCS files with version equal or greater to 3. By default the analysis is performed using a timestep of 1/10 of a second. flow_auto_qc uses an anomaly detection algorithm to detect and remove the data acquired during flow rate surges and shift from the median value. The algorithm is based on the Generalized ESD outlier detection method optimized to work on time series data. The anomalies automatically detected are circled in green.

flow_iQC allows to manually select the most stable region of the flow rate.

#### Signal acquisition check

For each channel, the median of the signal of equally-sized bins of events is reported as a Levy-Jennings-type graph. The mean and standard deviation of the median should remain constant over the course of the analysis. flow_auto_qc uses a changepoint detection method to verify the stability of the signal. Precisely, a shift in the median or the variance is detected by the Binary Segmentation algorithm of the changepoint package. In the resulting plot, the region that passed the quality control is highlighted in yellow.

As for the flow rate checking, flow_iQC allows to manually choose the most stable region.

#### Dynamic range check

Events from the upper and lower limits of the dynamic range are checked in the last step. For the upper limit, the maximum value of the dynamic range is removed since the instrument is unable to record values exceeding a maximum pre-set by the manufacturer. For the lower limit, the quality control removes all the values below zero for the scatter channels and all the outliers in the negative range for the immunofluorescence channels. The plot shows the frequency of events removed over the course of the analysis; the scaling of the x-axis is complementary to the one of the signal acquisition check. For this step, both flow_auto_qc and flow_iQC use the same detection principle to scout for anomalies. When using the automatic method to refine the lower limit of the dynamic range, with the neg_valueFM argument you can decide to truncate the negative values to the cut-off suggested in the FCS file instead of the removing the negative outliers.