1 Context

Flow injection analysis (FIA) is increasingly used for high-throughput profiling, thanks to the increased resolution of modern mass spectrometers (HRMS). The data produced, however, are complex and affected by matrix effects, which makes their processing difficult. The proFIA Bioconductor package provides the first workflow to process FIA-HRMS raw data and generate the peak table. By taking into account the high resolution and the information about matrix effects available from multiple scans, the algorithms are robust and provide maximum information about ion m/z values and intensities, using the full capability of modern mass spectrometers.

2 Structure

The first part of this vignette gives a quick overview of the main proFIA workflow. The second part discusses the important parameters and gives some hints about parameter tuning using the plots offered by proFIA.

3 Workflow

proFIA workflow

The first step generates the proFIAset object, which is further processed during the workflow. The object contains initial information about the samples and the classes (when subdirectories for the raw data are present), as well as all results from the processing (e.g., detected peaks, grouping, etc.). At each step, the data quality can be checked through a graphical overview using the plot function. For convenience, the 3 processing functions and methods from the workflow (proFIAset, group.FIA, and impute.FIA) have been wrapped into a single analyzeAcquisitionFIA function. The final dataMatrix can be exported, as well as the 2 supplementary tables containing the sampleMetadata and the variableMetadata.

proFIA can also be accessed via a graphical user interface in the proFIA module from the online resource for computational metabolomics, which provides a user-friendly, Galaxy-based environment for data pre-processing, statistical analysis, and annotation (Giacomoni et al. 2015).

4 The plasFIA data package

A real data set consisting of human plasma spiked with 40 molecules at 3 increasing concentrations was acquired on an Orbitrap mass spectrometer with 2 replicates, in the positive ionization mode (U. Hohenester and C. Junot, LEMM laboratory, CEA, MetaboHUB). The 6 files are available in the plasFIA Bioconductor data package, in the mzML format (centroid mode).

5 Hands-on

5.1 Peak detection with proFIAset

We first load the two packages containing the software and the dataset:

# loading the packages
library(proFIA)
library(plasFIA)

# finding the directory of the raw files
path <- system.file(package="plasFIA", "mzML")
list.files(path)
## [1] "C100A.mzML" "C100B.mzML" "C10A.mzML"  "C10B.mzML"  "C1A.mzML"  
## [6] "C1B.mzML"
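
The file names listed above appear to encode the spiking level and the replicate (for instance, C100A would be level 100, replicate A). Assuming that convention holds, which is inferred from the names only and not stated by the package, a base R sketch for deriving a small sample table could look like this. The column names are hypothetical.

```r
# File names as listed by list.files(path) above
files <- c("C100A.mzML", "C100B.mzML", "C10A.mzML",
           "C10B.mzML", "C1A.mzML", "C1B.mzML")

# Assumed pattern: "C" + numeric level + replicate letter + ".mzML"
pattern <- "^C([0-9]+)([A-Z])\\.mzML$"

samples <- data.frame(
  file      = files,
  level     = as.numeric(sub(pattern, "\\1", files)),  # hypothetical column
  replicate = sub(pattern, "\\2", files),              # hypothetical column
  stringsAsFactors = FALSE
)

# Number of files per spiking level
table(samples$level)
```

Such a table is only a convenience for checking the design (3 levels, 2 replicates each) before running the workflow.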

The first step of the workflow is the proFIAset function which takes as input the path to the raw files. This function performs noise model building, followed by m/z strips detection and filtering. The important parameters to keep in mind are:

  • noiseEstimation (logical): should a noise model be built to filter the signal? (recommended).

  • ppm and dmz (numeric): maximum m/z deviation between scans during strip detection, expressed in ppm (ppm) and as an absolute m/z value (dmz). If the deviation allowed by ppm is lower, in absolute m/z, than dmz, dmz is used instead of ppm to account for the bias at low masses. More information about the tuning of these parameters is given in the Tuning proFIA parameters section.

  • parallel (logical): should parallel computation be used? The type of parallelism can be defined using the BiocParallel package.
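
To make the interplay between ppm and dmz concrete, the sketch below computes the m/z tolerance that would apply at a few masses. It is a base R illustration of the rule described above, not proFIA's internal code; the helper name mz_tolerance and the dmz value of 0.0005 are assumptions chosen for the example.

```r
# Hypothetical helper: tolerance in m/z units for a given mass.
# The ppm tolerance scales with m/z; dmz acts as a floor at low masses.
mz_tolerance <- function(mz, ppm, dmz) {
  pmax(mz * ppm * 1e-6, dmz)
}

ppm <- 2        # same value as used for the Orbitrap data in this vignette
dmz <- 0.0005   # assumed value, for illustration only

mz  <- c(100, 200, 256.1696, 500)
tol <- mz_tolerance(mz, ppm, dmz)

# Which of the two rules sets the tolerance at each mass
data.frame(mz = mz, ppm_tol = mz * ppm * 1e-6, tol = tol,
           rule = ifelse(mz * ppm * 1e-6 < dmz, "dmz", "ppm"))
```

With these values the two tolerances cross at m/z 250: below that mass the fixed dmz applies, above it the ppm tolerance takes over.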

Note: As all files need to be processed twice, once for noise estimation and once for model estimation, this step is the most time-consuming of the workflow.

# defining the ppm parameter adapted to the Orbitrap Fusion
ppm <- 2

# performing the first step of the workflow
plasSet <- proFIAset(path, ppm=ppm, parallel=FALSE)

The quality of peak detection can be assessed by using the plotRaw method to visualize the corresponding areas in the raw data.

# loading the spiked molecules data frame
data("plasMols")

# plotting the raw region around the Diphenhydramine mass signal
plasMols[7,]
##    formula           names classes     mass mass_M+H
## 7 C17H21NO Diphenhydramine Benzene 255.1623 256.1696
mzrange <- c(plasMols[7,"mass_M+H"]-0.1,plasMols[7,"mass_M+H"]+0.1)
plotRaw(plasSet, type="r", sample=3, ylim=mzrange, size=0.6)
## Create profile matrix with method 'bin' and step 1 ... OK

In the example above, we see that a signal at 256.195 m/z corresponding to the solvent has been correctly discarded by proFIA.
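
As a quick check of the numbers in this example, the sketch below recomputes the [M+H]+ m/z from the monoisotopic mass and expresses the distance to the solvent signal in ppm. It only illustrates that the two signals fall inside the plotted window while being well separated in m/z; it does not reproduce how proFIA decides which signals to discard. The proton mass is a standard constant, not a value taken from the package.

```r
mass        <- 255.1623   # Diphenhydramine monoisotopic mass (table above)
proton_mass <- 1.007276   # mass of a proton in Da (standard value)

mz_mh <- mass + proton_mass          # expected [M+H]+ m/z
round(mz_mh, 4)

# Plotting window used above: +/- 0.1 around the [M+H]+ m/z
mzrange <- c(mz_mh - 0.1, mz_mh + 0.1)

mz_solvent <- 256.195                # solvent signal mentioned in the text
in_window  <- mz_solvent >= mzrange[1] && mz_solvent <= mzrange[2]

# Distance between the two signals, in ppm of the analyte m/z
delta_ppm <- (mz_solvent - mz_mh) / mz_mh * 1e6
```

The solvent signal lies roughly 99 ppm away from the expected [M+H]+ m/z, so it is visible in the +/- 0.1 window but far outside a 2 ppm tolerance.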

# plotting the filtered Diphenhydramine region
plotRaw(plasSet, type="p", sample=3, ylim=mzrange, size=0.6)
## Create profile matrix with method 'bin' and step 1 ... OK

Peak detection in proFIA is based on matched filtering. It therefore relies on a peak model, which is tuned on the signals from the most intense ions. The plotModelFlowgrams method provides a visual check of the consistency of these reconstructed filters.
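
The idea behind matched filtering can be sketched in base R: a template of the expected peak shape is slid along a noisy trace, and the position where the correlation is highest is taken as the peak location. The Gaussian template and the simulated trace below are stand-ins chosen for illustration; proFIA estimates its own peak model from the data, and its implementation is not reproduced here.

```r
set.seed(1)

# Hypothetical peak model: a Gaussian shape sampled on 31 points
shape <- dnorm(seq(-3, 3, length.out = 31))

# Matched-filter kernel: zero mean (ignores a flat baseline), unit norm
kernel <- shape - mean(shape)
kernel <- kernel / sqrt(sum(kernel^2))

# Simulated trace: noise plus one peak centred on scan 120
n      <- 200
centre <- 120
trace  <- rnorm(n, sd = 0.02)
idx    <- (centre - 15):(centre + 15)
trace[idx] <- trace[idx] + shape

# stats::filter() computes a convolution, so the kernel is reversed
# to obtain a cross-correlation; sides = 2 centres it on each scan
score <- as.numeric(stats::filter(trace, rev(kernel), sides = 2))

# Scan with the highest filter response (edge NAs are ignored)
best <- which.max(score)
best
```

The closer the template is to the true peak shape, the sharper the response, which is why checking the reconstructed filters visually is worthwhile.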

# plotting the injection peak