1 Introduction

This document describes how to use xcms for the analysis of direct injection mass spec data, including peak detection, calibration and correspondence (grouping of peaks across samples).

2 Peak detection

Prior to any other analysis step, peaks have to be identified in the mass spec data. In contrast to the typical metabolomics workflow, in which peaks are identified in the chromatographic (time) dimension, in direct injection mass spec data sets peaks are identified in the m/z dimension. xcms uses functionality from the MassSpecWavelet package to identify such peaks.

Below we load the required packages and disable parallel processing for this example. To enable and customize parallel processing, see the BiocParallel vignette.

library(xcms)
library(MassSpecWavelet)

register(SerialParam())

In this document we use an example data set from the msdata package. Assuming that msdata is installed, we locate the path of the package and load the data set. We also create a data.frame describing the experimental setup based on the file names.

mzdata_path <- system.file("fticr", package = "msdata")
mzdata_files <- list.files(mzdata_path, recursive = TRUE, full.names = TRUE)

## Create a data.frame assigning samples to sample groups, i.e. ham4 and ham5.
grp <- rep("ham4", length(mzdata_files))
grp[grep(basename(mzdata_files), pattern = "^HAM005")] <- "ham5"
pd <- data.frame(filename = basename(mzdata_files), sample_group = grp)

## Load the data.
ham_raw <- readMSData(files = mzdata_files,
                      pdata = new("NAnnotatedDataFrame", pd),
                      mode = "onDisk")
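The group assignment above relies on each file name starting with a sample identifier. As a minimal base R sketch of the same logic, the example below uses made-up file names (hypothetical, not the files shipped with msdata) and `grepl` with `ifelse` instead of index replacement:

```r
## Hypothetical file names, for illustration only.
fls <- c("HAM004_a.mzdata", "HAM004_b.mzdata", "HAM005_a.mzdata")

## grepl returns one logical per file name; ifelse maps it to a group label.
grp <- ifelse(grepl("^HAM005", basename(fls)), "ham5", "ham4")
pd <- data.frame(filename = basename(fls), sample_group = grp)
pd
```

Both variants give the same result; the index-based form used above simply starts from a default label and overwrites the matching entries.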

The data files are from direct injection mass spectrometry experiments, i.e. we have only a single spectrum available for each sample and no retention times.

## Only a single spectrum with an *artificial* retention time is available
## for each sample
rtime(ham_raw)
## F01.S1 F02.S1 F03.S1 F04.S1 F05.S1 F06.S1 F07.S1 F08.S1 F09.S1 F10.S1 
##     -1     -1     -1     -1     -1     -1     -1     -1     -1     -1

Peaks are identified within each spectrum using the mass spec wavelet method.

## Define the parameters for the peak detection
msw <- MSWParam(scales = c(1, 4, 9), nearbyPeak = TRUE, winSize.noise = 500,
                SNR.method = "data.mean", snthresh = 10)

ham_prep <- findChromPeaks(ham_raw, param = msw)

head(chromPeaks(ham_prep))
##            mz    mzmin    mzmax rt rtmin rtmax    into     maxo       sn
## [1,] 403.2367 403.2279 403.2447 -1    -1    -1 4735258 372259.4 22.97534
## [2,] 409.1845 409.1747 409.1936 -1    -1    -1 4158404 310572.1 20.61382
## [3,] 413.2677 413.2585 413.2769 -1    -1    -1 6099006 435462.6 27.21723
## [4,] 423.2363 423.2266 423.2459 -1    -1    -1 2708391 174252.7 14.74527
## [5,] 427.2681 427.2574 427.2779 -1    -1    -1 6302089 461385.6 32.50050
## [6,] 437.2375 437.2254 437.2488 -1    -1    -1 7523070 517917.6 34.37645
##      intf      maxf sample is_filled
## [1,]   NA  814693.1      1         0
## [2,]   NA  732119.9      1         0
## [3,]   NA 1018994.8      1         0
## [4,]   NA  435858.5      1         0
## [5,]   NA 1125644.3      1         0
## [6,]   NA 1282906.5      1         0
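Since the peaks are defined in the m/z dimension, the columns mz, mzmin and mzmax carry the relevant information, while the rt columns only hold the artificial retention time. As a small sketch, the width of each peak can be expressed in ppm. The matrix below is typed in by hand from the first three rows of the output above; with the real object the same columns would come from `chromPeaks(ham_prep)`.

```r
## First three rows of the chromPeaks output shown above, typed in by hand.
pks <- cbind(mz = c(403.2367, 409.1845, 413.2677),
             mzmin = c(403.2279, 409.1747, 413.2585),
             mzmax = c(403.2447, 409.1936, 413.2769))

## Width of each peak in the m/z dimension, absolute and in ppm.
width_abs <- pks[, "mzmax"] - pks[, "mzmin"]
width_ppm <- width_abs / pks[, "mz"] * 1e6
round(cbind(width_abs, width_ppm), 4)
```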

3 Calibration

The calibrate method can be used to correct the m/z values of identified peaks. The currently implemented method requires identified peaks and a list of m/z values for known calibrants. The m/z values of the identified peaks are then adjusted based on the differences between the calibrants' m/z values and the m/z values of the closest peaks (within a user-defined maximal permitted distance). Note that this method currently calibrates only the identified peaks, not the original m/z values in the spectra.
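The matching step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code xcms runs internally: `closest_peak` is a hypothetical helper, the calibrant values are made up, and the sketch assumes that an absolute and a ppm tolerance are simply added.

```r
## Hypothetical helper, not part of xcms: for each calibrant return the index
## of the closest peak, or NA if no peak lies within the tolerance.
closest_peak <- function(calibrants, peak_mz, mzabs = 0, mzppm = 0) {
    vapply(calibrants, function(z) {
        d <- abs(peak_mz - z)
        i <- which.min(d)
        ## Assumption: absolute and relative tolerances are added.
        if (d[i] <= mzabs + mzppm * z / 1e6) i else NA_integer_
    }, integer(1))
}

peak_mz <- c(403.2367, 409.1845, 413.2677)     # m/z of three identified peaks
calibrants <- c(403.2370, 413.2690, 420.0000)  # made-up calibrant m/z values
idx <- closest_peak(calibrants, peak_mz, mzabs = 0.0001, mzppm = 5)

## Per-calibrant m/z difference for the matched peaks (NA if unmatched).
calibrants - peak_mz[idx]
```

The third calibrant has no peak within the tolerance and is therefore left unmatched; only the differences of matched calibrants would enter the adjustment.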

Below we demonstrate the calibrate method on one of the data files with artificially defined calibration m/z values. We first subset the data set to the first data file, extract the m/z values of 3 peaks and modify the values slightly.

## Subset to the first file.
first_file <- filterFile(ham_prep, file = 1)

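## Extract the m/z values of 3 peaks
calib_mz <- chromPeaks(first_file)[c(1, 4, 7), "mz"]
## Shift them by a small random relative amount (a single draw, at most 4 ppm)
## plus a fixed 0.0001 to create artificial calibrants
calib_mz <- calib_mz + 0.00001 * runif(1, 0, 0.4) * calib_mz + 0.0001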
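To make the size of this perturbation concrete, the sketch below applies the same formula to two m/z values taken from the peak table above and reports the resulting offsets. The seed is an addition for reproducibility only. Because `runif(1, 0, 0.4)` draws a single value, all calibrants receive the same relative shift of at most 4 ppm, plus the fixed 0.0001.

```r
set.seed(123)                        # only to make the sketch reproducible
mz <- c(403.2367, 423.2363)          # two peak m/z values from the output above
u <- runif(1, 0, 0.4)                # a single draw, shared by all calibrants
calib <- mz + 0.00001 * u * mz + 0.0001

## Offset of each artificial calibrant from its peak, absolute and in ppm.
offset <- calib - mz
offset_ppm <- offset / mz * 1e6

## Largest offset the formula can produce (u = 0.4): 4 ppm plus 0.0001.
max_offset <- 0.00001 * 0.4 * mz + 0.0001
cbind(offset, offset_ppm, max_offset)
```

For m/z values around 400 the largest possible offset is roughly 0.0017, so a matching tolerance of 5 ppm (about 0.002 at this m/z) is wide enough to pair each artificial calibrant with the peak it was derived from.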

Next we calibrate the data set using the previously defined artificial calibrants. We use the "edgeshift" method, which adjusts all peaks within the m/z range of the calibrants by linear interpolation and shifts all peaks outside of that range by a constant offset (the difference between the lowest or the highest calibrant m/z, respectively, and the m/z of its closest peak). Note that in a real use case the m/z values would be the known m/z values of calibrants and would not be derived from the data itself.

## Set-up the parameter class for the calibration
prm <- CalibrantMassParam(mz = calib_mz, method = "edgeshift",
                          mzabs = 0.0001, mzppm = 5)
first_file_calibrated <- calibrate(first_file, param = prm)
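The behaviour described for "edgeshift" (linear interpolation between calibrants, constant shift outside their range) can be mimicked with base R's `approx`. The sketch below is an illustration under assumptions, not the xcms implementation: `edgeshift_sketch` is a hypothetical helper, all m/z values are made up, and the shift is assumed to be the calibrant m/z minus the m/z of its matched peak.

```r
## Hypothetical helper, not the xcms implementation.
edgeshift_sketch <- function(peak_mz, matched_mz, shift) {
    ## Linear interpolation of the shift between calibrants; rule = 2 makes
    ## approx() return the shift of the nearest calibrant outside their range.
    peak_mz + approx(x = matched_mz, y = shift, xout = peak_mz, rule = 2)$y
}

peak_mz <- c(400, 403, 410, 413, 420)   # made-up peak m/z values
matched_mz <- c(403, 413)               # peaks matched to calibrants
shift <- c(0.001, 0.003)                # assumed: calibrant m/z minus peak m/z
adjusted <- edgeshift_sketch(peak_mz, matched_mz, shift)

## Shift applied to each peak.
adjusted - peak_mz
```

Peaks below the lowest matched peak receive its shift unchanged, peaks above the highest one receive that one's shift, and peaks in between get a value interpolated from the two neighbouring calibrants.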

To evaluate the calibration, we plot the difference between the adjusted and raw m/z values (y-axis) against the raw m/z values (x-axis).

diffs <- chromPeaks(first_file_calibrated)[, "mz"] -
    chromPeaks(first_file)[, "mz"]

plot(x = chromPeaks(first_file)[, "mz"], xlab = expression(m/z[raw]),
     y = diffs, ylab = expression(m/z[calibrated] - m/z[raw]))