Package: xcms
Authors: Johannes Rainer
Modified: 2018-04-30 13:21:36
Compiled: Mon Apr 30 18:47:52 2018

1 Introduction

This documents describes data import, exploration, preprocessing and analysis of LCMS experiments with xcms version >= 3. The examples and basic workflow was adapted from the original LC/MS Preprocessing and Analysis with xcms vignette from Colin A. Smith.

The new user interface and methods use the XCMSnExp object (instead of the old xcmsSet object) as a container for the pre-processing results. To support packages and pipelines relying on the xcmsSet object, it is however possible to convert an XCMSnExp into a xcmsSet object using the as method (i.e. xset <- as(x, "xcmsSet"), with x being an XCMSnExp object.

2 Data import

xcms supports analysis of LC/MS data from files in (AIA/ANDI) NetCDF, mzML/mzXML and mzData format. For the actual data import Bioconductor’s SRC_R[:exports both]{Biocpkg(“mzR”)} is used. For demonstration purpose we will analyze a subset of the data from [1] in which the metabolic consequences of knocking out the fatty acid amide hydrolase (FAAH) gene in mice was investigated. The raw data files (in NetCDF format) are provided with the faahKO data package. The data set consists of samples from the spinal cords of 6 knock-out and 6 wild-type mice. Each file contains data in centroid mode acquired in positive ion mode form 200-600 m/z and 2500-4500 seconds.

Below we load all required packages, locate the raw CDF files within the faahKO package and build a phenodata data frame describing the experimental setup.

library(xcms)
library(faahKO)
library(RColorBrewer)
library(pander)
library(magrittr)

## Get the full path to the CDF files
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
        recursive = TRUE)
## Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                   replacement = "", fixed = TRUE),
         sample_group = c(rep("KO", 6), rep("WT", 6)),
         stringsAsFactors = FALSE) 

Subsequently we load the raw data as an OnDiskMSnExp object using the readMSData method from the MSnbase package. While the MSnbase package was originally developed for proteomics data processing, many of its functionality, including raw data import and data representation, can be shared and reused in metabolomics data analysis. Also, MSnbase can be used to centroid profile-mode MS data (see the corresponding vignette in the MSnbase package).

raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
               mode = "onDisk") 

The OnDiskMSnExp object contains general information about the number of spectra, retention times, the measured total ion current etc, but does not contain the full raw data (i.e. the m/z and intensity values from each measured spectrum). Its memory footprint is thus rather small making it an ideal object to represent large metabolomics experiments while still allowing to perform simple quality controls, data inspection and exploration as well as data sub-setting operations. The m/z and intensity values are imported from the raw data files on demand, hence the location of the raw data files should not be changed after initial data import.

3 Initial data inspection

The OnDiskMSnExp organizes the MS data by spectrum and provides the methods intensity, mz and rtime to access the raw data from the files (the measured intensity values, the corresponding m/z and retention time values). In addition, the spectra method could be used to return all data encapsulated in Spectrum classes. Below we extract the retention time values from the object.

head(rtime(raw_data)) 
## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 
##  2501.378  2502.943  2504.508  2506.073  2507.638  2509.203

All data is returned as one-dimensional vectors (a numeric vector for rtime and a list of numeric vectors for mz and intensity, each containing the values from one spectrum), even if the experiment consists of multiple files/samples. The fromFile function returns a numeric vector that provides the mapping of the values to the originating file. Below we use the fromFile indices to organize the mz values by file.

mzs <- mz(raw_data)

## Split the list by file
mzs_by_file <- split(mzs, f = fromFile(raw_data))

length(mzs_by_file) 
## [1] 12

As a first evaluation of the data we plot below the base peak chromatogram (BPC) for each file in our experiment. We use the chromatogram method and set the aggregationFun to "max" to return for each spectrum the maximal intensity and hence create the BPC from the raw data. To create a total ion chromatogram we could set aggregationFun to sum.

## Get the base peak chromatograms. This reads data from the files.
bpis <- chromatogram(raw_data, aggregationFun = "max")
## Define colors for the two groups
group_colors <- brewer.pal(3, "Set1")[1:2]
names(group_colors) <- c("KO", "WT")

## Plot all chromatograms.
plot(bpis, col = group_colors[raw_data$sample_group])