This vignette describes the use of the MSnbase package for centroiding of profile-mode mass spectrometry data.
Mass spectrometers typically record data in so-called profile mode, where the signal corresponding to a specific ion is distributed around the ion’s actual m/z value (Smith et al. 2014). The accuracy of that signal depends on the resolution and settings of the instrument. Profile-mode data can be processed into centroid data by retaining only a single, representative value, typically the local maximum of the distribution of data points. Centroiding substantially reduces the amount of data without much loss of information. Certain algorithms, such as the centWave method in the xcms package for chromatographic peak detection in LC-MS experiments, or proteomics search engines that match MS2 spectra to peptides, require the data to be in centroid mode. In this vignette, we focus on metabolomics data.
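To illustrate the idea of keeping only the local maximum, the following base R sketch builds a synthetic profile peak and retains only the points that are higher than both of their neighbours. The Gaussian peak shape, its width and the absence of noise are assumptions made for illustration, and the helper `local_maxima()` is hypothetical; this is not the algorithm MSnbase uses internally.

```r
## Synthetic, noise-free profile peak centred at m/z 106.05
## (peak shape and width are illustrative assumptions)
mz <- seq(106.040, 106.060, by = 0.001)
intensity <- 1000 * exp(-((mz - 106.05)^2) / (2 * 0.002^2))

## Hypothetical helper: indices of points strictly larger than both
## neighbours, i.e. the local maxima of the profile signal
local_maxima <- function(x) {
    n <- length(x)
    if (n < 3) return(integer(0))
    i <- 2:(n - 1)
    i[x[i] > x[i - 1] & x[i] > x[i + 1]]
}

idx <- local_maxima(intensity)
centroid <- data.frame(mz = mz[idx], intensity = intensity[idx])

## 21 profile data points are reduced to a single centroid
## expected: one centroid at m/z 106.05 with intensity 1000
centroid
```

On real, noisy data a plain local-maximum search would report many spurious peaks, which is why practical centroiding methods usually smooth the signal and apply a signal-to-noise threshold first.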
Many manufacturers centroid the profile data, either
directly during acquisition or immediately thereafter, so that the
user receives already processed data. Alternatively, third-party
software, such as msconvert from ProteoWizard
(Chambers et al. 2012), allows users to apply various centroiding algorithms,
including vendor methods. In some cases, however, the software provided
by vendors generates centroided data of poor quality.
MSnbase also provides functionality to centroid profile MS
data. The processed data can then be further quantified or analysed
within R, or serialised to mzML files and used as input for other
software.
In this vignette we use a subset of a metabolomics profile-mode LC-MS
dataset of pooled human serum samples measured on an AB Sciex TripleTOF
5600+ mass spectrometer, coupled to hydrophilic interaction
high-performance liquid chromatography (HILIC HPLC). The
mzML file contains profile-mode data for an m/z range from 105 to 130
and a retention time from 0 to 240 seconds. For more details on the
data, see ?msdata::sciexdata. Below we load the required packages
and read the MS data.
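library("MSnbase")
library("msdata")
library("magrittr")

## Locate the profile-mode mzML file distributed with msdata
fl <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
basename(fl)
## [1] "20171016_POOL_POS_3_105-134.mzML"
## Read the data on disk, declaring it as profile-mode (not centroided)
data_prof <- readMSData(fl, mode = "onDisk", centroided = FALSE)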
We next extract the profile MS data for the [M+H]+ adduct of serine,
which has an expected m/z of 106.049871. To this end, we filter the
data_prof object to an m/z range containing the signal for the metabolite and
to a retention time window from 175 to 187 seconds, corresponding to the
time when the analyte elutes from the LC.
## Define the m/z and retention time ranges
serine_mz <- 106.049871
mzr <- c(serine_mz - 0.01, serine_mz + 0.01)
rtr <- c(175, 187)

## Filter the object
serine <- data_prof %>%
    filterRt(rtr) %>%
    filterMz(mzr)
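The ±0.01 window used above is an absolute m/z tolerance. Since mass accuracy is often expressed in parts per million (ppm), the following base R sketch converts between the two. The helpers `da_to_ppm()` and `ppm_to_range()` are hypothetical and not part of MSnbase.

```r
serine_mz <- 106.049871

## Hypothetical helpers: absolute tolerance -> ppm, and ppm -> m/z range
da_to_ppm <- function(delta, mz) delta / mz * 1e6
ppm_to_range <- function(mz, ppm) mz + c(-1, 1) * mz * ppm / 1e6

## The +/- 0.01 window used for filtering corresponds to about 94.3 ppm
da_to_ppm(0.01, serine_mz)

## A 10 ppm window would be much narrower: about 106.0488 to 106.0509
ppm_to_range(serine_mz, 10)
```

At this m/z, the filter window used above therefore spans roughly ±94 ppm around the expected value.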
We can now plot the profile MS data for serine.
plot(serine, type = "XIC")
abline(h = serine_mz, col = "red", lty = 2)
The lower panel in the plot above shows all individual signal intensities measured by the mass spectrometer over the retention time and m/z ranges of interest. The upper panel displays the base peak chromatogram (BPC), which represents the maximum signal (across the range of m/z values) for each discrete retention time. The rows of points in the lower panel indicate the resolution of the mass spectrometer, while each column of data points (i.e. the data collected at a discrete retention time) represents the signal for the ion in one spectrum.
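The BPC summary described above can be sketched in base R: for each retention time, keep the largest intensity across all m/z values. The toy data points below are made up for illustration. On the actual MSnbase object, a comparable chromatogram can be obtained with `chromatogram(serine, aggregationFun = "max")`.

```r
## Toy profile data, one row per measured data point (values are made up)
pts <- data.frame(
    rt = c(180, 180, 180, 181, 181, 181, 182, 182),
    mz = c(106.048, 106.050, 106.052, 106.048, 106.050, 106.052,
           106.049, 106.051),
    intensity = c(10, 80, 15, 20, 150, 30, 12, 60)
)

## For each discrete retention time, keep the maximum intensity across m/z
bpc <- aggregate(intensity ~ rt, data = pts, FUN = max)

## expected: rt 180 -> 80, rt 181 -> 150, rt 182 -> 60
bpc
```

Replacing `max` with `sum` would instead give a total ion chromatogram (TIC) for the selected m/z range.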
Below we plot the signal for one of the 43 spectra containing signal for serine, namely the one at a retention time of 181.07 seconds.