artMS is a Bioconductor package that provides a set of tools for the analysis and integration of large-scale proteomics (mass-spectrometry-based) datasets obtained using the popular proteomics software MaxQuant.
The functions available in artMS can be grouped into the following categories:
Before you begin, ensure that your system is running R version >= 3.6; otherwise the installation of artMS won’t work. You can check the R version running on your system from the R console. If the outcome is >= 3.6.0, congratulations! You can move forward. If it is not, you need to install the latest version of R on your system.
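One way to run this check from the R console is with base R's getRversion(). The snippet below is a minimal sketch that compares the running version against the 3.6.0 threshold mentioned above; the variable names are only illustrative.

```r
# Compare the running R version against the minimum required by artMS.
current_version <- getRversion()
meets_requirement <- current_version >= "3.6.0"

if (meets_requirement) {
  message("R ", as.character(current_version), " is recent enough to install artMS.")
} else {
  message("R ", as.character(current_version), " is too old; install R >= 3.6.0 first.")
}
```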
Two options to install artMS:
(Why Bioconductor? Here you can find a nice summary of many good reasons).
With an R (>= 3.6) version running on your system, follow these steps:
install.packages("devtools")
library(devtools)
install_github("biodavidjm/artMS")
Once installed, the package can be loaded and attached to your current workspace as follows:
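A minimal sketch of the loading step: library(artMS) attaches the package, and the requireNamespace() guard is only an illustrative addition so the snippet also runs on a system where artMS has not been installed yet.

```r
# Attach artMS if it is installed; otherwise point back to the installation steps.
if (requireNamespace("artMS", quietly = TRUE)) {
  library(artMS)
  artms_loaded <- TRUE
} else {
  message("artMS is not installed yet; see the installation options above.")
  artms_loaded <- FALSE
}
```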
artMS performs the different analyses taking as input the following files:
Check below to find out more about generating the input files.
artmsQuantification() requires a large number of arguments, especially those related to the statistical package MSstats. To facilitate the task of providing all those arguments, the function artmsQuantification() takes a config file (in yaml format) for the customization of the parameters for quantification (using MSstats) and other operations, including QC analyses, charts, and annotations.
A configuration file template can be generated by running artmsWriteConfigYamlFile()
Check below to learn the details of the configuration file.
Generate the input files: Check the input files section for details
Quality Control: if you are interested in performing only quality control analysis, run the following functions:
artmsQualityControlEvidenceBasic(): QC based on the evidence.txt file
artmsQualityControlEvidenceExtended(): based on the evidence.txt file
artmsQualityControlSummaryExtended(): based on the summary.txt file
Relative Quantification: fill in the configuration file and run the following function:
artmsQuantification(yaml_config_file = "config.yaml") (see the details below)
Analysis of Quantifications: performs annotations, clustering analysis, PCA, and enrichment analysis by running the function
artmsAnalysisQuantifications() (see the details below)
Miscellaneous functions: Check below to discover more useful functions provided by the package.
artMS also enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that the metabolites do not need to have an ID, as the retention time will be used as identifiers. Typical workflow:
Run QC on the metabolomics dataset:
artmsQuantification() (notice that a few options
must be changed in the config file before running the function)
Please, keep in mind that most of the functions won’t work for metabolomics data due to annotation issues (protein/gene ids are the primary ids for most of the functions). Check the metabolomics section to find out more.
Three basic (tab-delimited) files are required to perform the full pack of operations:
The output of the quantitative proteomics software package MaxQuant. It combines all the information about the identified peptides.
Tab delimited file generated by the user. It summarizes the experimental
design of the evidence file.
artMS merges the evidence and keys files by the “RawFile” column. Each RawFile corresponds to a unique individual experimental technical replicate / biological replicate / Condition / Run.
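The merge on “RawFile” can be pictured with base R's merge(). This is only a sketch on toy data frames: apart from RawFile, the column names and values are hypothetical stand-ins, and artMS's internal merge may differ in its details.

```r
# Toy stand-in for a MaxQuant evidence table (real files have many more columns).
evidence <- data.frame(
  RawFile   = c("run_01", "run_01", "run_02", "run_03"),
  Proteins  = c("P12345", "Q67890", "P12345", "P12345"),
  Intensity = c(1.2e6, 3.4e5, 9.8e5, 1.1e6),
  stringsAsFactors = FALSE
)

# Toy keys table describing the experimental design of each raw file.
keys <- data.frame(
  RawFile      = c("run_01", "run_02", "run_03"),
  Condition    = c("WT_A549", "WT_A549", "WT_DRUG_A"),
  BioReplicate = c("WT_A549-1", "WT_A549-2", "WT_DRUG_A-1"),
  stringsAsFactors = FALSE
)

# Raw files present in the evidence but absent from the keys would be
# silently dropped by the merge, so it is worth checking for them first.
unmatched <- setdiff(unique(evidence$RawFile), keys$RawFile)

# Annotate every evidence row with its condition and replicate.
merged <- merge(evidence, keys, by = "RawFile")
```

Checking `unmatched` before merging is a cheap way to catch typos in the RawFile column of a hand-written keys file.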
For any basic label-free proteomics experiment, the keys file must contain the following columns and rules:
'L' for label-free experiments (
'H' will be used for SILAC experiments, see below)
Condition name, and add as suffix
a dash (-) plus the biological replicate number. For example, if condition
H1N1_06H has two biological replicates, name them H1N1_06H-1 and H1N1_06H-2
Example of keys file: check the artMS data object artms_data_ph_keys
Tip: it is recommended to use Microsoft Excel (OpenOffice Calc or similar) to generate the keys file. Do not forget to choose the format Tab Delimited Text (.txt) when saving the file (use the Save As option).
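As an alternative to a spreadsheet program, a keys file could also be written from R. The sketch below assumes the column names RawFile, IsotopeLabelType, Condition, BioReplicate, and Run; the raw file names and the second condition (MOCK_06H) are hypothetical and used only for illustration.

```r
conditions <- c("H1N1_06H", "MOCK_06H")  # MOCK_06H is a hypothetical condition
n_bioreps  <- 2

keys <- data.frame(
  RawFile          = sprintf("run_%02d", seq_len(length(conditions) * n_bioreps)),  # hypothetical names
  IsotopeLabelType = "L",                                # label-free experiment
  Condition        = rep(conditions, each = n_bioreps),
  stringsAsFactors = FALSE
)

# BioReplicate = condition name + dash + biological replicate number.
keys$BioReplicate <- paste0(
  keys$Condition, "-", rep(seq_len(n_bioreps), times = length(conditions))
)
keys$Run <- seq_len(nrow(keys))

# Save as tab-delimited text (written to a temporary folder in this sketch).
keys_path <- file.path(tempdir(), "keys.txt")
write.table(keys, keys_path, sep = "\t", quote = FALSE, row.names = FALSE)
```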
The comparisons between conditions that the user wants to quantify.
For example, to compare a reference condition (WT_A549) with two additional experimental conditions with drugs (WT_DRUG_A and WT_DRUG_B), and also to quantify changes in protein abundance between DRUG_A and DRUG_B, the contrast file would look like this:
WT_DRUG_A-WT_A549
WT_DRUG_B-WT_A549
WT_DRUG_A-WT_DRUG_B
The two conditions in a comparison are separated by a dash symbol (-), and only one dash symbol is allowed, i.e., only one comparison per line.
As a result of the quantification, a protein that is more abundant in the condition on the left (numerator) gets a positive log2FC, and a protein that is more abundant in the condition on the right (denominator) gets a negative log2FC.
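The one-dash rule and the sign convention can be illustrated with a few lines of base R. This is a sketch only: the abundance values are made up, and the log2 ratio of two numbers stands in for the model-based estimate that the quantification actually produces.

```r
contrasts <- c("WT_DRUG_A-WT_A549", "WT_DRUG_B-WT_A549", "WT_DRUG_A-WT_DRUG_B")

# Each line must contain exactly one dash (one comparison per line).
dash_count <- lengths(regmatches(contrasts, gregexpr("-", contrasts, fixed = TRUE)))
stopifnot(all(dash_count == 1))

# Left term = numerator, right term = denominator.
parts <- do.call(rbind, strsplit(contrasts, "-", fixed = TRUE))
colnames(parts) <- c("numerator", "denominator")

# Hypothetical mean abundances of one protein in each condition.
abundance <- c(WT_A549 = 100, WT_DRUG_A = 400, WT_DRUG_B = 25)

# Positive when the protein is more abundant in the left-hand condition.
log2fc <- log2(abundance[parts[, "numerator"]] / abundance[parts[, "denominator"]])
names(log2fc) <- contrasts
```

With these toy numbers the first and third comparisons come out positive and the second negative, matching the numerator/denominator reading of each line.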
Example of wrong comparisons
Only condition names are allowed. Individual Bioreplicates cannot be compared. For example, this is wrong:
The configuration file (in yaml format) contains a variety of options available for the QC, quantification, and annotations performed by artMS. To generate a sample configuration file, go to the project folder (setwd("/path/to/your/working/folder/")) and execute:
artmsWriteConfigYamlFile(config_file_name = "config.yaml", verbose = FALSE)
Open the config.yaml file with your favorite editor (RStudio for example).
Although it might look complex, the default options work very well.
The configuration (yaml) file contains the following sections:
files :
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  summary: /path/to/the/summary.txt # Optional
  output : /path/to/the/results_folder/ph-results.txt
path/name of the required files. It is recommended to create
a new folder in your project folder (for example,
The results file name (e.g.
-results.txt) will be used as prefix for the
several files (
qc:
  basic: 1 # 1 = yes; 0 = no
  extended: 1 # 1 = yes; 0 = no
  extendedSummary: 0 # 1 = yes; 0 = no
Select to perform both ‘basic’ and ‘extended’ quality control based on the evidence.txt file or ‘extendedSummary’ based on the summary.txt file. Check below to find out more about the details of each type of analysis.
data:
  enabled : 1 # 1 = yes; 0 = no
  fractions:
    enabled : 0 # 1 for protein fractionation
  silac:
    enabled : 0 # 1 for SILAC experiments
  filters:
    enabled : 1
    contaminants : 1
    protein_groups : remove # remove, keep
    modifications : AB # PH, UB, AB, APMS
  sample_plots : 1 # correlation plots
Let’s break it down
enabled : 1: to pre-process the data provided in the files section.
0: won’t process the data (and a pre-generated MSstats file will be expected)
fractions: Multiple fractionation or separation methods are often combined in proteomics to improve signal-to-noise and proteome coverage and to reduce interference between peptides in quantitative proteomics.
enabled : 1: for fractionation dataset. See Special case: Protein Fractionation below for details
enabled : 0: no fractions
enabled : 1: check if the files belong to a SILAC experiment. See Special case: SILAC below for details
enabled : 0: no silac experiment (default)
enabled : 1: enables filtering (this section)
contaminants : 1: removes contaminants (CON__ and REV__ labeled by MaxQuant)
protein_groups : remove: choose whether to remove or keep protein groups
modifications : AB: any of the proteomics experiments,
AC for posttranslational modifications,
sample_plots : 1: generate correlation plots
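To make the filtering options more concrete, the sketch below mimics the effect of contaminants : 1 and protein_groups : remove on a toy table. It assumes MaxQuant's CON__ and REV__ prefixes and its use of ";" to separate the members of a protein group; the accessions are hypothetical, and artMS's own filtering code may handle more cases.

```r
# Toy evidence-like table with one contaminant, one reversed hit, and one protein group.
evidence <- data.frame(
  Proteins  = c("P12345", "CON__P00761", "REV__Q67890", "Q67890;P11111"),
  Intensity = c(1e6, 5e5, 2e4, 8e5),
  stringsAsFactors = FALSE
)

# contaminants : 1 -> drop entries flagged as contaminants or reversed sequences.
is_flagged <- grepl("CON__|REV__", evidence$Proteins)
filtered   <- evidence[!is_flagged, ]

# protein_groups : remove -> drop entries that map to more than one protein.
is_group           <- grepl(";", filtered$Proteins, fixed = TRUE)
filtered_no_groups <- filtered[!is_group, ]
```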
msstats :
  enabled : 1
  msstats_input : # `-mss.txt` file or blank (default)
  profilePlots : none
  normalization_method : equalizeMedians
  normalization_reference : # blank (default) if equalizeMedians
  summaryMethod : TMP
  censoredInt : NA
  cutoffCensored : minFeature
  MBimpute : 1
  feature_subset: all
Let’s break it down:
enabled : 1 to run MSstats, 0 to skip it
msstats_input : leave it blank if MSstats will be run (previous enabled : 1). But if MSstats was already run and the evidence-mss.txt file is available, then choose enabled : 0 and provide here the path to that file
profilePlots : Choose one of the following options:
before: plot before normalization
after: plot after normalization
before-after: recommended, although computationally expensive
none: no normalization plots
normalization_method : available options:
0: no normalization (not recommended)
globalStandards: if selected, specify the reference protein in normalization_reference
normalization_reference : UniProt id if globalStandards is chosen as the normalization_method
summaryMethod : TMP # “TMP” (default) means Tukey’s median polish, which is a robust estimation method. “linear” uses a linear mixed model. “logOfSum” conducts log2 (sum of intensities) per run.
censoredInt : NA (default) Missing values are censored or at random. ‘NA’ assumes that all ‘NA’s in the ‘Intensity’ column are censored.
0 uses zero intensities as censored intensity. In this case, NA intensities are missing at random. The output from Skyline should use 0. Null assumes that all NA intensities are randomly missing.
cutoffCensored : minFeature Cutoff value for censoring. Only with censoredInt : NA or 0. Default is ‘minFeature’, which uses the minimum value for each feature.
minFeatureNRun uses the smallest between the minimum value of the corresponding feature and the minimum value of the corresponding run.
minRun uses the minimum value for each run.
MBimpute : 1 or 0. TRUE (default) imputes ‘NA’ or ‘0’ (depending on the censoredInt option) by an accelerated failure model.
FALSE uses the values assigned by cutoffCensored.
highQuality: this option seems to be buggy right now
Check MSstats documentation to find out more about every option.
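To give an intuition for two of the defaults above, the sketch below applies a median-equalizing shift (in the spirit of equalizeMedians) and then Tukey's median polish (the idea behind summaryMethod : TMP) to a toy matrix of log2 intensities, using base R's medpolish(). It is not the MSstats implementation: the data, the grouping of features into one protein, and the exact normalization formula are assumptions made for illustration.

```r
# Toy log2 intensities: rows are features (peptides), columns are runs.
log2_int <- matrix(
  c(20.1, 20.9, 21.4,
    18.0, 18.7, 19.5,
    22.3, 23.0, 23.6,
    25.0, 24.2, 26.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature_", 1:4), paste0("run_", 1:3))
)

# Median equalization: shift every run so that all run medians coincide
# with the median of the run medians.
run_medians <- apply(log2_int, 2, median)
normalized  <- sweep(log2_int, 2, run_medians - median(run_medians))

# Assume the first three features belong to the same protein (hypothetical grouping).
protein_features <- normalized[1:3, ]

# Tukey's median polish: decomposes the matrix into overall + row + column effects.
fit <- medpolish(protein_features, trace.iter = FALSE)

# One robust abundance estimate per run for this protein.
run_summary <- fit$overall + fit$col
```

Because median polish works with medians rather than means, a single outlying feature has limited influence on the per-run summary, which is why it is described as a robust estimation method.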
enabled : 1 # if 0, won't process anything on this section
annotate :
  enabled: 1
  species : HUMAN
plots:
  volcano: 1
  heatmap: 1
  LFC : -1.5 1.5 # Range of minimal log2fc
  FDR : 0.05
  heatmap_cluster_cols : 0
  heatmap_display : log2FC # log2FC or pvalue
Extra actions to perform based on the MSstats results, including annotations and plots (heatmaps and volcano plots). Let’s break it down:
enabled : 1 (default) enables this section, 0 turns it off
enabled : 1 (default), will generate a
-results-annotated.txt file that includes
Protein.Name (only for supported species)
species : The supported species are: HUMAN, MOUSE, ANOPHELES, ARABIDOPSIS, BOVINE, WORM, CANINE, FLY, ZEBRAFISH, ECOLI_STRAIN_K12, ECOLI_STRAIN_SAKAI, CHICKEN, RHESUS, MALARIA, CHIMP, RAT, YEAST, PIG, XENOPUS
plots : options for additional plots
LFC : log2 fold change cutoff (minimal negative and positive value)
FDR : false discovery rate cutoff for significance (recommended: 0.05)
heatmap : correlation plots
heatmap_cluster_cols : 1 performs clustering of columns, 0 (default) doesn’t
heatmap_display : choose to display either log2FC or pvalue
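The LFC and FDR options act as cutoffs on the quantification results. The sketch below shows one way such cutoffs could be applied to a toy results table; the column names log2FC and adj.pvalue, the protein identifiers, and the exact inequalities are assumptions for illustration rather than a description of artMS's internal code.

```r
# Toy results table (hypothetical values).
results <- data.frame(
  Protein    = c("P1", "P2", "P3", "P4"),
  log2FC     = c(2.3, -1.8, 0.4, 3.1),
  adj.pvalue = c(0.001, 0.03, 0.01, 0.20),
  stringsAsFactors = FALSE
)

lfc_range  <- c(-1.5, 1.5)  # LFC : -1.5 1.5
fdr_cutoff <- 0.05          # FDR : 0.05

# A change is called only if it passes both the FDR and the fold-change cutoff.
significant <- results$adj.pvalue < fdr_cutoff
results$status <- ifelse(
  significant & results$log2FC >= lfc_range[2], "up",
  ifelse(significant & results$log2FC <= lfc_range[1], "down", "unchanged")
)
```

In this toy table P3 passes the FDR cutoff but not the fold-change cutoff, and P4 shows the opposite pattern, so both end up as unchanged.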
To handle protein fractionation experiments, two options must be activated:
keys.txt: The keys file must contain an additional column named “FractionKey” with the information about fractions. For example:
config.yaml: Enable fractions in the configuration file as follows:
fractions:
  enabled : 1 # 1 for protein fractions, 0 otherwise
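For the keys.txt side of this setup, a fractionated design could be laid out as in the sketch below. Only the FractionKey column name comes from the text above; the other column names, the raw file names, the number of fractions, and the choice of sharing one Run value across the fractions of a sample are assumptions made for illustration.

```r
n_fractions <- 3  # hypothetical number of fractions per sample

keys <- data.frame(
  RawFile          = sprintf("run_%02d", 1:6),   # hypothetical raw file names
  IsotopeLabelType = "L",
  Condition        = "WT_A549",
  BioReplicate     = rep(c("WT_A549-1", "WT_A549-2"), each = n_fractions),
  Run              = rep(1:2, each = n_fractions),    # assumed: one run id per sample
  FractionKey      = rep(seq_len(n_fractions), times = 2),
  stringsAsFactors = FALSE
)

# Every biological replicate should list each fraction exactly once.
fractions_per_biorep <- table(keys$BioReplicate, keys$FractionKey)
```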
One of the most widely used techniques that enable relative protein quantification is stable isotope labeling by amino acids in cell culture (SILAC). The keys.txt file can capture the typical SILAC experiment.
The following example shows a SILAC experiment with two conditions,
two biological replicates, and two technical replicates:
It is also required to activate the silac option in the yaml file as follows:
silac:
  enabled : 1 # 1 for SILAC experiments
artMS provides 3 functions to perform QC analyses.
The basic quality control analysis takes as input both the evidence.txt and keys.txt files and generates several QC plots exploring different aspects of the MS data. Run it as follows:
artmsQualityControlEvidenceBasic(evidence_file = artms_data_ph_evidence,
                                 keys_file = artms_data_ph_keys,
                                 prot_exp = "PH")
REV reversed sequences used by MaxQuant to estimate the FDR); box plots of MS intensity values per biological replicates and conditions; bar plots of total intensity (excluding contaminants) by bioreplicates and conditions; bar plots of total feature counts by bioreplicates and conditions.
AC) an extra pdf file will be generated with stats related to the selected modification, including: bar plot of peptide counts and intensities, broken by PTM/other categories; bar plots of total sum-up of MS intensity values by other/PTM categories.
Check ?artmsQualityControlEvidenceBasic() to find out more options about this function.
Next, for illustration purposes, let’s show how to generate only one plot (e.g. INTDIST):
# But for illustration purposes printing only INTDIST plot:
library(artMS)
suppressWarnings(
  artmsQualityControlEvidenceBasic(evidence_file = artms_data_ph_evidence,
                                   keys_file = artms_data_ph_keys,
                                   prot_exp = "PH",
                                   plotINTDIST = TRUE,
                                   plotREPRO = FALSE,
                                   plotCORMAT = FALSE,
                                   plotINTMISC = FALSE,
                                   plotPTMSTATS = FALSE,
                                   printPDF = FALSE,
                                   verbose = FALSE))