1 Load PRONE Package

# Load and attach PRONE
library(PRONE)

2 Load Data (TMT)

Here, we work directly with the SummarizedExperiment object. For more information on how to create a SummarizedExperiment from a proteomics data set, please refer to the “Get Started” vignette.

The example TMT data set originates from (Biadglegne et al. 2022).

# Load the example TMT data set that ships with PRONE
data("tuberculosis_TMT_se")
se <- tuberculosis_TMT_se

This SummarizedExperiment object already contains assays produced by several normalization methods. Since this vignette shows how to apply the PRONE workflow to new proteomics data, we remove the normalized assays and keep only the raw and log2 data that are available directly after loading.

# Keep only the raw and the log2-transformed assays
se <- subset_SE_by_norm(se, ain = c("raw", "log2"))
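Conceptually, this step keeps a subset of the assays stored in the SummarizedExperiment. The minimal sketch below shows the same idea with the plain SummarizedExperiment API on a small toy object. The toy matrix and the assay name "median_norm" are hypothetical, and PRONE's own subset_SE_by_norm() may do more than this.

```r
library(SummarizedExperiment)

# Hypothetical toy intensities: 5 proteins x 4 samples
set.seed(1)
raw <- matrix(2^rnorm(20, mean = 20), nrow = 5,
              dimnames = list(paste0("P", 1:5), paste0("S", 1:4)))
log2_mat <- log2(raw)

# Hypothetical "normalized" assay: log2 data centred on each sample's median
median_norm <- sweep(log2_mat, 2, apply(log2_mat, 2, median))

se_toy <- SummarizedExperiment(
  assays = list(raw = raw, log2 = log2_mat, median_norm = median_norm)
)

# Keep only the raw and log2 assays, dropping the normalized one
assays(se_toy) <- assays(se_toy)[c("raw", "log2")]
assayNames(se_toy)
```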

3 Overview of the Data

To get an overview of the number of missing values (NAs), use the function get_NA_overview():

knitr::kable(get_NA_overview(se, ain = "log2"))
Total.Values   NA.Values   NA.Percentage
        6020        1945        32.30897
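The three columns follow from simple counting: the total number of entries in the assay, the number of NA entries, and their ratio in percent (here 1945 / 6020 * 100 ≈ 32.31). The sketch below computes the same three statistics in base R for a small hypothetical matrix. It is only an illustration of the reported quantities, not PRONE's implementation of get_NA_overview(); the helper na_overview() is hypothetical.

```r
# Hypothetical log2 intensity matrix with missing values (3 proteins x 4 samples)
log2_mat <- matrix(c(20.1,   NA, 19.8, 21.0,
                       NA,   NA, 18.5, 18.9,
                     22.3, 22.0,   NA, 21.7),
                   nrow = 3, byrow = TRUE)

# Hypothetical helper returning the same columns as the table above
na_overview <- function(m) {
  total <- length(m)        # all entries of the matrix
  n_na  <- sum(is.na(m))    # entries that are NA
  data.frame(Total.Values  = total,
             NA.Values     = n_na,
             NA.Percentage = 100 * n_na / total)
}

res <- na_overview(log2_mat)
res
```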

To get an overview of the number of samples per sample group or batch, use the function plot_condition_overview() and specify the metadata column that should be used for coloring. By default (condition = NULL), the column specified in load_data() is used.

plot_condition_overview(se, condition = NULL)
#> Condition of SummarizedExperiment used!

Figure 3.1: Overview barplot of the number of samples per condition.

plot_condition_overview(se, condition = "Pool")

Figure 3.2: Overview barplot of the number of samples per pool.
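Both barplots show how many samples share each value of a metadata column. A minimal base R sketch of that count uses table() on a hypothetical sample metadata data frame; in PRONE this information lives in the column data of the SummarizedExperiment, and the group and pool labels below are made up for illustration.

```r
# Hypothetical sample metadata (one row per sample)
meta <- data.frame(
  Column = paste0("S", 1:6),
  Group  = c("A", "A", "B", "B", "C", "C"),
  Pool   = c("Pool1", "Pool1", "Pool1", "Pool2", "Pool2", "Pool2")
)

# Number of samples per group and per pool
counts_group <- table(meta$Group)
counts_pool  <- table(meta$Pool)

# Simple barplot of the samples per group
barplot(counts_group, ylab = "Number of samples")
```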

The function plot_heatmap() provides a general overview of the protein intensities across samples. The parameter ain specifies the assay to plot; currently, only “raw” and “log2” are available (see names(assays(se))). Once normalization methods have been applied, their results are stored as additional assays, and the normalized data can be visualized in the same way.

# Names of the assays that can be passed to the ain argument
available_ains <- names(assays(se))

plot_heatmap(se, ain = "log2", color_by = c("Pool", "Group"), 
             label_by = NULL, only_refs = FALSE)
#> Label of SummarizedExperiment used!
#> $log2

Figure 3.3: Heatmap of the log2-protein intensities with columns and proteins being clustered with hclust.
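The caption mentions that samples and proteins are clustered with hclust. The base R sketch below illustrates that idea on a hypothetical matrix: it builds Euclidean distances between proteins (rows) and between samples (columns), clusters both with hclust(), and passes the dendrograms to heatmap(). As a simplifying assumption, proteins with missing values are dropped first; we have not checked how plot_heatmap() itself treats NAs.

```r
# Hypothetical log2 intensities: 10 proteins x 4 samples, with two NAs
set.seed(42)
log2_mat <- matrix(rnorm(40, mean = 20), nrow = 10,
                   dimnames = list(paste0("P", 1:10), paste0("S", 1:4)))
log2_mat[c(2, 7), 3] <- NA

# Simplifying assumption: keep only proteins without missing values
complete <- log2_mat[complete.cases(log2_mat), ]

# Hierarchical clustering of proteins (rows) and samples (columns)
row_clust <- hclust(dist(complete))
col_clust <- hclust(dist(t(complete)))

heatmap(complete,
        Rowv  = as.dendrogram(row_clust),
        Colv  = as.dendrogram(col_clust),
        scale = "none")
```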

Similarly, plot_upset() generates an upset plot that visualizes the overlaps between sets defined by a metadata column. Set membership is determined by the non-NA values of each protein.

plot_upset(se, color_by = NULL, label_by = NULL, mb.ratio = c(0.7,0.3), 
           only_refs = FALSE)
#> Condition of SummarizedExperiment used!
#> Label of SummarizedExperiment used!

Figure 3.4: Upset plot of the non-NA protein intensities with sets defined by the Pool column.
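To make the notion of sets concrete, the sketch below derives set memberships from non-NA values in base R. It assumes that a protein belongs to a group's set if it has at least one non-NA value in that group; PRONE may apply a different rule. The matrix and the group labels are hypothetical. Each distinct membership pattern corresponds to one intersection bar in an upset plot.

```r
# Hypothetical log2 intensities: 4 proteins x 4 samples
log2_mat <- matrix(c(20.1,   NA, 19.8,   NA,
                       NA,   NA, 18.5, 18.9,
                     22.3, 22.0, 21.1, 21.7,
                       NA, 19.0,   NA,   NA),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("P", 1:4), paste0("S", 1:4)))
group <- c("A", "A", "B", "B")  # hypothetical group of each sample (column)

# Assumed rule: a protein is in a group's set if it has any non-NA value there
membership <- sapply(split(seq_along(group), group), function(idx) {
  rowSums(!is.na(log2_mat[, idx, drop = FALSE])) > 0
})

# Membership pattern per protein, e.g. "A&B" = present in both sets
patterns <- apply(membership, 1, function(x) {
  paste(colnames(membership)[x], collapse = "&")
})

# Intersection sizes, as shown by the bars of an upset plot
table(patterns)
```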

If you are interested in the intensities of specific biomarkers, use the function plot_markers_boxplots() to compare the distribution of intensities per group. You can either generate one plot per marker, faceted by normalization method (facet_norm = TRUE), or one plot per normalization method, faceted by marker (facet_marker = TRUE).

p <- plot_markers_boxplots(se, 
                           markers = c("Q92954;J3KP74;E9PLR3", "Q9Y6Z7"), 
                           ain = "log2", 
                           id_column = "Protein.IDs", 
                           facet_norm = FALSE, 
                           facet_marker = TRUE)
#> Condition of SummarizedExperiment used!
#> No shaping done.
# Rotate the x-axis labels of the first plot for readability
p[[1]] + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5))
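Behind such a plot is a reshaping step: the rows of the selected markers are looked up by their protein IDs and turned into a long table with one intensity per marker and sample. The base R sketch below shows that step on hypothetical data and draws grouped boxplots with boxplot(). The intensity values and group labels are made up; only the two marker IDs are taken from the example above, and this is not PRONE's implementation.

```r
# Hypothetical log2 intensities for two proteins across 4 samples
log2_mat <- matrix(c(20.1, 20.5, 21.3, 21.9,
                     18.2, 18.0, 17.1, 16.8),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(NULL, paste0("S", 1:4)))
row_meta <- data.frame(Protein.IDs = c("Q92954;J3KP74;E9PLR3", "Q9Y6Z7"))
group    <- c("A", "A", "B", "B")  # hypothetical group of each sample

# Look up the marker rows by their full protein ID string
markers <- c("Q92954;J3KP74;E9PLR3", "Q9Y6Z7")
idx <- match(markers, row_meta$Protein.IDs)

# Long format: one row per marker and sample
long <- data.frame(
  Marker    = rep(markers, each = ncol(log2_mat)),
  Group     = rep(group, times = length(markers)),
  Intensity = as.vector(t(log2_mat[idx, , drop = FALSE]))
)

# One box per group and marker, with rotated axis labels
boxplot(Intensity ~ Group + Marker, data = long, las = 2)
```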