Compiled date: 2024-04-21

Last edited: 2024-01-21

License: GPL-3

1 Installation

To install the Bioconductor version of the POMA package, run the following code:

# install.packages("BiocManager")
BiocManager::install("POMA")

2 Load POMA

library(POMA)
library(ggtext)
library(magrittr)

3 The POMA Workflow

The POMA package functions are organized into three sequential, distinct blocks: Data Preparation, Pre-processing, and Statistical Analysis.

3.1 Data Preparation

The SummarizedExperiment package from Bioconductor provides well-defined data structures for representing many types of omics experiment data (Morgan et al. 2020). These structures keep the measurement matrix and the sample metadata together in a single object, which reduces bookkeeping errors during analysis. POMA operates on SummarizedExperiment objects, so existing methods for this class can be reused, which contributes to more robust and reproducible workflows.

The workflow begins with either loading or creating a SummarizedExperiment object. Typically, your data might be stored in separate matrices and/or data frames. The PomaCreateObject function simplifies this step by quickly building a SummarizedExperiment object for you.

# create a SummarizedExperiment object from two separate data frames
target <- readr::read_csv("your_target.csv")
features <- readr::read_csv("your_features.csv")

data <- PomaCreateObject(metadata = target, features = features)
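To make the layout of such an object concrete, here is a minimal sketch that builds a small SummarizedExperiment by hand with the SummarizedExperiment constructor rather than with PomaCreateObject. The feature names, sample names, and values are hypothetical toy data. It follows the layout of the example object printed in this vignette: features in rows and samples in columns, with sample-level covariates in colData.

```r
library(SummarizedExperiment)

# Hypothetical toy data: 3 features (rows) measured in 4 samples (columns)
intensities <- matrix(
  c(10.1, 12.3,  9.8, 11.0,
     5.2,  4.9,  6.1,  5.5,
    20.4, 19.7, 21.2, 18.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("alanine", "glucose", "lactate"), paste0("sample_", 1:4))
)

# Sample metadata: one row per sample, row names matching the matrix columns
sample_info <- S4Vectors::DataFrame(
  group = factor(c("control", "control", "case", "case")),
  row.names = colnames(intensities)
)

se <- SummarizedExperiment(
  assays = list(abundance = intensities),
  colData = sample_info
)

dim(se)                  # features x samples
assay(se)["glucose", ]   # measurements for one feature
colData(se)$group        # grouping factor for the samples
```

Subsetting the object, for example `se[, se$group == "case"]`, subsets the matrix and the metadata together, which is the main practical benefit of the class.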

Alternatively, if your data is already in a SummarizedExperiment object, you can proceed directly to the pre-processing step. This vignette uses example data provided in POMA.

# load example data
data("st000336")
st000336
> class: SummarizedExperiment 
> dim: 31 57 
> metadata(0):
> assays(1): ''
> rownames(31): x1_methylhistidine x3_methylhistidine ... pyruvate
>   succinate
> rowData names(0):
> colnames(57): 1 2 ... 56 57
> colData names(2): group steroids

3.2 Pre-processing

3.2.1 Missing Value Imputation

imputed <- st000336 %>% 
  PomaImpute(method = "knn", zeros_as_na = TRUE, remove_na = TRUE, cutoff = 20)

imputed
> class: SummarizedExperiment 
> dim: 30 57 
> metadata(0):
> assays(1): ''
> rownames(30): x1_methylhistidine x3_methylhistidine ... pyruvate
>   succinate
> rowData names(0):
> colnames(57): 1 2 ... 56 57
> colData names(2): group steroids
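The arguments in the call above are not described in this section. As a rough illustration, the base-R sketch below mimics the filtering that `zeros_as_na`, `remove_na`, and `cutoff` appear to control. It assumes that `cutoff` is the maximum percentage of missing values allowed per feature; the exact comparison used by PomaImpute and the kNN step itself are not reproduced here. The matrix is hypothetical toy data.

```r
# Hypothetical toy matrix: 3 features (rows) x 10 samples (columns).
# Zeros stand in for values that were not detected.
x <- rbind(
  feat_a = c(1.2, 0,   3.1, 2.2, 1.9, 2.5, 2.8, 1.7, 2.0, 2.3),
  feat_b = c(0,   0,   4.0, 3.5, 0,   0,   3.9, 4.2, 3.7, 3.8),
  feat_c = c(2.0, 2.1, 1.8, 2.4, 2.2, 1.9, 2.3, 2.0, 2.1, 2.2)
)
colnames(x) <- paste0("s", 1:10)

# Step 1 (assumed meaning of zeros_as_na = TRUE): treat zeros as missing
x[x == 0] <- NA

# Step 2 (assumed meaning of remove_na = TRUE, cutoff = 20):
# drop features whose percentage of missing values exceeds the cutoff
pct_missing <- rowMeans(is.na(x)) * 100
cutoff <- 20
kept <- x[pct_missing <= cutoff, , drop = FALSE]

pct_missing
rownames(kept)
```

In this toy case `feat_b` is dropped because 4 of its 10 values are missing, while the single missing value left in `feat_a` would be filled in by the chosen imputation method. A filter of this kind would be consistent with the output above, where the object goes from 31 to 30 features.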

3.2.2 Normalization

normalized <- imputed %>% 
  PomaNorm(method = "log_pareto")

normalized
> class: SummarizedExperiment 
> dim: 30 57 
> metadata(0):
> assays(1): ''
> rownames(30): x1_methylhistidine x3_methylhistidine ... pyruvate
>   succinate
> rowData names(0):
> colnames(57): 1 2 ... 56 57
> colData names(2): group steroids
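The method name "log_pareto" suggests a log transformation followed by Pareto scaling, in which each feature is mean-centered and divided by the square root of its standard deviation. The sketch below implements that idea in base R on hypothetical toy data. The use of `log10(x + 1)` is an assumption and may differ from what PomaNorm does internally.

```r
# Hypothetical toy matrix: 2 features (rows) x 4 samples (columns)
x <- rbind(
  feat_a = c(9, 99, 999, 9999),
  feat_b = c(1, 2, 3, 4)
)

log_pareto <- function(m) {
  logged <- log10(m + 1)                    # assumed log transform
  centered <- logged - rowMeans(logged)     # mean-center each feature (row)
  centered / sqrt(apply(logged, 1, sd))     # divide by sqrt of the feature SD
}

scaled <- log_pareto(x)
round(rowMeans(scaled), 10)  # feature means after scaling
apply(scaled, 1, sd)         # remaining spread per feature
```

Unlike unit-variance scaling, Pareto scaling only partially equalizes the spread of the features, so features with large variation keep somewhat more weight than features with small variation.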

3.2.2.1 Normalization effect

PomaBoxplots(imputed, x = "samples") # data before normalization

PomaBoxplots(normalized, x = "samples") # data after normalization

PomaDensity(imputed, x = "features") # data before normalization

PomaDensity(normalized, x = "features") # data after normalization

3.2.3 Outlier Detection

PomaOutliers(normalized)$polygon_plot

pre_processed <- PomaOutliers(normalized)$data
pre_processed
> class: SummarizedExperiment 
> dim: 30 52 
> metadata(0):
> assays(1): ''
> rownames(30): X1_METHYLHISTIDINE X3_METHYLHISTIDINE ... PYRUVATE
>   SUCCINATE
> rowData names(0):
> colnames(52): 1 2 ... 56 57
> colData names(2): group steroids
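The output above shows five samples being removed (57 to 52 columns), but the section does not say how outliers are identified. As a generic illustration only, and not POMA's implementation, the sketch below flags samples whose Euclidean distance to their group centroid exceeds an upper fence of Q3 + coef * IQR. The function name `flag_outliers`, the default `coef = 1.5`, the use of the median as centroid, and the toy data are all hypothetical.

```r
# Hypothetical toy data: 2 features (rows) x 12 samples (columns), two groups
x <- matrix(
  c(0, 0,  1, 0,  0, 1,  1, 1,  0.5, 0.5,  10, 10,    # group A (s6 is extreme)
    5, 5,  6, 5,  5, 6,  6, 6,  5.5, 5.5,  5.5, 5),   # group B
  nrow = 2,
  dimnames = list(c("f1", "f2"), paste0("s", 1:12))
)
group <- rep(c("A", "B"), each = 6)

flag_outliers <- function(x, group, coef = 1.5) {
  flagged <- logical(ncol(x))
  for (g in unique(group)) {
    idx <- which(group == g)
    sub <- x[, idx, drop = FALSE]
    centroid <- apply(sub, 1, median)          # per-feature median of the group
    d <- sqrt(colSums((sub - centroid)^2))     # distance of each sample to centroid
    fence <- quantile(d, 0.75) + coef * IQR(d) # upper fence for this group
    flagged[idx] <- d > fence
  }
  setNames(flagged, colnames(x))
}

outliers <- flag_outliers(x, group)
names(which(outliers))
x_clean <- x[, !outliers, drop = FALSE]
dim(x_clean)
```

Lower values of `coef` flag more samples and higher values flag fewer, so the choice is worth checking against a plot of the samples before any are removed.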

# pre_processed %>% 
#   PomaUnivariate(method = "ttest") %>% 
#   magrittr::extract2("result")
# imputed %>% 
#   PomaVolcano(pval = "adjusted", labels = TRUE)
# pre_processed %>% 
#   PomaUnivariate(method = "mann") %>% 
#   magrittr::extract2("result")
# PomaLimma(pre_processed, contrast = "Controls-DMD", adjust = "fdr")
# poma_pca <- PomaMultivariate(pre_processed, method = "pca")
# poma_pca$scoresplot +
#   ggplot2::ggtitle("Scores Plot")
# poma_plsda <- PomaMultivariate(pre_processed, method = "plsda")
# poma_plsda$scoresplot +
#   ggplot2::ggtitle("Scores Plot")
# poma_plsda$errors_plsda_plot +
#   ggplot2::ggtitle("Error Plot")
# poma_cor <- PomaCorr(pre_processed, label_size = 8, coeff = 0.6)
# poma_cor$correlations
# poma_cor$corrplot
# poma_cor$graph
# PomaCorr(pre_processed, corr_type = "glasso", coeff = 0.6)$graph
# alpha = 1 for Lasso
# PomaLasso(pre_processed, alpha = 1, labels = TRUE)$coefficientPlot
# poma_rf <- PomaRandForest(pre_processed, ntest = 10, nvar = 10)
# poma_rf$error_tree
# poma_rf$confusionMatrix$table
# poma_rf$MeanDecreaseGini_plot

4 Session Information

sessionInfo()
> R version 4.4.0 beta (2024-04-15 r86425)
> Platform: x86_64-pc-linux-gnu
> Running under: Ubuntu 22.04.4 LTS
> 
> Matrix products: default
> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_GB              LC_COLLATE=C              
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> time zone: America/New_York
> tzcode source: system (glibc)
> 
> attached base packages:
> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
> [8] base     
> 
> other attached packages:
>  [1] magrittr_2.0.3              SummarizedExperiment_1.33.3
>  [3] Biobase_2.63.1              GenomicRanges_1.55.4       
>  [5] GenomeInfoDb_1.39.14        IRanges_2.37.1             
>  [7] S4Vectors_0.41.6            BiocGenerics_0.49.1        
>  [9] MatrixGenerics_1.15.1       matrixStats_1.3.0          
> [11] patchwork_1.2.0             ggtext_0.1.2               
> [13] POMA_1.13.26                BiocStyle_2.31.0           
> 
> loaded via a namespace (and not attached):
>  [1] tidyselect_1.2.1        viridisLite_0.4.2       dplyr_1.1.4            
>  [4] farver_2.1.1            fastmap_1.1.1           janitor_2.2.0          
>  [7] digest_0.6.35           timechange_0.3.0        lifecycle_1.0.4        
> [10] cluster_2.1.6           compiler_4.4.0          rlang_1.1.3            
> [13] sass_0.4.9              tools_4.4.0             utf8_1.2.4             
> [16] yaml_2.3.8              knitr_1.46              S4Arrays_1.3.7         
> [19] labeling_0.4.3          DelayedArray_0.29.9     xml2_1.3.6             
> [22] abind_1.4-5             withr_3.0.0             purrr_1.0.2            
> [25] grid_4.4.0              fansi_1.0.6             colorspace_2.1-0       
> [28] ggplot2_3.5.0           scales_1.3.0            MASS_7.3-60.2          
> [31] tinytex_0.50            cli_3.6.2               rmarkdown_2.26         
> [34] vegan_2.6-4             crayon_1.5.2            generics_0.1.3         
> [37] httr_1.4.7              commonmark_1.9.1        cachem_1.0.8           
> [40] stringr_1.5.1           zlibbioc_1.49.3         splines_4.4.0          
> [43] parallel_4.4.0          impute_1.77.0           BiocManager_1.30.22    
> [46] XVector_0.43.1          vctrs_0.6.5             Matrix_1.7-0           
> [49] jsonlite_1.8.8          bookdown_0.39           magick_2.8.3           
> [52] tidyr_1.3.1             jquerylib_0.1.4         glue_1.7.0             
> [55] lubridate_1.9.3         stringi_1.8.3           gtable_0.3.4           
> [58] UCSC.utils_0.99.7       munsell_0.5.1           tibble_3.2.1           
> [61] pillar_1.9.0            htmltools_0.5.8.1       GenomeInfoDbData_1.2.12
> [64] R6_2.5.1                evaluate_0.23           lattice_0.22-6         
> [67] markdown_1.12           highr_0.10              gridtext_0.1.5         
> [70] snakecase_0.11.1        bslib_0.7.0             Rcpp_1.0.12            
> [73] SparseArray_1.3.5       nlme_3.1-164            permute_0.9-7          
> [76] mgcv_1.9-1              xfun_0.43               pkgconfig_2.0.3

References

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2020. SummarizedExperiment: SummarizedExperiment Container. https://bioconductor.org/packages/SummarizedExperiment.