Contents

1 Introduction

Statistical modeling is a powerful tool, but it can also be a bit overwhelming to use efficiently. Different modeling engines exist (lm, limma, lme, lmer, wilcoxon), with each different interfaces. Autonomics offers a unified modeling interface to these different modeling engines. This makes them easy to use and also to compare.

We illustrate the functionality using data from the hypoglycemia metabolomics profiling study of Halama et al. and Atkin (2018), in which participants (t0) were subjected to mild (t1) and stronger (t2) hypoglycemia, and then brought back to normal levels, additionally being profiled again the next day (t3).

   require(magrittr)
   require(autonomics)
    file <- system.file('extdata/atkin.metabolon.xlsx', package = 'autonomics')
   object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
   object$subgroup %<>% factor()
   object$Subject %<>% factor()

2 Classicial: lm and limma

lm

Autonomics provides a one-liner interface to the classical statistics modeling engine lm.
The design is defined using the formula interface:

object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
object$subgroup %<>% factor()
object$Subject %<>% factor()
object %<>% fit_lm(~ subgroup)
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Filter
##                       Keep 19/20 features with 3+ values per subgroup
##               lm(~subgroup)
##                          coefficient    fit downfdr upfdr downp   upp
##                               <fctr> <fctr>   <int> <int> <int> <int>
##                       1:   Intercept     lm       0    19     0    19
##                       2:          t1     lm       5     1     6     1
##                       3:          t2     lm       6     2     6     2
##                       4:          t3     lm       1     1     1     2

limma

limma innovated statistcis modeling for Bioinformatics over lm in several ways, out of which the three most fundamental are:

  1. fast for even a large number of features.
  2. A generalized contrast analysis framework that allows to compute statistics for any contrast (i.e. linear combination of model coefficients)
  3. A moderated t-test that places upweights effect size (and downweights standard deviation) in significance computation, increasing biological meaningfulness of the results.

Autonomics provides a similar one-liner interface to the limma modeling engine. The results of lm and limma are quite similar for coefficient (t3, i.e. the residual effect of the hypoglycemia study after 24h), which is, in fact re-assuring.

object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
object %<>% fit_lm(   ~ subgroup)
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Filter
##                       Keep 19/20 features with 3+ values per subgroup
##               lm(~subgroup)
##                          coefficient    fit downfdr upfdr downp   upp
##                               <fctr> <fctr>   <int> <int> <int> <int>
##                       1:   Intercept     lm       0    19     0    19
##                       2:          t1     lm       5     1     6     1
##                       3:          t2     lm       6     2     6     2
##                       4:          t3     lm       1     1     1     2
object %<>% fit_limma(~ subgroup) # prefers large effects over small sds
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               lmFit(~subgroup)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:   Intercept  limma       0    20     0    20
##                   2:          t1  limma       5     1     6     1
##                   3:          t2  limma       6     2     6     2
##                   4:          t3  limma       1     1     1     2
plot_contrast_venn(is_sig(object, fit = fits(fdt(object)), contrast = 't3'))
# plot_contrast_boxplots(object, subgroupvar='SET', contrast='t3', fit = c('lm', 'limma'))

3 Blocked: limma, lme, lmer

Block effects (i.e. subject) need to be accounted for.
lme, lmer and limma all support the modeling of block effects as random effects. Autonomics provides easy access to model block effects with the block variable:

object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
object %<>% fit_limma(~ subgroup, block = 'Subject')
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Dupcor `Subject`
##               lmFit(~subgroup | Subject)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:   Intercept  limma       0    20     0    20
##                   2:          t1  limma       8     1    10     1
##                   3:          t2  limma       7     3     8     3
##                   4:          t3  limma       1     1     4     2
object %<>% fit_lme(  ~ subgroup, block = 'Subject')
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Filter
##                       Keep 19/20 features with 3+ values per subgroup
##                       Keep 16/17 fully connected blocks with 64/67 samples
##               lme(~subgroup, random = list(Subject = ~1))
##                          coefficient    fit downfdr upfdr downp   upp
##                               <fctr> <fctr>   <int> <int> <int> <int>
##                       1:   Intercept    lme       0    19     0    19
##                       2:          t1    lme       8     1     9     1
##                       3:          t2    lme       8     3     8     3
##                       4:          t3    lme       2     2     4     2
object %<>% fit_lmer( ~ subgroup, block = 'Subject') # results identical for this basic 
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Filter
##                       Keep 19/20 features with 3+ values per subgroup
##                       Keep 16/17 fully connected blocks with 64/67 samples
##               lmer(~subgroup + (1 | Subject))
##                          coefficient    fit downfdr upfdr downp   upp
##                               <fctr> <fctr>   <int> <int> <int> <int>
##                       1:   Intercept   lmer       0    19     0    19
##                       2:          t1   lmer       8     1     9     1
##                       3:          t2   lmer       8     3     8     3
##                       4:          t3   lmer       2     2     4     2
plot_contrast_venn(is_sig(object, fit = fits(fdt(object)), contrast = 't3'))
# plot_contrast_boxplots(object, subgroupvar='SET', contrast='t3', fit = c('lm', 'limma'))

4 Flexible: limma’s contrasts.fit

autonomics also provides easy access to limma’s extremely flexible contrasts.fit interface: for any linear combination between model coefficients statistics can be computed.

object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
object %<>% fit_limma(~0+subgroup, contrasts = c('t1-t0', 't2-t0', 't3-t0'), block = 'Subject')
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Dupcor `Subject`
##               lmFit(~0 + subgroup | Subject)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:       t1-t0  limma       8     1    10     1
##                   2:       t2-t0  limma       7     3     8     3
##                   3:       t3-t0  limma       1     1     4     2
object %<>% fit_limma(~0+subgroup, contrasts = c('t1-t0', 't2-t1', 't3-t2'), block = 'Subject')
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Dupcor `Subject`
##               lmFit(~0 + subgroup | Subject)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:       t1-t0  limma       8     1    10     1
##                   2:       t2-t1  limma       1     1     1     2
##                   3:       t3-t2  limma       2     6     3     6

5 Alternative coding schemes: limma, lm, lme, lmer

An alternative approach to compute statistics of interest is to use alternative coding systems for contrast variables. This system is less intuitive than limma’s flexible contrast interface, but it is shared by all modeling engines (limma’s approach is limited to limma only). In this system one specifies which coding system to use for a factor variable. The default is to use treatment coding (where the different model coefficients express baseline differences), but an alternative would be to use backward difference coding.

object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
object %<>% fit_limma(~subgroup, block = 'Subject')  # baseline diffs
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Dupcor `Subject`
##               lmFit(~subgroup | Subject)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:   Intercept  limma       0    20     0    20
##                   2:          t1  limma       8     1    10     1
##                   3:          t2  limma       7     3     8     3
##                   4:          t3  limma       1     1     4     2
object %<>% fit_limma(~subgroup, block = 'Subject', codingfun = code_diff)  # backward diffs
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0  t1  t2  t3 
##                       Intercept 1/4 1/4 1/4 1/4
##                       t1-t0      -1   1   .   .
##                       t2-t1       .  -1   1   .
##                       t3-t2       .   .  -1   1
##      Rm from fdt: ^(effect|p|fdr|t)~(Intercept|t1-t0|t2-t1|t3-t2)~(limma)$
##               Dupcor `Subject`
##               lmFit(~subgroup | Subject)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:   Intercept  limma       0    20     0    20
##                   2:       t1-t0  limma       8     1    10     1
##                   3:       t2-t1  limma       1     1     1     2
##                   4:       t3-t2  limma       2     6     3     6

6 A non-parameteric alternative: wilcoxon

object <- read_metabolon(file)
##  Read: /tmp/Rtmp8Of9GD/Rinst225dd845c24cbf/autonomics/extdata/atkin.metabolon.xlsx
##               Infer subgroup from sample_ids
##               log2 metabolon
object %<>% fit_limma(   ~0+subgroup, contrasts = c('t3-t0'), block = 'Subject')
##     LinMod
##               Code subgroup
##                                level
##                     coefficient t0 t1 t2 t3
##                       Intercept  1  .  .  .
##                       t1        -1  1  .  .
##                       t2        -1  .  1  .
##                       t3        -1  .  .  1
##               Dupcor `Subject`
##               lmFit(~0 + subgroup | Subject)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:       t3-t0  limma       1     1     4     2
object %<>% fit_wilcoxon(~0+subgroup, contrasts = c('t3-t0'), block = 'Subject')
##      Wilcoxon
##          wilcox.test(x = t0, y = t3, paired = TRUE) - pair on 'Subject'
##             coefficient      fit downfdr upfdr downp   upp
##                  <fctr>   <fctr>   <int> <int> <int> <int>
##          1:       t3-t0 wilcoxon       1     0     6     2
plot_contrast_venn(is_sig(object, contrast = 't3-t0', fit = fits(fdt(object))))

sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_2.0.3    ggplot2_3.5.1     data.table_1.15.4 autonomics_1.12.0
## [5] BiocStyle_2.32.0 
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3          jsonlite_1.8.8             
##   [3] MultiAssayExperiment_1.30.0 magick_2.8.3               
##   [5] nloptr_2.0.3                farver_2.1.1               
##   [7] rmarkdown_2.26              zlibbioc_1.50.0            
##   [9] vctrs_0.6.5                 memoise_2.0.1              
##  [11] minqa_1.2.6                 tinytex_0.50               
##  [13] htmltools_0.5.8.1           S4Arrays_1.4.0             
##  [15] curl_5.2.1                  cellranger_1.1.0           
##  [17] SparseArray_1.4.0           sass_0.4.9                 
##  [19] bslib_0.7.0                 plyr_1.8.9                 
##  [21] cachem_1.0.8                igraph_2.0.3               
##  [23] lifecycle_1.0.4             codingMatrices_0.4.0       
##  [25] pkgconfig_2.0.3             Matrix_1.7-0               
##  [27] R6_2.5.1                    fastmap_1.1.1              
##  [29] GenomeInfoDbData_1.2.12     MatrixGenerics_1.16.0      
##  [31] numDeriv_2016.8-1.1         digest_0.6.35              
##  [33] pcaMethods_1.96.0           colorspace_2.1-0           
##  [35] rARPACK_0.11-0              S4Vectors_0.42.0           
##  [37] RSpectra_0.16-1             ellipse_0.5.0              
##  [39] GenomicRanges_1.56.0        RSQLite_2.3.6              
##  [41] filelock_1.0.3              labeling_0.4.3             
##  [43] fansi_1.0.6                 httr_1.4.7                 
##  [45] polyclip_1.10-6             abind_1.4-5                
##  [47] compiler_4.4.0              bit64_4.0.5                
##  [49] withr_3.0.0                 BiocParallel_1.38.0        
##  [51] DBI_1.2.2                   highr_0.10                 
##  [53] ggforce_0.4.2               R.utils_2.12.3             
##  [55] MASS_7.3-60.2               DelayedArray_0.30.0        
##  [57] corpcor_1.6.10              tools_4.4.0                
##  [59] R.oo_1.26.0                 glue_1.7.0                 
##  [61] nlme_3.1-164                grid_4.4.0                 
##  [63] reshape2_1.4.4              generics_0.1.3             
##  [65] gtable_0.3.5                tzdb_0.4.0                 
##  [67] R.methodsS3_1.8.2           tidyr_1.3.1                
##  [69] hms_1.1.3                   xml2_1.3.6                 
##  [71] utf8_1.2.4                  XVector_0.44.0             
##  [73] BiocGenerics_0.50.0         ggrepel_0.9.5              
##  [75] pillar_1.9.0                stringr_1.5.1              
##  [77] limma_3.60.0                splines_4.4.0              
##  [79] dplyr_1.1.4                 tweenr_2.0.3               
##  [81] BiocFileCache_2.12.0        lattice_0.22-6             
##  [83] bit_4.0.5                   GEOquery_2.72.0            
##  [85] tidyselect_1.2.1            mixOmics_6.28.0            
##  [87] locfit_1.5-9.9              knitr_1.46                 
##  [89] gridExtra_2.3               bookdown_0.39              
##  [91] IRanges_2.38.0              edgeR_4.2.0                
##  [93] SummarizedExperiment_1.34.0 stats4_4.4.0               
##  [95] xfun_0.43                   Biobase_2.64.0             
##  [97] statmod_1.5.0               matrixStats_1.3.0          
##  [99] stringi_1.8.3               UCSC.utils_1.0.0           
## [101] rematch_2.0.0               yaml_2.3.8                 
## [103] boot_1.3-30                 evaluate_0.23              
## [105] codetools_0.2-20            tibble_3.2.1               
## [107] BiocManager_1.30.22         cli_3.6.2                  
## [109] munsell_0.5.1               jquerylib_0.1.4            
## [111] Rcpp_1.0.12                 fractional_0.1.3           
## [113] GenomeInfoDb_1.40.0         readxl_1.4.3               
## [115] dbplyr_2.5.0                parallel_4.4.0             
## [117] readr_2.1.5                 blob_1.2.4                 
## [119] lme4_1.1-35.3               lmerTest_3.1-3             
## [121] scales_1.3.0                purrr_1.0.2                
## [123] crayon_1.5.2                rlang_1.1.3

References

Appendix

A M Billing, S S Dib, A M Bhagwat, I T da Silva, R D Drummond, S Hayat, R Al-Mismar, H Ben-Hamidane, N Goswami, K Engholm-Keller, M R Larsen, K Suhre, A Rafii, J Graummann (2019). Mol Cell Proteomics. 18(10):1950-1966. doi: 10.1074/mcp.RA119.001356.

A M Billing, H Ben Hamidane, S S Dib, R J Cotton, A M Bhagwat, P Kumar, S Hayat, N A Yousri, N Goswami, K Suhre, A Rafii, J Graumann (2016). Comprehensive transcriptomics and proteomics characterization of human mesenchymal stem cells reveals source specific cellular markers. Sci Rep. 9;6:21507. doi: 10.1038/srep21507.

R Fukuda, R Marin-Juez, H El-Sammak, A Beisaw, R Ramadass, C Kuenne, S Guenther, A Konzer, A M Bhagwat, J Graumann, D YR Stainier (2020). EMBO Rep. 21(8): e49752. doi: 10.15252/embr.201949752

A Halama, M Kulinski, S Dib, S B Zaghlool, K S Siveen, A Iskandarani, J Zierer 3, K S Prabhu, N J Satheesh, A M Bhagwat, S Uddin, G Kastenmüller, O Elemento, S S Gross, K Suhre (2018). Accelerated lipid catabolism and autophagy are cancer survival mechanisms under inhibited glutaminolysis. Cancer Lett., 430:133-147. doi:10.1016/j.canlet.2018.05.017

A Halama, H Kahal, A M Bhagwat, J Zierer, T Sathyapalan, J Graumann, K Suhre, S L Atkin (2018). Metabolic and proteomics signatures of hypoglycaemia in type 2 diabetes. Diabetes, obesity and metabolism, https://doi.org/10.1111/dom.13602