Reproducibility in Microbiome Data Analysis

Reproducibility is a crucial aspect of data analysis, particularly in the context of microbiome data. The ability to consistently replicate an analysis and obtain the same results is essential for ensuring the reliability of findings and facilitating scientific collaboration.

The dar package includes two key functions, export_steps and import_steps, which promote reproducibility in microbiome data analysis. These functions allow you to export the steps of a recipe to a JSON file and then import those steps to reproduce the analysis in a different environment.

Exporting Steps of a Recipe

The export_steps function writes a recipe’s steps to a JSON file. This is useful for documenting and sharing the parameters used in the analysis.

Here’s an example of how to use the export_steps function:

library(dar)
data(metaHIV_phy)

# Create a recipe with steps
rec <- 
  recipe(metaHIV_phy, "RiskGroup2", "Species") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.3 * length(x))") |>
  step_maaslin()

# Export the steps to a JSON file
out_file <- tempfile(fileext = ".json")
export_steps(rec, out_file)

In this example, a recipe with multiple steps is created, and then the steps are exported to a JSON file using the export_steps function.
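Before sharing the file, it can help to confirm that it was written and to look at what it contains. The following is a minimal sketch that repeats the recipe above so that it runs on its own. It makes no assumption about how the JSON is laid out and simply reads the file back as plain text with base R.

```r
library(dar)
data(metaHIV_phy)

# Same recipe as in the example above
rec <-
  recipe(metaHIV_phy, "RiskGroup2", "Species") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.3 * length(x))") |>
  step_maaslin()

out_file <- tempfile(fileext = ".json")
export_steps(rec, out_file)

# Check that the file exists and is not empty
stopifnot(file.exists(out_file), file.size(out_file) > 0)

# Print the raw text of the file; no particular JSON structure is assumed
json_text <- readLines(out_file, warn = FALSE)
cat(json_text, sep = "\n")
```

Note that tempfile() points into the session's temporary directory, which is removed when R exits, so a file meant to be shared should be written or copied to a permanent location.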

Importing Steps from a JSON File

The import_steps function allows you to import steps from a JSON file and add them to an existing recipe. This is useful when you want to reuse a previously saved set of steps or incorporate steps from another recipe into your current analysis.

Here’s an example of how to use the import_steps function:

# Initialize a recipe with a phyloseq object
rec <- recipe(metaHIV_phy, "RiskGroup2", "Species")

# Import the steps from a JSON file
json_file <- out_file
rec <- import_steps(rec, json_file)
rec
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#> 
#>      ℹ phyloseq object with 451 taxa and 156 samples 
#>      ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid) 
#>      ℹ taxonomic level Species 
#> 
#> Preporcessing steps:
#> 
#>      ◉ step_subset_taxa() id = subset_taxa__Flaugnarde 
#>      ◉ step_filter_taxa() id = filter_taxa__Kifli 
#> 
#> DA steps:
#> 
#>      ◉ step_maaslin() id = maaslin__Hellimli

In this example, a recipe with no steps is initialized, and the steps are then imported from the JSON file using the import_steps function. The printed recipe shows that the imported preprocessing and DA steps have been added to it.

Once the steps are imported, we can either add more steps or run the recipe with the prep function. In this case, we run prep directly and pass the result to bake.

# Run the recipe, then bake the results
da_results <- prep(rec, parallel = FALSE) |> bake()
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#> 
#>      ℹ phyloseq object with 101 taxa and 156 samples 
#>      ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid) 
#>      ℹ taxonomic level Species 
#> 
#> Results:
#> 
#>      ✔ maaslin__Hellimli diff_taxa = 86 
#> 
#>      ℹ 86 taxa are present in all tested methods 
#> 
#> Bakes:
#> 
#>      ◉ 1 -> count_cutoff: NULL, weights: NULL, exclude: NULL, id: bake__Coussin_de_Lyon
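As mentioned above, further steps can be added to the imported recipe instead of running prep straight away. The sketch below illustrates that option under some assumptions: it uses step_metagenomeseq with its default arguments as the extra step (any other dar step could be used instead), it assumes the metagenomeSeq package that this step relies on is installed, and the object names rec_src and rec_new are illustrative. The recipe is only printed here, not executed.

```r
library(dar)
data(metaHIV_phy)

# Export the steps of a source recipe, as in the earlier example
rec_src <-
  recipe(metaHIV_phy, "RiskGroup2", "Species") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.3 * length(x))") |>
  step_maaslin()

out_file <- tempfile(fileext = ".json")
export_steps(rec_src, out_file)

# Start from a recipe without steps and import the saved ones
rec_new <- recipe(metaHIV_phy, "RiskGroup2", "Species")
rec_new <- import_steps(rec_new, out_file)

# Add one more DA step after the imported ones (assumed choice of step)
rec_new <- rec_new |> step_metagenomeseq()

# Printing the recipe lists the imported steps followed by the new one
rec_new
```

The extended recipe can then be run with prep and bake in the same way as shown above.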

Limitations and Considerations

It’s important to note some limitations and considerations when using the export_steps and import_steps functions:

Conclusion

Reproducibility is essential in microbiome data analysis, and the dar package facilitates this aspect by providing the export_steps and import_steps functions. These functions allow you to export the steps of a recipe to a JSON file and then import them to reproduce the analysis in a different environment. With these tools, you can effectively document and share your analyses, increasing transparency and the reliability of your results.

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 RC (2024-04-16 r86468)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-05-01
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                  * version  date (UTC) lib source
#>  abind                      1.4-5    2016-07-21 [2] CRAN (R 4.4.0)
#>  ade4                       1.7-22   2023-02-06 [2] CRAN (R 4.4.0)
#>  ape                        5.8      2024-04-11 [2] CRAN (R 4.4.0)
#>  archive                    1.1.8    2024-04-28 [2] CRAN (R 4.4.0)
#>  assertthat                 0.2.1    2019-03-21 [2] CRAN (R 4.4.0)
#>  beachmat                   2.21.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  beeswarm                   0.4.0    2021-06-01 [2] CRAN (R 4.4.0)
#>  biglm                      0.9-2.1  2020-11-27 [2] CRAN (R 4.4.0)
#>  Biobase                  * 2.65.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  BiocGenerics             * 0.51.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  BiocNeighbors              1.23.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  BiocParallel               1.39.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  BiocSingular               1.21.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  biomformat                 1.33.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  Biostrings               * 2.73.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  bit                        4.0.5    2022-11-15 [2] CRAN (R 4.4.0)
#>  bit64                      4.0.5    2020-08-30 [2] CRAN (R 4.4.0)
#>  bluster                    1.15.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  brio                       1.1.5    2024-04-24 [2] CRAN (R 4.4.0)
#>  bslib                      0.7.0    2024-03-29 [2] CRAN (R 4.4.0)
#>  ca                         0.71.1   2020-01-24 [2] CRAN (R 4.4.0)
#>  cachem                     1.0.8    2023-05-01 [2] CRAN (R 4.4.0)
#>  cli                        3.6.2    2023-12-11 [2] CRAN (R 4.4.0)
#>  cluster                    2.1.6    2023-12-01 [3] CRAN (R 4.4.0)
#>  codetools                  0.2-20   2024-03-31 [3] CRAN (R 4.4.0)
#>  colorspace                 2.1-0    2023-01-23 [2] CRAN (R 4.4.0)
#>  crayon                     1.5.2    2022-09-29 [2] CRAN (R 4.4.0)
#>  crosstalk                  1.2.1    2023-11-23 [2] CRAN (R 4.4.0)
#>  dar                      * 1.1.0    2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#>  data.table                 1.15.4   2024-03-30 [2] CRAN (R 4.4.0)
#>  DBI                        1.2.2    2024-02-16 [2] CRAN (R 4.4.0)
#>  DECIPHER                   3.1.0    2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  decontam                   1.25.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  DelayedArray               0.31.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  DelayedMatrixStats         1.27.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  dendextend                 1.17.1   2023-03-25 [2] CRAN (R 4.4.0)
#>  DEoptimR                   1.1-3    2023-10-07 [2] CRAN (R 4.4.0)
#>  devtools                   2.4.5    2022-10-11 [2] CRAN (R 4.4.0)
#>  digest                     0.6.35   2024-03-11 [2] CRAN (R 4.4.0)
#>  DirichletMultinomial       1.47.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  dplyr                      1.1.4    2023-11-17 [2] CRAN (R 4.4.0)
#>  ellipsis                   0.3.2    2021-04-29 [2] CRAN (R 4.4.0)
#>  evaluate                   0.23     2023-11-01 [2] CRAN (R 4.4.0)
#>  fansi                      1.0.6    2023-12-08 [2] CRAN (R 4.4.0)
#>  farver                     2.1.1    2022-07-06 [2] CRAN (R 4.4.0)
#>  fastmap                    1.1.1    2023-02-24 [2] CRAN (R 4.4.0)
#>  foreach                    1.5.2    2022-02-02 [2] CRAN (R 4.4.0)
#>  fs                         1.6.4    2024-04-25 [2] CRAN (R 4.4.0)
#>  furrr                      0.3.1    2022-08-15 [2] CRAN (R 4.4.0)
#>  future                     1.33.2   2024-03-26 [2] CRAN (R 4.4.0)
#>  generics                   0.1.3    2022-07-05 [2] CRAN (R 4.4.0)
#>  GenomeInfoDb             * 1.41.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  GenomeInfoDbData           1.2.12   2024-04-23 [2] Bioconductor
#>  GenomicRanges            * 1.57.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  getopt                     1.20.4   2023-10-01 [2] CRAN (R 4.4.0)
#>  ggbeeswarm                 0.7.2    2023-04-29 [2] CRAN (R 4.4.0)
#>  ggplot2                    3.5.1    2024-04-23 [2] CRAN (R 4.4.0)
#>  ggrepel                    0.9.5    2024-01-10 [2] CRAN (R 4.4.0)
#>  globals                    0.16.3   2024-03-08 [2] CRAN (R 4.4.0)
#>  glue                       1.7.0    2024-01-09 [2] CRAN (R 4.4.0)
#>  gridExtra                  2.3      2017-09-09 [2] CRAN (R 4.4.0)
#>  gtable                     0.3.5    2024-04-22 [2] CRAN (R 4.4.0)
#>  hash                       2.2.6.3  2023-08-19 [2] CRAN (R 4.4.0)
#>  heatmaply                  1.5.0    2023-10-06 [2] CRAN (R 4.4.0)
#>  highr                      0.10     2022-12-22 [2] CRAN (R 4.4.0)
#>  hms                        1.1.3    2023-03-21 [2] CRAN (R 4.4.0)
#>  htmltools                  0.5.8.1  2024-04-04 [2] CRAN (R 4.4.0)
#>  htmlwidgets                1.6.4    2023-12-06 [2] CRAN (R 4.4.0)
#>  httpuv                     1.6.15   2024-03-26 [2] CRAN (R 4.4.0)
#>  httr                       1.4.7    2023-08-15 [2] CRAN (R 4.4.0)
#>  igraph                     2.0.3    2024-03-13 [2] CRAN (R 4.4.0)
#>  IRanges                  * 2.39.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  irlba                      2.3.5.1  2022-10-03 [2] CRAN (R 4.4.0)
#>  iterators                  1.0.14   2022-02-05 [2] CRAN (R 4.4.0)
#>  jquerylib                  0.1.4    2021-04-26 [2] CRAN (R 4.4.0)
#>  jsonlite                   1.8.8    2023-12-04 [2] CRAN (R 4.4.0)
#>  knitr                      1.46     2024-04-06 [2] CRAN (R 4.4.0)
#>  labeling                   0.4.3    2023-08-29 [2] CRAN (R 4.4.0)
#>  later                      1.3.2    2023-12-06 [2] CRAN (R 4.4.0)
#>  lattice                    0.22-6   2024-03-20 [3] CRAN (R 4.4.0)
#>  lazyeval                   0.2.2    2019-03-15 [2] CRAN (R 4.4.0)
#>  lifecycle                  1.0.4    2023-11-07 [2] CRAN (R 4.4.0)
#>  listenv                    0.9.1    2024-01-29 [2] CRAN (R 4.4.0)
#>  logging                    0.10-108 2019-07-14 [2] CRAN (R 4.4.0)
#>  Maaslin2                   1.19.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  magrittr                   2.0.3    2022-03-30 [2] CRAN (R 4.4.0)
#>  MASS                       7.3-60.2 2024-04-23 [3] local
#>  Matrix                     1.7-0    2024-03-22 [3] CRAN (R 4.4.0)
#>  MatrixGenerics           * 1.17.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  matrixStats              * 1.3.0    2024-04-11 [2] CRAN (R 4.4.0)
#>  memoise                    2.0.1    2021-11-26 [2] CRAN (R 4.4.0)
#>  mgcv                       1.9-1    2023-12-21 [3] CRAN (R 4.4.0)
#>  mia                      * 1.13.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  microbiome                 1.27.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  mime                       0.12     2021-09-28 [2] CRAN (R 4.4.0)
#>  miniUI                     0.1.1.1  2018-05-18 [2] CRAN (R 4.4.0)
#>  MultiAssayExperiment     * 1.31.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  multtest                   2.61.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  munsell                    0.5.1    2024-04-01 [2] CRAN (R 4.4.0)
#>  mvtnorm                    1.2-4    2023-11-27 [2] CRAN (R 4.4.0)
#>  nlme                       3.1-164  2023-11-27 [3] CRAN (R 4.4.0)
#>  optparse                   1.7.5    2024-04-16 [2] CRAN (R 4.4.0)
#>  parallelly                 1.37.1   2024-02-29 [2] CRAN (R 4.4.0)
#>  pbapply                    1.7-2    2023-06-27 [2] CRAN (R 4.4.0)
#>  pcaPP                      2.0-4    2023-12-07 [2] CRAN (R 4.4.0)
#>  permute                    0.9-7    2022-01-27 [2] CRAN (R 4.4.0)
#>  phyloseq                 * 1.49.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  pillar                     1.9.0    2023-03-22 [2] CRAN (R 4.4.0)
#>  pkgbuild                   1.4.4    2024-03-17 [2] CRAN (R 4.4.0)
#>  pkgconfig                  2.0.3    2019-09-22 [2] CRAN (R 4.4.0)
#>  pkgload                    1.3.4    2024-01-16 [2] CRAN (R 4.4.0)
#>  plotly                     4.10.4   2024-01-13 [2] CRAN (R 4.4.0)
#>  plyr                       1.8.9    2023-10-02 [2] CRAN (R 4.4.0)
#>  profvis                    0.3.8    2023-05-02 [2] CRAN (R 4.4.0)
#>  promises                   1.3.0    2024-04-05 [2] CRAN (R 4.4.0)
#>  purrr                      1.0.2    2023-08-10 [2] CRAN (R 4.4.0)
#>  R6                         2.5.1    2021-08-19 [2] CRAN (R 4.4.0)
#>  RColorBrewer               1.1-3    2022-04-03 [2] CRAN (R 4.4.0)
#>  Rcpp                       1.0.12   2024-01-09 [2] CRAN (R 4.4.0)
#>  readr                      2.1.5    2024-01-10 [2] CRAN (R 4.4.0)
#>  registry                   0.5-1    2019-03-05 [2] CRAN (R 4.4.0)
#>  remotes                    2.5.0    2024-03-17 [2] CRAN (R 4.4.0)
#>  reshape2                   1.4.4    2020-04-09 [2] CRAN (R 4.4.0)
#>  rhdf5                      2.49.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  rhdf5filters               1.17.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  Rhdf5lib                   1.27.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  rlang                      1.1.3    2024-01-10 [2] CRAN (R 4.4.0)
#>  rmarkdown                  2.26     2024-03-05 [2] CRAN (R 4.4.0)
#>  robustbase                 0.99-2   2024-01-27 [2] CRAN (R 4.4.0)
#>  rsvd                       1.0.5    2021-04-16 [2] CRAN (R 4.4.0)
#>  Rtsne                      0.17     2023-12-07 [2] CRAN (R 4.4.0)
#>  S4Arrays                   1.5.0    2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  S4Vectors                * 0.43.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  sass                       0.4.9    2024-03-15 [2] CRAN (R 4.4.0)
#>  ScaledMatrix               1.13.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  scales                     1.3.0    2023-11-28 [2] CRAN (R 4.4.0)
#>  scater                     1.33.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  scuttle                    1.15.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  seriation                  1.5.5    2024-04-17 [2] CRAN (R 4.4.0)
#>  sessioninfo                1.2.2    2021-12-06 [2] CRAN (R 4.4.0)
#>  shiny                      1.8.1.1  2024-04-02 [2] CRAN (R 4.4.0)
#>  SingleCellExperiment     * 1.27.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  SparseArray                1.5.0    2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  sparseMatrixStats          1.17.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  stringi                    1.8.3    2023-12-11 [2] CRAN (R 4.4.0)
#>  stringr                    1.5.1    2023-11-14 [2] CRAN (R 4.4.0)
#>  SummarizedExperiment     * 1.35.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  survival                   3.6-4    2024-04-24 [3] CRAN (R 4.4.0)
#>  testthat                   3.2.1.1  2024-04-14 [2] CRAN (R 4.4.0)
#>  tibble                     3.2.1    2023-03-20 [2] CRAN (R 4.4.0)
#>  tidyr                      1.3.1    2024-01-24 [2] CRAN (R 4.4.0)
#>  tidyselect                 1.2.1    2024-03-11 [2] CRAN (R 4.4.0)
#>  tidytree                   0.4.6    2023-12-12 [2] CRAN (R 4.4.0)
#>  treeio                     1.29.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  TreeSummarizedExperiment * 2.13.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  TSP                        1.2-4    2023-04-04 [2] CRAN (R 4.4.0)
#>  tzdb                       0.4.0    2023-05-12 [2] CRAN (R 4.4.0)
#>  UCSC.utils                 1.1.0    2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  UpSetR                     1.4.0    2019-05-22 [2] CRAN (R 4.4.0)
#>  urlchecker                 1.0.1    2021-11-30 [2] CRAN (R 4.4.0)
#>  usethis                    2.2.3    2024-02-19 [2] CRAN (R 4.4.0)
#>  utf8                       1.2.4    2023-10-22 [2] CRAN (R 4.4.0)
#>  vctrs                      0.6.5    2023-12-01 [2] CRAN (R 4.4.0)
#>  vegan                      2.6-4    2022-10-11 [2] CRAN (R 4.4.0)
#>  vipor                      0.4.7    2023-12-18 [2] CRAN (R 4.4.0)
#>  viridis                    0.6.5    2024-01-29 [2] CRAN (R 4.4.0)
#>  viridisLite                0.4.2    2023-05-02 [2] CRAN (R 4.4.0)
#>  vroom                      1.6.5    2023-12-05 [2] CRAN (R 4.4.0)
#>  webshot                    0.5.5    2023-06-26 [2] CRAN (R 4.4.0)
#>  withr                      3.0.0    2024-01-16 [2] CRAN (R 4.4.0)
#>  xfun                       0.43     2024-03-25 [2] CRAN (R 4.4.0)
#>  xtable                     1.8-4    2019-04-21 [2] CRAN (R 4.4.0)
#>  XVector                  * 0.45.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  yaml                       2.3.8    2023-12-11 [2] CRAN (R 4.4.0)
#>  yulab.utils                0.1.4    2024-01-28 [2] CRAN (R 4.4.0)
#>  zlibbioc                   1.51.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> 
#>  [1] /tmp/RtmppiEMBP/Rinst3ffcc45cffd9ea
#>  [2] /home/biocbuild/bbs-3.20-bioc/R/site-library
#>  [3] /home/biocbuild/bbs-3.20-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
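Session information like the listing above can also be saved to a file and kept next to the exported JSON steps, so that the package versions travel with the recipe. The following is a minimal sketch using only base R; it uses sessionInfo() rather than devtools::session_info() to avoid extra dependencies, and the file name session_info.txt is a hypothetical choice.

```r
# Hypothetical file name; in practice, store it next to the exported JSON file
info_file <- file.path(tempdir(), "session_info.txt")

# capture.output() collects the printed session information as character lines
info_lines <- capture.output(print(sessionInfo()))
writeLines(info_lines, info_file)

# Read the file back to confirm that the record was written
saved_lines <- readLines(info_file)
head(saved_lines, 3)
```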