1 Introduction

The TransOmicsData package contains datasets spanning a range of biological contexts, such as in vitro embryonic and tissue-specific development in mouse and human. The datasets cover multiple omics technologies, including RNA-seq, mass spectrometry and ChIP-seq. The package was developed to provide convenient access to raw or pre-processed data for comparative trans-omics analysis.

2 The TransOmicsData package

2.1 Accessing the data

The datasets in this package are hosted on ExperimentHub and can be retrieved with the ExperimentHub package.

# if (!requireNamespace("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")

# BiocManager::install("ExperimentHub")
library(ExperimentHub)
refreshHub(hubClass = "ExperimentHub")
## ExperimentHub with 8282 records
## # snapshotDate(): 2024-04-29
## # $dataprovider: Eli and Edythe L. Broad Institute of Harvard and MIT, NCBI,...
## # $species: Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Drosophila...
## # $rdataclass: SummarizedExperiment, data.frame, ExpressionSet, matrix, char...
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH1"]]' 
## 
##            title                                                              
##   EH1    | RNA-Sequencing and clinical data for 7706 tumor samples from The...
##   EH166  | ERR188297                                                          
##   EH167  | ERR188088                                                          
##   EH168  | ERR188204                                                          
##   EH169  | ERR188317                                                          
##   ...      ...                                                                
##   EH9531 | Seurat Visium HD mouse brain data subset (8 um and 16 um)          
##   EH9532 | Seurat Xenium mouse brain data with multiple samples               
##   EH9533 | Seurat Xenium mouse brain data                                     
##   EH9534 | Seurat Vizgen test data with multiple samples                      
##   EH9535 | Seurat Vizgen test data
ehub <- ExperimentHub()
myfiles <- query(ehub, "TransOmicsData")
myfiles
## ExperimentHub with 12 records
## # snapshotDate(): 2024-04-29
## # $dataprovider: PRIDE, NCBI
## # $species: Mus musculus, Homo sapiens
## # $rdataclass: SummarizedExperiment
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH8536"]]' 
## 
##            title                                         
##   EH8536 | Chen organoid phosphoproteome                 
##   EH8537 | Chen organoid proteome                        
##   EH8538 | Chen organoid transcriptome                   
##   EH8539 | Xiao myogenesis differentation phosphoproteome
##   EH8540 | Xiao myogenesis differentiation proteome      
##   ...      ...                                           
##   EH8543 | Yang ESC epigenome                            
##   EH8544 | Yang ESC phosphoproteome                      
##   EH8545 | Yang ESC proteome                             
##   EH8546 | Yang ESC transcriptome                        
##   EH9515 | Chen organoid sctranscriptome
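Once the records are listed, you can download an individual dataset by indexing the query result with its ExperimentHub ID, as the printed hint `object[["EH8536"]]` suggests. The sketch below makes a few assumptions. It needs network access, and it needs the SummarizedExperiment package to be installed so the accessors are available. EH8538 (Chen organoid transcriptome) is an arbitrary example. The assays and sample annotations you see will depend on how each dataset was prepared.

```r
library(ExperimentHub)
library(SummarizedExperiment)  # assumed installed; provides assayNames()/colData()

ehub <- ExperimentHub()
myfiles <- query(ehub, "TransOmicsData")

# Record-level metadata can be inspected without downloading any data
mcols(myfiles)[, c("title", "species", "rdataclass")]

# Download (or load from the local cache) a single record by its ID;
# EH8538 is used here purely as an example
chen_rna <- myfiles[["EH8538"]]

# Records are stored as SummarizedExperiment objects:
# features in rows, samples in columns
chen_rna
dim(chen_rna)
assayNames(chen_rna)
head(colData(chen_rna))
```

Downloaded records are cached locally by BiocFileCache, so later calls to `myfiles[["EH8538"]]` load from disk rather than downloading again.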

2.2 Package installation

The package itself can be installed from Bioconductor:

# BiocManager::install("TransOmicsData")

To list summary metadata for all datasets in the package, use `listDatasets()`:

library(TransOmicsData)
listDatasets()
## DataFrame with 3 rows and 5 columns
##             Title            Description                  Omics     Species
##       <character>            <character>            <character> <character>
## 1   chen-organoid neural organoid diff.. phosphoproteome, pro..       human
## 2 xiao-myogenesis C2C12 myogenesis dif.. phosphoproteome, pro..       mouse
## 3        yang-esc ESC to epiLC differe.. epigenome, phosphopr..       human
##                RDataPath
##              <character>
## 1 TransOmicsData/0.99...
## 2 TransOmicsData/0.99...
## 3 TransOmicsData/0.99...
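`listDatasets()` returns a DataFrame, so you can filter it with ordinary subsetting, for example to find datasets for a given species or omics layer. This is a minimal sketch. It assumes the column names shown above (`Title`, `Omics`, `Species`). It also assumes the `Omics` column holds a comma-separated character string, as the truncated printout suggests.

```r
library(TransOmicsData)

ds <- listDatasets()

# Datasets annotated as human in the Species column
human_ds <- ds[ds$Species == "human", c("Title", "Omics")]
human_ds

# Datasets whose Omics string mentions an epigenome layer
# (simple pattern match; assumes Omics is a comma-separated string)
epi_ds <- ds[grepl("epigenome", ds$Omics), "Title"]
epi_ds
```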

2.3 Citing TransOmicsData

We hope that TransOmicsData will be useful for your research. Please use the following information to cite the package. Thank you!

## Citation info
citation("TransOmicsData")
## To cite TransOmicsData in publications use:
## 
##   Chen C, Xiao D, Yang P (2024). _TransOmicsData: a collection of
##   trans-omics data covering a wide range of biological systems._.
##   University of Sydney, Sydney, Australia.
##   doi:10.18129/B9.bioc.TransOmicsData
##   <https://doi.org/10.18129/B9.bioc.TransOmicsData>,
##   <https://github.com/PYangLab/TransOmicsData>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {TransOmicsData: a collection of trans-omics data covering a wide range of biological systems.},
##     author = {Carissa Chen and Di Xiao and Pengyi Yang},
##     organization = {University of Sydney},
##     address = {Sydney, Australia},
##     year = {2024},
##     url = {https://github.com/PYangLab/TransOmicsData},
##     doi = {10.18129/B9.bioc.TransOmicsData},
##   }

Session info

## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] TransOmicsData_1.1.0 ExperimentHub_2.13.0 AnnotationHub_3.13.0 BiocFileCache_2.13.0 dbplyr_2.5.0        
## [6] BiocGenerics_0.51.0  BiocStyle_2.33.0    
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.45.0         xfun_0.43               bslib_0.7.0             Biobase_2.65.0         
##  [5] vctrs_0.6.5             tools_4.4.0             generics_0.1.3          stats4_4.4.0           
##  [9] curl_5.2.1              tibble_3.2.1            fansi_1.0.6             AnnotationDbi_1.67.0   
## [13] RSQLite_2.3.6           blob_1.2.4              pkgconfig_2.0.3         S4Vectors_0.43.0       
## [17] lifecycle_1.0.4         GenomeInfoDbData_1.2.12 compiler_4.4.0          Biostrings_2.73.0      
## [21] GenomeInfoDb_1.41.0     htmltools_0.5.8.1       sass_0.4.9              yaml_2.3.8             
## [25] pillar_1.9.0            crayon_1.5.2            jquerylib_0.1.4         cachem_1.0.8           
## [29] mime_0.12               tidyselect_1.2.1        digest_0.6.35           dplyr_1.1.4            
## [33] purrr_1.0.2             bookdown_0.39           BiocVersion_3.20.0      fastmap_1.1.1          
## [37] cli_3.6.2               magrittr_2.0.3          utf8_1.2.4              withr_3.0.0            
## [41] filelock_1.0.3          UCSC.utils_1.1.0        rappdirs_0.3.3          bit64_4.0.5            
## [45] rmarkdown_2.26          XVector_0.45.0          httr_1.4.7              bit_4.0.5              
## [49] png_0.1-8               memoise_2.0.1           evaluate_0.23           knitr_1.46             
## [53] IRanges_2.39.0          rlang_1.1.3             glue_1.7.0              DBI_1.2.2              
## [57] BiocManager_1.30.22     jsonlite_1.8.8          R6_2.5.1                zlibbioc_1.51.0