1 Motivation

The DropletTestFiles package contains files for testing droplet-based utilities, such as those in the DropletUtils package. These files are the raw output of pipelines such as 10X Genomics’ CellRanger software suite and are usually not in an immediately analysis-ready state. After all, the point is to provide material for testing the utilities that get the data into such a state.

2 Functions

This package does nothing except pull down and serve up files, so there is not much to discuss here. Two convenience functions help obtain content from ExperimentHub. The first, listTestFiles(), lists all available resources managed by DropletTestFiles:

library(DropletTestFiles)
out <- listTestFiles()
out
## DataFrame with 52 rows and 18 columns
##                         title       dataprovider      species taxonomyid
##                   <character>        <character>  <character>  <integer>
## EH3685 10X brain nuclei 1k ..       10X Genomics Mus musculus      10090
## EH3686 10X brain nuclei 1k ..       10X Genomics Mus musculus      10090
## EH3687 10X brain nuclei 1k ..       10X Genomics Mus musculus      10090
## EH3688 10X brain nuclei 1k ..       10X Genomics Mus musculus      10090
## EH3689 10X brain nuclei 1k ..       10X Genomics Mus musculus      10090
## ...                       ...                ...          ...        ...
## EH3732 HiSeq 4000-sequenced.. Jonathan Griffiths Mus musculus      10090
## EH3769 10X PBMC 4k raw coun..       10X Genomics Homo sapiens       9606
## EH3770 10X PBMC 4k filtered..       10X Genomics Homo sapiens       9606
## EH3771 10X PBMC 4k raw HDF5..       10X Genomics Homo sapiens       9606
## EH3772 10X PBMC 4k molecule..       10X Genomics Homo sapiens       9606
##             genome            description coordinate_1_based
##        <character>            <character>          <integer>
## EH3685        mm10 Molecule information..                  1
## EH3686        mm10 Filtered HDF5 matrix..                  1
## EH3687        mm10 Raw HDF5 matrix for ..                  1
## EH3688        mm10 Filtered count matri..                  1
## EH3689        mm10 Raw count matrix for..                  1
## ...            ...                    ...                ...
## EH3732        mm10 Molecule information..                  1
## EH3769        hg38 Raw count matrix for..                  1
## EH3770        hg38 Filtered count matri..                  1
## EH3771        hg38 Raw HDF5 matrix for ..                  1
## EH3772        hg38 Molecule information..                  1
##                    maintainer rdatadateadded    preparerclass
##                   <character>    <character>      <character>
## EH3685 Aaron Lun <infinite...     2020-08-26 DropletTestFiles
## EH3686 Aaron Lun <infinite...     2020-08-26 DropletTestFiles
## EH3687 Aaron Lun <infinite...     2020-08-26 DropletTestFiles
## EH3688 Aaron Lun <infinite...     2020-08-26 DropletTestFiles
## EH3689 Aaron Lun <infinite...     2020-08-26 DropletTestFiles
## ...                       ...            ...              ...
## EH3732 Aaron Lun <infinite...     2020-08-26 DropletTestFiles
## EH3769 Aaron Lun <infinite...     2020-09-08 DropletTestFiles
## EH3770 Aaron Lun <infinite...     2020-09-08 DropletTestFiles
## EH3771 Aaron Lun <infinite...     2020-09-08 DropletTestFiles
## EH3772 Aaron Lun <infinite...     2020-09-08 DropletTestFiles
##                                                   tags  rdataclass
##                                                 <AsIs> <character>
## EH3685 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3686 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3687 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3688 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3689 ExperimentHub,ExperimentData,ExpressionData,...   character
## ...                                                ...         ...
## EH3732 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3769 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3770 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3771 ExperimentHub,ExperimentData,ExpressionData,...   character
## EH3772 ExperimentHub,ExperimentData,ExpressionData,...   character
##                     rdatapath              sourceurl  sourcetype
##                   <character>            <character> <character>
## EH3685 DropletTestFiles/ten.. https://support.10xg..        HDF5
## EH3686 DropletTestFiles/ten.. https://support.10xg..        HDF5
## EH3687 DropletTestFiles/ten.. https://support.10xg..        HDF5
## EH3688 DropletTestFiles/ten.. https://support.10xg..      tar.gz
## EH3689 DropletTestFiles/ten.. https://support.10xg..      tar.gz
## ...                       ...                    ...         ...
## EH3732 DropletTestFiles/bac.. https://jmlab-gitlab..        HDF5
## EH3769 DropletTestFiles/ten.. https://support.10xg..      tar.gz
## EH3770 DropletTestFiles/ten.. https://support.10xg..      tar.gz
## EH3771 DropletTestFiles/ten.. https://support.10xg..        HDF5
## EH3772 DropletTestFiles/ten.. https://support.10xg..        HDF5
##                 file.dataset file.version              file.name
##                  <character>  <character>            <character>
## EH3685 tenx-2.0.1-nuclei_900        1.0.0            mol_info.h5
## EH3686 tenx-2.0.1-nuclei_900        1.0.0            filtered.h5
## EH3687 tenx-2.0.1-nuclei_900        1.0.0                 raw.h5
## EH3688 tenx-2.0.1-nuclei_900        1.0.0        filtered.tar.gz
## EH3689 tenx-2.0.1-nuclei_900        1.0.0             raw.tar.gz
## ...                      ...          ...                    ...
## EH3732 bach-mammary-swapping        1.0.0 hiseq_4000/mol_info_..
## EH3769     tenx-2.1.0-pbmc4k        1.0.0             raw.tar.gz
## EH3770     tenx-2.1.0-pbmc4k        1.0.0        filtered.tar.gz
## EH3771     tenx-2.1.0-pbmc4k        1.0.0                 raw.h5
## EH3772     tenx-2.1.0-pbmc4k        1.0.0            mol_info.h5
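The listing can be subset like any other data frame, for example to find the files that belong to one dataset. The sketch below assumes a hypothetical helper, findTestFiles(), which is not part of DropletTestFiles. It uses a small stand-in data.frame with values copied from the output above so that it runs without network access; with the package installed, the result of listTestFiles() could presumably be passed in its place.

```r
# Hypothetical helper (not part of DropletTestFiles): subset a listing by
# dataset name and, optionally, by a regular expression on the file name.
# Assumes 'listing' has 'file.dataset' and 'file.name' columns.
findTestFiles <- function(listing, dataset, pattern = NULL) {
    keep <- listing$file.dataset == dataset
    if (!is.null(pattern)) {
        keep <- keep & grepl(pattern, listing$file.name)
    }
    listing[keep, , drop = FALSE]
}

# Stand-in for the real listing, using values shown in the output above.
listing <- data.frame(
    file.dataset = c("tenx-2.0.1-nuclei_900", "tenx-2.0.1-nuclei_900",
                     "tenx-2.1.0-pbmc4k", "tenx-2.1.0-pbmc4k"),
    file.name = c("mol_info.h5", "filtered.h5", "raw.tar.gz", "raw.h5"),
    stringsAsFactors = FALSE
)

# Keep only the HDF5 files of the PBMC 4k dataset.
hits <- findTestFiles(listing, "tenx-2.1.0-pbmc4k", pattern = "\\.h5$")
hits$file.name
## [1] "raw.h5"
```

Filtering on file.dataset and file.name rather than on row position keeps the selection stable if more resources are added to the listing later.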

The second, getTestFile(), obtains a resource. It returns a (read-only!) path to the locally cached file, on which further operations can be applied.

getTestFile(out$rdatapath[1], prefix=FALSE)
##                                                     EH3685 
## "/home/biocbuild/.cache/R/ExperimentHub/135d12440467_3721"
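Because the returned path points into the ExperimentHub cache, one option is to copy the file elsewhere before doing anything that would modify it. This is a minimal sketch, assuming a hypothetical helper copyToScratch() (not part of the package) and a placeholder file standing in for a real getTestFile() result:

```r
# Hypothetical helper (not part of DropletTestFiles): copy a cached file into
# a scratch directory so that the copy can be modified freely.
copyToScratch <- function(path, dir = tempfile("scratch")) {
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)
    dest <- file.path(dir, basename(path))
    # copy.mode=FALSE avoids carrying over any read-only permission bits.
    ok <- file.copy(path, dest, overwrite = TRUE, copy.mode = FALSE)
    if (!ok) {
        stop("failed to copy '", path, "'")
    }
    dest
}

# Placeholder file standing in for a path returned by getTestFile().
cached <- tempfile(fileext = ".h5")
writeLines("placeholder", cached)

scratch <- copyToScratch(cached)
file.exists(scratch)
## [1] TRUE
```

The original file in the cache is left untouched, so later calls that resolve the same resource still see the unmodified content.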

Currently, all of the files come from 10X Genomics datasets. As such, we will see a lot of filtered/raw count matrices, molecule information files and HDF5 barcode matrices. We refer readers to the relevant section of the 10X Genomics website (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/overview) for more details.
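The count matrices distributed as tar.gz archives need to be unpacked before they can be read. The sketch below assumes a hypothetical helper, unpackTestArchive(), built on utils::untar(), and demonstrates it on a small stand-in archive created on the fly. The internal layout of the real archives is not described here, so it is worth listing the extracted files before passing a directory to a reader.

```r
# Hypothetical helper (not part of DropletTestFiles): unpack an archive into
# a fresh directory and return that directory.
unpackTestArchive <- function(path, exdir = tempfile("unpacked")) {
    dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
    status <- utils::untar(path, exdir = exdir)
    if (!identical(as.integer(status), 0L)) {
        stop("failed to unpack '", path, "'")
    }
    exdir
}

# Build a tiny stand-in archive; a real one would come from getTestFile().
src <- tempfile("src")
dir.create(file.path(src, "demo"), recursive = TRUE)
writeLines("AAAC-1", file.path(src, "demo", "barcodes.tsv"))
archive <- tempfile(fileext = ".tar.gz")
old <- setwd(src)
utils::tar(archive, files = "demo", compression = "gzip", tar = "internal")
setwd(old)

unpacked <- unpackTestArchive(archive)
extracted <- list.files(unpacked, recursive = TRUE)
extracted
## [1] "demo/barcodes.tsv"
```

Unpacking into a temporary directory keeps the cached archive itself read-only, in line with how the paths are meant to be used.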

3 Example

Here, we obtain a path to a filtered HDF5 matrix and read it in with the read10xCounts() function from DropletUtils. This produces a SingleCellExperiment object for use in various Bioconductor pipelines.

library(DropletUtils)
path <- getTestFile("tenx-3.1.0-5k_pbmc_protein_v3/1.0.0/filtered.h5", prefix=TRUE)
sce <- read10xCounts(path, type="HDF5")
sce
## class: SingleCellExperiment 
## dim: 33570 5247 
## metadata(1): Samples
## assays(1): counts
## rownames(33570): ENSG00000243485 ENSG00000237613 ... IgG2a IgG2b
## rowData names(3): ID Symbol Type
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Session information

sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DropletUtils_1.24.0         SingleCellExperiment_1.26.0
##  [3] SummarizedExperiment_1.34.0 Biobase_2.64.0             
##  [5] GenomicRanges_1.56.0        GenomeInfoDb_1.40.0        
##  [7] IRanges_2.38.0              S4Vectors_0.42.0           
##  [9] BiocGenerics_0.50.0         MatrixGenerics_1.16.0      
## [11] matrixStats_1.3.0           DropletTestFiles_1.14.0    
## [13] BiocStyle_2.32.0           
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1          dplyr_1.1.4              
##  [3] blob_1.2.4                R.utils_2.12.3           
##  [5] filelock_1.0.3            Biostrings_2.72.0        
##  [7] fastmap_1.1.1             BiocFileCache_2.12.0     
##  [9] digest_0.6.35             mime_0.12                
## [11] lifecycle_1.0.4           statmod_1.5.0            
## [13] KEGGREST_1.44.0           RSQLite_2.3.6            
## [15] magrittr_2.0.3            compiler_4.4.0           
## [17] rlang_1.1.3               sass_0.4.9               
## [19] tools_4.4.0               utf8_1.2.4               
## [21] yaml_2.3.8                knitr_1.46               
## [23] dqrng_0.3.2               S4Arrays_1.4.0           
## [25] bit_4.0.5                 curl_5.2.1               
## [27] DelayedArray_0.30.0       abind_1.4-5              
## [29] BiocParallel_1.38.0       HDF5Array_1.32.0         
## [31] withr_3.0.0               purrr_1.0.2              
## [33] R.oo_1.26.0               grid_4.4.0               
## [35] fansi_1.0.6               ExperimentHub_2.12.0     
## [37] beachmat_2.20.0           edgeR_4.2.0              
## [39] Rhdf5lib_1.26.0           cli_3.6.2                
## [41] rmarkdown_2.26            crayon_1.5.2             
## [43] generics_0.1.3            httr_1.4.7               
## [45] DelayedMatrixStats_1.26.0 scuttle_1.14.0           
## [47] rhdf5_2.48.0              DBI_1.2.2                
## [49] cachem_1.0.8              zlibbioc_1.50.0          
## [51] parallel_4.4.0            AnnotationDbi_1.66.0     
## [53] BiocManager_1.30.22       XVector_0.44.0           
## [55] vctrs_0.6.5               Matrix_1.7-0             
## [57] jsonlite_1.8.8            bookdown_0.39            
## [59] bit64_4.0.5               locfit_1.5-9.9           
## [61] limma_3.60.0              jquerylib_0.1.4          
## [63] glue_1.7.0                codetools_0.2-20         
## [65] BiocVersion_3.19.1        UCSC.utils_1.0.0         
## [67] tibble_3.2.1              pillar_1.9.0             
## [69] rhdf5filters_1.16.0       rappdirs_0.3.3           
## [71] htmltools_0.5.8.1         GenomeInfoDbData_1.2.12  
## [73] R6_2.5.1                  dbplyr_2.5.0             
## [75] sparseMatrixStats_1.16.0  evaluate_0.23            
## [77] lattice_0.22-6            AnnotationHub_3.12.0     
## [79] R.methodsS3_1.8.2         png_0.1-8                
## [81] memoise_2.0.1             bslib_0.7.0              
## [83] Rcpp_1.0.12               SparseArray_1.4.0        
## [85] xfun_0.43                 pkgconfig_2.0.3