TileDBArray 1.13.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.171393216 0.651766368 0.287663858 . -2.0810851 0.5531883
## [2,] 1.162554053 0.432999103 0.683060225 . -0.5732036 -0.6592048
## [3,] -0.898104246 0.345916143 -2.517565897 . 1.5935508 -1.1446356
## [4,] 0.926451690 0.481835536 -0.546165241 . -0.3579166 -0.8691735
## [5,] 0.000626085 -1.221147384 -0.697120739 . 0.2540500 -1.2594983
## ... . . . . . .
## [96,] 1.4057020 -0.4852564 -0.4335200 . -0.3621997 -2.2009791
## [97,] -0.8262886 0.7009068 0.4926384 . 0.2607656 -0.2884421
## [98,] -0.2348036 1.6188076 0.4156043 . -2.9608127 1.6341858
## [99,] -2.1552701 -0.7552007 1.2298855 . -0.3441050 -0.2479900
## [100,] 0.5205651 2.3486854 1.1384410 . -0.3516320 1.0338206
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.171393216 0.651766368 0.287663858 . -2.0810851 0.5531883
## [2,] 1.162554053 0.432999103 0.683060225 . -0.5732036 -0.6592048
## [3,] -0.898104246 0.345916143 -2.517565897 . 1.5935508 -1.1446356
## [4,] 0.926451690 0.481835536 -0.546165241 . -0.3579166 -0.8691735
## [5,] 0.000626085 -1.221147384 -0.697120739 . 0.2540500 -1.2594983
## ... . . . . . .
## [96,] 1.4057020 -0.4852564 -0.4335200 . -0.3621997 -2.2009791
## [97,] -0.8262886 0.7009068 0.4926384 . 0.2607656 -0.2884421
## [98,] -0.2348036 1.6188076 0.4156043 . -2.9608127 1.6341858
## [99,] -2.1552701 -0.7552007 1.2298855 . -0.3441050 -0.2479900
## [100,] 0.5205651 2.3486854 1.1384410 . -0.3516320 1.0338206
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.0 0.0
## [997,] 0 0 0 . 0.0 0.0
## [998,] 0 0 0 . 0.0 0.0
## [999,] 0 0 0 . 0.0 0.0
## [1000,] 0 0 0 . 0.0 1.3
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE TRUE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.171393216 0.651766368 0.287663858 . -2.0810851 0.5531883
## GENE_2 1.162554053 0.432999103 0.683060225 . -0.5732036 -0.6592048
## GENE_3 -0.898104246 0.345916143 -2.517565897 . 1.5935508 -1.1446356
## GENE_4 0.926451690 0.481835536 -0.546165241 . -0.3579166 -0.8691735
## GENE_5 0.000626085 -1.221147384 -0.697120739 . 0.2540500 -1.2594983
## ... . . . . . .
## GENE_96 1.4057020 -0.4852564 -0.4335200 . -0.3621997 -2.2009791
## GENE_97 -0.8262886 0.7009068 0.4926384 . 0.2607656 -0.2884421
## GENE_98 -0.2348036 1.6188076 0.4156043 . -2.9608127 1.6341858
## GENE_99 -2.1552701 -0.7552007 1.2298855 . -0.3441050 -0.2479900
## GENE_100 0.5205651 2.3486854 1.1384410 . -0.3516320 1.0338206
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.171393216 1.162554053 -0.898104246 0.926451690 0.000626085 -1.085134922
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.171393216 0.651766368 0.287663858 -0.457509749 0.870412506
## GENE_2 1.162554053 0.432999103 0.683060225 -1.045164546 -0.696327283
## GENE_3 -0.898104246 0.345916143 -2.517565897 0.021925625 -1.152002340
## GENE_4 0.926451690 0.481835536 -0.546165241 2.102811997 1.198848318
## GENE_5 0.000626085 -1.221147384 -0.697120739 -0.543905999 -0.585496400
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.34278643 1.30353274 0.57532772 . -4.1621702 1.1063766
## GENE_2 2.32510811 0.86599821 1.36612045 . -1.1464073 -1.3184097
## GENE_3 -1.79620849 0.69183229 -5.03513179 . 3.1871017 -2.2892712
## GENE_4 1.85290338 0.96367107 -1.09233048 . -0.7158333 -1.7383470
## GENE_5 0.00125217 -2.44229477 -1.39424148 . 0.5081001 -2.5189966
## ... . . . . . .
## GENE_96 2.8114040 -0.9705128 -0.8670400 . -0.7243994 -4.4019582
## GENE_97 -1.6525772 1.4018135 0.9852769 . 0.5215313 -0.5768842
## GENE_98 -0.4696072 3.2376152 0.8312086 . -5.9216255 3.2683717
## GENE_99 -4.3105403 -1.5104013 2.4597709 . -0.6882101 -0.4959800
## GENE_100 1.0411302 4.6973707 2.2768820 . -0.7032640 2.0676413
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -3.757147 -12.704008 -11.533708 -9.180058 1.212252 17.264322 -1.538808
## SAMP_8 SAMP_9 SAMP_10
## -8.079337 -19.761577 12.678101
out %*% runif(ncol(out))
## [,1]
## GENE_1 -1.18222355
## GENE_2 0.38767840
## GENE_3 -1.89816101
## GENE_4 0.65898001
## GENE_5 -1.38976471
## GENE_6 -3.09649947
## GENE_7 0.49409527
## GENE_8 -0.87261997
## GENE_9 0.09199060
## GENE_10 0.73178859
## GENE_11 -2.24999852
## GENE_12 -1.13843768
## GENE_13 -2.78717715
## GENE_14 -3.26941311
## GENE_15 2.92163885
## GENE_16 -0.80709303
## GENE_17 -2.49945511
## GENE_18 1.64173936
## GENE_19 -1.47715820
## GENE_20 -2.26278771
## GENE_21 -1.97974769
## GENE_22 2.83437763
## GENE_23 1.12503409
## GENE_24 -1.75263911
## GENE_25 -2.54887517
## GENE_26 -1.21456254
## GENE_27 1.05830591
## GENE_28 -0.96297037
## GENE_29 4.07050861
## GENE_30 2.49874899
## GENE_31 1.99831424
## GENE_32 -0.47484932
## GENE_33 1.63745249
## GENE_34 2.03443387
## GENE_35 -3.26675467
## GENE_36 1.33044913
## GENE_37 3.95903513
## GENE_38 -0.82855906
## GENE_39 0.90711267
## GENE_40 -1.18239083
## GENE_41 -0.58679735
## GENE_42 1.78948204
## GENE_43 1.47447154
## GENE_44 -0.65746667
## GENE_45 0.53621406
## GENE_46 -0.18854002
## GENE_47 -0.60624792
## GENE_48 1.60279240
## GENE_49 -0.85184520
## GENE_50 2.22149918
## GENE_51 -0.41078056
## GENE_52 -0.36405510
## GENE_53 -1.33073901
## GENE_54 -2.22808539
## GENE_55 0.71801058
## GENE_56 -1.72599539
## GENE_57 0.59069992
## GENE_58 -2.06341635
## GENE_59 -1.14962494
## GENE_60 0.57476113
## GENE_61 -0.40807744
## GENE_62 -1.89674107
## GENE_63 -0.31311348
## GENE_64 -1.11571717
## GENE_65 -3.00743601
## GENE_66 2.25958954
## GENE_67 -0.09607566
## GENE_68 0.10612083
## GENE_69 -3.01851040
## GENE_70 2.35753535
## GENE_71 2.59690333
## GENE_72 -0.70941177
## GENE_73 -0.10037942
## GENE_74 -0.52016225
## GENE_75 -0.64365880
## GENE_76 0.07538578
## GENE_77 -3.08358059
## GENE_78 -0.40969995
## GENE_79 -1.08437254
## GENE_80 1.09719198
## GENE_81 -0.40229685
## GENE_82 1.82089664
## GENE_83 0.80856225
## GENE_84 1.14614433
## GENE_85 -1.93098296
## GENE_86 -1.04344885
## GENE_87 0.55653994
## GENE_88 -3.95774607
## GENE_89 1.85752310
## GENE_90 -0.63737358
## GENE_91 0.24032650
## GENE_92 -2.18279389
## GENE_93 -1.82014434
## GENE_94 -1.24390531
## GENE_95 -0.88732981
## GENE_96 1.02634199
## GENE_97 1.56163559
## GENE_98 -0.91646580
## GENE_99 -0.95700277
## GENE_100 0.70324912
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.6499921 0.7191018 -0.4610494 . 1.0738667 -1.9727477
## [2,] 2.1149771 0.8833282 -0.6259983 . -0.1246814 1.2588876
## [3,] 0.7630330 -1.1478580 0.1524170 . -0.8087201 0.4010439
## [4,] -1.3109809 -0.5543099 -0.0570456 . -0.2504532 0.8056661
## [5,] 0.2694267 -0.3760641 1.5485601 . 1.7169928 0.1063153
## ... . . . . . .
## [96,] 1.3558123 0.2085069 -1.7854568 . 0.5563003 -0.1859479
## [97,] 0.5074377 1.6122118 0.4795615 . -0.4855103 1.0648663
## [98,] 0.8302764 1.2469841 0.5615397 . 0.8128571 1.6509452
## [99,] -0.2883375 -0.1753277 0.8472279 . 0.3500369 1.8445131
## [100,] 0.5686487 -0.0525962 1.7097535 . 0.2801740 -0.3216694
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.6499921 0.7191018 -0.4610494 . 1.0738667 -1.9727477
## [2,] 2.1149771 0.8833282 -0.6259983 . -0.1246814 1.2588876
## [3,] 0.7630330 -1.1478580 0.1524170 . -0.8087201 0.4010439
## [4,] -1.3109809 -0.5543099 -0.0570456 . -0.2504532 0.8056661
## [5,] 0.2694267 -0.3760641 1.5485601 . 1.7169928 0.1063153
## ... . . . . . .
## [96,] 1.3558123 0.2085069 -1.7854568 . 0.5563003 -0.1859479
## [97,] 0.5074377 1.6122118 0.4795615 . -0.4855103 1.0648663
## [98,] 0.8302764 1.2469841 0.5615397 . 0.8128571 1.6509452
## [99,] -0.2883375 -0.1753277 0.8472279 . 0.3500369 1.8445131
## [100,] 0.5686487 -0.0525962 1.7097535 . 0.2801740 -0.3216694
sessionInfo()
## R version 4.4.0 Patched (2024-04-24 r86482)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.6.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.16 TileDBArray_1.13.0 DelayedArray_0.29.9
## [4] SparseArray_1.3.7 S4Arrays_1.3.7 abind_1.4-5
## [7] IRanges_2.37.1 S4Vectors_0.41.7 MatrixGenerics_1.15.1
## [10] matrixStats_1.3.0 BiocGenerics_0.49.1 Matrix_1.7-0
## [13] BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.12
## [7] nanoarrow_0.4.0.1 jquerylib_0.1.4 yaml_2.3.8
## [10] fastmap_1.1.1 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.43.1 tiledb_0.26.0
## [16] knitr_1.46 bookdown_0.39 bslib_0.7.0
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.2
## [25] zlibbioc_1.49.3 spdl_0.0.5 digest_0.6.35
## [28] grid_4.4.0 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.23 nanotime_0.3.7 zoo_1.8-12
## [34] rmarkdown_2.26 tools_4.4.0 htmltools_0.5.8.1