TileDBArray 1.19.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.5029226 0.2795343 -1.1174754 . -1.05765227 0.80648732
## [2,] 0.6605006 0.4404537 0.3002854 . -0.80743423 -0.06326119
## [3,] -2.2585540 -1.4153997 1.1201288 . 0.09084193 0.51001044
## [4,] 0.4500907 1.4094739 -0.6230997 . -1.05155396 -0.34184159
## [5,] -0.3911114 -1.4289244 0.7350842 . -1.59393417 0.15157216
## ... . . . . . .
## [96,] 1.3284270 0.4537945 0.4299209 . -0.9129257 -2.0553184
## [97,] 0.1996914 -0.7085355 2.1373539 . -0.6355262 -1.2089193
## [98,] -0.6552667 -0.4201838 0.9168518 . -0.6758182 -0.1172362
## [99,] 0.7755746 -0.8197576 1.1433738 . 2.1666176 0.3995842
## [100,] -0.1288205 0.4199151 -1.4115732 . 0.4004009 1.9123655
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.5029226 0.2795343 -1.1174754 . -1.05765227 0.80648732
## [2,] 0.6605006 0.4404537 0.3002854 . -0.80743423 -0.06326119
## [3,] -2.2585540 -1.4153997 1.1201288 . 0.09084193 0.51001044
## [4,] 0.4500907 1.4094739 -0.6230997 . -1.05155396 -0.34184159
## [5,] -0.3911114 -1.4289244 0.7350842 . -1.59393417 0.15157216
## ... . . . . . .
## [96,] 1.3284270 0.4537945 0.4299209 . -0.9129257 -2.0553184
## [97,] 0.1996914 -0.7085355 2.1373539 . -0.6355262 -1.2089193
## [98,] -0.6552667 -0.4201838 0.9168518 . -0.6758182 -0.1172362
## [99,] 0.7755746 -0.8197576 1.1433738 . 2.1666176 0.3995842
## [100,] -0.1288205 0.4199151 -1.4115732 . 0.4004009 1.9123655
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
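As a quick check that sparsity survives the round trip, the sketch below writes a smaller sparse matrix and inspects the result. It assumes the TileDBArray package is installed and that `is_sparse()` is available from the attached DelayedArray/S4Arrays packages; the names `S` and `sp` are illustrative only.

```r
library(TileDBArray)

# A small random sparse matrix (illustrative size, not from the example above).
S <- Matrix::rsparsematrix(100, 100, density=0.01)
sp <- writeTileDBArray(S)

# Ask the DelayedArray framework whether the object is treated as sparse.
is_sparse(sp)

# Realize the values in memory and compare against the original matrix.
all.equal(as.matrix(sp), as.matrix(S))
```

Realizing with `as.matrix()` is only sensible for small examples like this one, since it produces a dense in-memory matrix.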
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.5029226 0.2795343 -1.1174754 . -1.05765227 0.80648732
## GENE_2 0.6605006 0.4404537 0.3002854 . -0.80743423 -0.06326119
## GENE_3 -2.2585540 -1.4153997 1.1201288 . 0.09084193 0.51001044
## GENE_4 0.4500907 1.4094739 -0.6230997 . -1.05155396 -0.34184159
## GENE_5 -0.3911114 -1.4289244 0.7350842 . -1.59393417 0.15157216
## ... . . . . . .
## GENE_96 1.3284270 0.4537945 0.4299209 . -0.9129257 -2.0553184
## GENE_97 0.1996914 -0.7085355 2.1373539 . -0.6355262 -1.2089193
## GENE_98 -0.6552667 -0.4201838 0.9168518 . -0.6758182 -0.1172362
## GENE_99 0.7755746 -0.8197576 1.1433738 . 2.1666176 0.3995842
## GENE_100 -0.1288205 0.4199151 -1.4115732 . 0.4004009 1.9123655
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.5029226 0.6605006 -2.2585540 0.4500907 -0.3911114 1.4630315
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.50292259 0.27953435 -1.11747540 0.84509498 0.04432742
## GENE_2 0.66050062 0.44045366 0.30028535 -0.75326481 0.65886768
## GENE_3 -2.25855398 -1.41539971 1.12012884 0.24440943 -1.19602661
## GENE_4 0.45009070 1.40947389 -0.62309974 0.22196074 0.29645459
## GENE_5 -0.39111141 -1.42892439 0.73508416 1.07702945 0.65322028
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.0058452 0.5590687 -2.2349508 . -2.1153045 1.6129746
## GENE_2 1.3210012 0.8809073 0.6005707 . -1.6148685 -0.1265224
## GENE_3 -4.5171080 -2.8307994 2.2402577 . 0.1816839 1.0200209
## GENE_4 0.9001814 2.8189478 -1.2461995 . -2.1031079 -0.6836832
## GENE_5 -0.7822228 -2.8578488 1.4701683 . -3.1878683 0.3031443
## ... . . . . . .
## GENE_96 2.6568539 0.9075891 0.8598419 . -1.8258514 -4.1106368
## GENE_97 0.3993827 -1.4170710 4.2747077 . -1.2710524 -2.4178386
## GENE_98 -1.3105334 -0.8403676 1.8337036 . -1.3516363 -0.2344724
## GENE_99 1.5511492 -1.6395153 2.2867475 . 4.3332353 0.7991684
## GENE_100 -0.2576410 0.8398302 -2.8231465 . 0.8008018 3.8247310
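To make the delayed behaviour concrete, the following sketch forces evaluation of a delayed result and compares it with the ordinary in-memory computation. It assumes the TileDBArray package is installed; the names `M`, `tdb`, `delayed` and `realized` are illustrative and do not appear in the example above.

```r
library(TileDBArray)

# Small dense matrix written to a temporary TileDB backend.
M <- matrix(rnorm(50), ncol=5)
tdb <- writeTileDBArray(M)

# Arithmetic returns a DelayedMatrix; the multiplication is only recorded.
delayed <- tdb * 2
is(delayed, "DelayedMatrix")

# as.matrix() forces evaluation and loads the values into memory.
realized <- as.matrix(delayed)
all.equal(realized, M * 2)
```

The backend itself is untouched by `tdb * 2`; only the realized copy holds the doubled values.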
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 1.019842 6.926170 5.162767 5.139662 -1.946053 5.857144 -15.321054
## SAMP_8 SAMP_9 SAMP_10
## 19.139586 -7.017189 -6.097486
out %*% runif(ncol(out))
## [,1]
## GENE_1 -4.255333186
## GENE_2 0.675203279
## GENE_3 -0.321983079
## GENE_4 0.709440745
## GENE_5 -2.553668850
## GENE_6 2.900248877
## GENE_7 1.394783074
## GENE_8 -3.158654166
## GENE_9 0.240765817
## GENE_10 -3.394587310
## GENE_11 0.572047185
## GENE_12 -0.711132766
## GENE_13 0.940198202
## GENE_14 -1.575637016
## GENE_15 3.503030538
## GENE_16 -1.876567860
## GENE_17 -0.291992268
## GENE_18 -0.072022715
## GENE_19 0.023515560
## GENE_20 1.963875734
## GENE_21 -0.811794706
## GENE_22 -3.320882056
## GENE_23 -1.128067941
## GENE_24 -0.800963728
## GENE_25 1.011685638
## GENE_26 2.046343557
## GENE_27 0.002993832
## GENE_28 2.562899416
## GENE_29 -0.417579453
## GENE_30 1.393093527
## GENE_31 1.707774050
## GENE_32 -4.381524528
## GENE_33 -3.657352026
## GENE_34 -1.978554843
## GENE_35 -0.334413405
## GENE_36 -0.731217285
## GENE_37 -0.678323486
## GENE_38 0.573349299
## GENE_39 -1.288967443
## GENE_40 -1.875933801
## GENE_41 -0.793803938
## GENE_42 0.837754538
## GENE_43 -0.396012515
## GENE_44 -0.002044797
## GENE_45 -7.047787989
## GENE_46 1.938103529
## GENE_47 -1.373206269
## GENE_48 1.085015872
## GENE_49 0.715950692
## GENE_50 1.250761038
## GENE_51 0.172851637
## GENE_52 0.355728927
## GENE_53 1.637400004
## GENE_54 1.336756933
## GENE_55 -0.447876012
## GENE_56 -1.108035152
## GENE_57 0.215157089
## GENE_58 -2.361747996
## GENE_59 0.453957787
## GENE_60 0.261890438
## GENE_61 -0.232403769
## GENE_62 -2.517925407
## GENE_63 -1.305535400
## GENE_64 1.388292460
## GENE_65 0.793278344
## GENE_66 0.655253650
## GENE_67 -1.918134664
## GENE_68 2.453645557
## GENE_69 -0.354412303
## GENE_70 1.913231017
## GENE_71 -3.854659565
## GENE_72 0.663124426
## GENE_73 -2.943359108
## GENE_74 0.636007926
## GENE_75 1.204570108
## GENE_76 0.959745606
## GENE_77 -1.081900515
## GENE_78 0.805206493
## GENE_79 2.570371017
## GENE_80 1.189029790
## GENE_81 -0.408057782
## GENE_82 -1.616103624
## GENE_83 -3.028856479
## GENE_84 -1.658869459
## GENE_85 2.110886723
## GENE_86 2.944310528
## GENE_87 -1.796883288
## GENE_88 0.883174873
## GENE_89 4.201502294
## GENE_90 1.677754033
## GENE_91 -2.215530461
## GENE_92 0.358188662
## GENE_93 1.038062565
## GENE_94 2.148496192
## GENE_95 -0.369456681
## GENE_96 -1.337923874
## GENE_97 -0.711654652
## GENE_98 1.762318805
## GENE_99 3.807340268
## GENE_100 0.118505232
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.32940167 -0.02315358 -0.09211663 . -1.2203363 -1.4215560
## [2,] -0.20694996 1.19727557 0.79661369 . 0.3431441 0.1376661
## [3,] -1.16712074 -0.84054039 -0.01221719 . 1.1865357 0.3759659
## [4,] -0.95636227 -0.39385064 -0.57266126 . -0.4389642 0.7415227
## [5,] -0.44769392 -2.07339515 1.10614091 . -1.5069016 -0.1576369
## ... . . . . . .
## [96,] 0.22579940 1.05389008 0.29735501 . 1.08077201 0.36386480
## [97,] 0.36398434 -0.33176480 -0.66979207 . 0.31196456 0.02330395
## [98,] -0.04696153 -0.28727045 -0.50685252 . 1.48627245 -1.62809354
## [99,] 0.03714996 -2.95184438 0.76128772 . 1.42641861 -0.89208645
## [100,] -0.33988639 -1.90971295 -1.28354045 . -1.41772743 1.27806832
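Because the backend now lives at a known path, it can be reopened later without rewriting the data. The sketch below assumes that the `TileDBArray()` constructor accepts a path together with an `attr` argument naming the attribute to read; `M`, `path` and `reopened` are illustrative names.

```r
library(TileDBArray)

# Write a matrix to a known location under a custom attribute name.
M <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(M, path=path, attr="WHEE")

# Reopen the existing backend by path; 'attr' selects the attribute to read.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)

# The values read back should match what was written.
all.equal(as.matrix(reopened), M)
```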
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.32940167 -0.02315358 -0.09211663 . -1.2203363 -1.4215560
## [2,] -0.20694996 1.19727557 0.79661369 . 0.3431441 0.1376661
## [3,] -1.16712074 -0.84054039 -0.01221719 . 1.1865357 0.3759659
## [4,] -0.95636227 -0.39385064 -0.57266126 . -0.4389642 0.7415227
## [5,] -0.44769392 -2.07339515 1.10614091 . -1.5069016 -0.1576369
## ... . . . . . .
## [96,] 0.22579940 1.05389008 0.29735501 . 1.08077201 0.36386480
## [97,] 0.36398434 -0.33176480 -0.66979207 . 0.31196456 0.02330395
## [98,] -0.04696153 -0.28727045 -0.50685252 . 1.48627245 -1.62809354
## [99,] 0.03714996 -2.95184438 0.76128772 . 1.42641861 -0.89208645
## [100,] -0.33988639 -1.90971295 -1.28354045 . -1.41772743 1.27806832
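A minimal sketch of setting and then unsetting such a global follows. It assumes that `getTileDBPath()` reports the current setting and that `setTileDBPath(NULL)` clears it so that later writes fall back to a temporary location; please check the package documentation for the exact behaviour. The names `M`, `path3` and `out3` are illustrative.

```r
library(TileDBArray)

M <- matrix(rnorm(100), ncol=10)
path3 <- tempfile()

# Set the global path; subsequent coercions write their backend here.
setTileDBPath(path3)
getTileDBPath()
out3 <- as(M, "TileDBArray")

# Unset the global so that later coercions do not try to reuse path3
# (assumed to restore the default behaviour).
setTileDBPath(NULL)
getTileDBPath()
```

Unsetting the path after use matters because a second write to a path that already holds an array would be expected to fail.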
sessionInfo()
## R version 4.5.0 RC (2025-04-03 r88103 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.21 TileDBArray_1.19.0 DelayedArray_0.35.0
## [4] SparseArray_1.9.0 S4Arrays_1.9.0 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1