1 Introduction

BioCarta is a valuable source of biological pathways: it provides manually curated pathways together with clear, intuitive pathway images. One useful feature of pathway analysis, highlighting genes of interest on the pathway images, is no longer available. Since the original BioCarta website (biocarta.com) has disappeared from the internet, we recovered the data from the Internet Archive and formatted it into a package.

2 Preprocessing

The BioCarta data was collected from web.archive.org, which holds an archive of BioCarta’s successor website cgap.nci.nih.gov; that site has also been retired. The snapshot used here was taken on 2017-01-22. The preprocessing script is shipped with the package:

system.file("script", "process.R", package = "BioCartaImage")
## [1] "/tmp/Rtmp5VHEQI/Rinst3a086e5c5ea619/BioCartaImage/script/process.R"

The core data of this package is the coordinates of proteins in the pathway images. This information is embedded in the HTML code of each pathway’s web page, in the <map>/<area> tags. We use the rvest package to extract it.
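To make the extraction step concrete, here is a minimal sketch of reading <area> tags with rvest. It is not the shipped process.R script: the HTML fragment and the href values are made up for illustration (only the two coordinate strings are borrowed from the pathway object shown later), and real pathway pages may be structured differently.

```r
library(rvest)

# A made-up page fragment; real pathway pages are larger and may differ.
html = '<html><body>
<map name="pathway">
  <area shape="poly" coords="41,143,41,123,81,123,81,143,41,143" href="#geneA">
  <area shape="poly" coords="182,153,178,156,174,149,182,153" href="#geneB">
</map>
</body></html>'

page  = read_html(html)
areas = html_elements(page, "area")

# One shape per <area> tag.
shape = html_attr(areas, "shape")

# Split each comma-separated coords attribute into a numeric vector.
coords = lapply(strsplit(html_attr(areas, "coords"), ","), as.numeric)

shape
lengths(coords)
```

Each <area> tag yields one shape and one numeric coordinate vector, which is the same layout as the shape and coords elements of a pathway object.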

3 Get pathways

All pathways in the BioCarta database can be listed with all_pathways():

library(BioCartaImage)
ap = all_pathways()
length(ap)
## [1] 314
head(ap)
## [1] "h_RELAPathway"    "h_no1Pathway"     "h_gsPathway"      "h_CSKPathway"    
## [5] "h_pkcPathway"     "h_srcRPTPPathway"

A single pathway can be obtained by providing the pathway ID. Printing the object shows the number of nodes and the number of genes in the pathway:

p = get_pathway("h_RELAPathway")
p
## A BioCarta pathway:
##   ID: h_RELAPathway
##   Name: Acetylation and Deacetylation of RelA in The Nucleus
##   35 nodes, 16 genes

MSigDB is also a popular resource for BioCarta pathway analysis, so MSigDB IDs for the BioCarta pathways are supported as well. The MSigDB ID is very similar to the original BioCarta ID:

# MSigDB ID
get_pathway("BIOCARTA_RELA_PATHWAY")
## A BioCarta pathway:
##   ID: h_RELAPathway
##   Name: Acetylation and Deacetylation of RelA in The Nucleus
##   35 nodes, 16 genes
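The similarity between the two ID styles can be illustrated with a small heuristic that reduces both to the same key. This is only a sketch: pathway_key() is a hypothetical helper, not part of the package, and there is no guarantee that every pathway ID follows this pattern or that the package resolves IDs this way.

```r
# Reduce either ID style to a bare pathway key (heuristic, for illustration only).
pathway_key = function(id) {
    key = sub("^h_", "", id)                              # BioCarta-style prefix
    key = sub("^BIOCARTA_", "", key)                      # MSigDB-style prefix
    key = sub("_?pathway$", "", key, ignore.case = TRUE)  # shared suffix
    toupper(key)
}

pathway_key("h_RELAPathway")
pathway_key("BIOCARTA_RELA_PATHWAY")
```

For the two IDs used above, both calls give the same key, "RELA".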

The pathway object p is actually a very simple list which contains coordinates of member nodes.

str(p)
## List of 6
##  $ id        : chr "h_RELAPathway"
##  $ name      : chr "Acetylation and Deacetylation of RelA in The Nucleus"
##  $ bc        : chr [1:35] "rela" "rela" "rela" "rela" ...
##  $ shape     : chr [1:35] "poly" "poly" "poly" "poly" ...
##  $ coords    :List of 35
##   ..$ : num [1:42] 235 418 235 409 237 402 241 397 246 394 ...
##   ..$ : num [1:42] 342 304 351 303 358 305 364 308 366 314 ...
##   ..$ : num [1:42] 342 230 351 229 358 231 364 234 366 240 ...
##   ..$ : num [1:42] 333 115 342 114 349 116 355 119 357 125 ...
##   ..$ : num [1:42] 82 235 91 234 98 236 104 239 106 245 ...
##   ..$ : num [1:42] 214 126 223 125 230 127 236 130 238 136 ...
##   ..$ : num [1:40] 205 418 205 409 208 402 212 397 217 395 ...
##   ..$ : num [1:40] 362 336 353 335 346 332 341 328 339 322 ...
##   ..$ : num [1:40] 361 261 352 260 345 257 340 253 338 247 ...
##   ..$ : num [1:40] 351 145 342 144 335 141 330 137 328 131 ...
##   ..$ : num [1:40] 100 266 91 265 84 262 79 258 77 252 ...
##   ..$ : num [1:40] 235 157 226 156 219 153 214 149 212 143 ...
##   ..$ : num [1:34] 61 92 66 93 70 95 73 99 74 103 ...
##   ..$ : num [1:34] 85 123 72 121 65 124 62 124 61 122 ...
##   ..$ : num [1:18] 54 111 55 112 55 124 54 125 19 125 ...
##   ..$ : num [1:40] 227 296 218 295 211 292 206 288 204 282 ...
##   ..$ : num [1:10] 41 143 41 123 81 123 81 143 41 143
##   ..$ : num [1:42] 210 265 219 264 226 266 232 269 234 275 ...
##   ..$ : num [1:38] 55 202 71 205 75 202 80 201 84 202 ...
##   ..$ : num [1:68] 50 57 48 51 47 48 48 46 50 47 ...
##   ..$ : num [1:24] 65 19 70 21 72 23 71 25 63 44 ...
##   ..$ : num [1:34] 426 355 436 356 445 358 450 362 453 366 ...
##   ..$ : num [1:34] 391 133 401 134 410 136 415 140 418 144 ...
##   ..$ : num [1:8] 182 153 178 156 174 149 182 153
##   ..$ : num [1:12] 177 161 182 158 184 163 183 166 181 168 ...
##   ..$ : num [1:10] 168 163 162 161 168 155 171 156 168 163
##   ..$ : num [1:34] 164 156 158 162 164 164 164 168 150 170 ...
##   ..$ : num [1:12] 170 170 171 165 173 159 175 160 179 167 ...
##   ..$ : num [1:10] 173 156 162 151 159 146 170 148 173 156
##   ..$ : num [1:18] 238 362 241 365 241 375 238 378 207 378 ...
##   ..$ : num [1:34] 561 356 571 357 580 359 585 363 588 367 ...
##   ..$ : num [1:30] 324 354 346 353 350 354 354 357 346 363 ...
##   ..$ : num [1:34] 49 247 59 248 68 250 73 254 76 258 ...
##   ..$ : num [1:34] 396 249 406 250 415 252 420 256 423 260 ...
##   ..$ : num [1:18] 248 378 251 381 251 391 248 394 217 394 ...
##  $ image_file: chr "h_relaPathway.gif"
##  - attr(*, "class")= chr "biocarta_pathway"
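Each entry of coords is a flat numeric vector. In an HTML <area shape="poly"> tag, the coords attribute lists x1,y1,x2,y2,... in pixels measured from the top-left corner of the image, so the values can be read as (x, y) pairs. The sketch below uses the 17th entry from the output above, copied by hand so that it runs without the package; how the package handles these values internally is not shown here.

```r
# 17th entry of coords in the str() output above: a closed rectangle.
coords = c(41, 143, 41, 123, 81, 123, 81, 143, 41, 143)

# Odd positions are x, even positions are y.
xy = matrix(coords, ncol = 2, byrow = TRUE, dimnames = list(NULL, c("x", "y")))
xy

# Bounding box of the region, in image pixels.
apply(xy, 2, range)
```

The first and last rows are identical, which closes the polygon.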

Users do not need to touch the internals of p, but the elements of the list are as follows: id and name are the pathway ID and name; bc holds the ID of each node (the “bc_id”); shape and coords give the shape and the coordinates of each node’s region on the image, as recorded in the <area> tags; image_file is the file name of the pathway image.

The basic units in a pathway are proteins/nodes rather than genes, so the “bc_id” is used as the primary ID in the package. Users do not need to deal with these details: they interact directly with genes and pathways, and the mapping from genes to “bc_ids” and then to pathways is done automatically by the package.
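The two-step lookup from a gene to the nodes that represent it can be pictured with toy data. Everything in this sketch is invented for illustration (the gene IDs, the bc_id values and the helper nodes_for_gene()); it is not the package’s internal data or code.

```r
# Toy tables, invented for illustration; not the package's internal data.
gene_to_bc = c("1001" = "bcA", "1002" = "bcB", "1003" = "bcB")  # gene -> bc_id
node_bc    = c("bcA", "bcA", "bcB", "bcC")                      # bc_id of each node

# Nodes to highlight for a gene: gene -> bc_id -> matching nodes.
nodes_for_gene = function(gene) which(node_bc == gene_to_bc[[gene]])

nodes_for_gene("1001")
nodes_for_gene("1003")
```

In this toy setup one bc_id can appear on several nodes, much as "rela" is repeated in the bc element above, which is why node-level IDs rather than gene IDs serve as the primary key.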

Similar to many other packages that contain BioCarta gene sets, the member genes of a pathway can be obtained with genes_in_pathway(). You can provide either the pathway ID or the pathway object. Entrez IDs are used as the gene ID type.

genes_in_pathway("h_RELAPathway")
##  [1] "1387" "8772" "8841" "4792" "1147" "3551" "8517" "4790" "2033" "5970"
## [11] "8737" "7124" "7132" "7133" "8717" "7189"
genes_in_pathway(p)
##  [1] "1387" "8772" "8841" "4792" "1147" "3551" "8517" "4790" "2033" "5970"
## [11] "8737" "7124" "7132" "7133" "8717" "7189"
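A typical next step is to check which genes from your own list belong to the pathway. The sketch below copies the gene IDs from the output above so that it runs without the package; my_genes is a hypothetical user list.

```r
# Gene IDs copied from the genes_in_pathway("h_RELAPathway") output above.
pathway_genes = c("1387", "8772", "8841", "4792", "1147", "3551", "8517", "4790",
                  "2033", "5970", "8737", "7124", "7132", "7133", "8717", "7189")

# Hypothetical user gene list, e.g. differentially expressed genes.
my_genes = c("7124", "5970", "100")

# Keep only the genes that are members of the pathway.
hits = intersect(my_genes, pathway_genes)
hits
```

With the package loaded, pathway_genes could be replaced by a direct call to genes_in_pathway().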

4 Plot the pathway

Next, let’s move to the main functionality of this package: customizing the pathway.

First, like many other grid plotting functions, grid.biocarta() draws a pathway; the pathway image is imported internally as a raster object. In the call below, the color argument highlights the gene with Entrez ID 1387 in yellow.

library(grid)
grid.newpage()
grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))
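The color argument above is a named vector whose names are Entrez IDs. A vector for several genes can be built the same way; the sketch below assumes that grid.biocarta() accepts more than one named entry, which the single-gene call above suggests but does not demonstrate. The gene selection is hypothetical.

```r
# Hypothetical genes of interest, all members of h_RELAPathway per the output above.
hits = c("1387", "5970", "7124")

# Named vector: names are Entrez IDs, values are colors.
col = setNames(rep("yellow", length(hits)), hits)
col["5970"] = "red"   # single out one gene
col

# With the package installed, this could then be passed on
# (assumes multiple named entries are accepted):
# grid.newpage()
# grid.biocarta("h_RELAPathway", color = col)
```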