
1 Introduction

BioCarta is a valuable source of biological pathways. It provides not only carefully manually curated pathways but also clear and intuitive pathway images. However, one useful feature of pathway analysis, highlighting genes of interest on the pathway images, has been lost. Since the original BioCarta website is no longer available on the internet, we dug the data out of the Internet Archive and formatted it into a package.

2 Preprocessing

The BioCarta data was collected from an archive of BioCarta’s successor website, which has also been retired from the internet. The snapshot was taken on 2017-01-22. The processing script is also shipped in the package:

system.file("script", "process.R", package = "BioCartaImage")

The core data of this package is the coordinates of proteins in the pathway images. This information is included in the HTML code (in the <map>/<area> tags) of the web page of each pathway. We use the rvest package to extract it.
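The extraction step can be sketched as follows. This is a minimal illustration and not the shipped process.R script: the HTML fragment, the href values and the helper parse_coords() are hypothetical, and only read_html(), html_elements() and html_attr() from rvest are assumed.

```r
library(rvest)

# Hypothetical HTML fragment imitating an image map; not taken from BioCarta.
html = '<html><body>
<map name="pathway">
  <area shape="poly" coords="10,20,30,20,30,40,10,40" href="/gene/A">
  <area shape="poly" coords="50,60,70,60,60,80" href="/gene/B">
</map>
</body></html>'

page = read_html(html)
areas = html_elements(page, "map area")

# Split a "x1,y1,x2,y2,..." string into separate x and y vectors.
parse_coords = function(s) {
    v = as.numeric(strsplit(s, ",")[[1]])
    list(x = v[seq(1, length(v), by = 2)],
         y = v[seq(2, length(v), by = 2)])
}

# One polygon per <area> tag, named by its (hypothetical) link target.
polygons = lapply(html_attr(areas, "coords"), parse_coords)
names(polygons) = html_attr(areas, "href")
```

Each element of polygons then holds the outline of one node in pixel coordinates of the image.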

3 Get pathways

All pathways in the BioCarta database:

ap = all_pathways()
length(ap)
## [1] 314
head(ap)
## [1] "h_RELAPathway"    "h_no1Pathway"     "h_gsPathway"      "h_CSKPathway"    
## [5] "h_pkcPathway"     "h_srcRPTPPathway"

A single pathway can be obtained by providing the pathway ID. Printing the object shows two numbers: the number of nodes and the number of genes.

p = get_pathway("h_RELAPathway")
p
## A BioCarta pathway:
##   ID: h_RELAPathway
##   Name: Acetylation and Deacetylation of RelA in The Nucleus
##   35 nodes, 16 genes

MSigDB is also a popular resource for BioCarta pathway analysis, so MSigDB IDs are supported for the BioCarta pathways as well. The MSigDB ID is very similar to the original BioCarta ID:

## A BioCarta pathway:
##   ID: h_RELAPathway
##   Name: Acetylation and Deacetylation of RelA in The Nucleus
##   35 nodes, 16 genes

The pathway object p is a simple list which contains the coordinates of the member nodes.

Users do not need to touch the internals of p, but the elements of the list are explained as follows:

As explained earlier, the basic units of a pathway are proteins/nodes rather than genes. Thus, the so-called “bc_id” is used as the primary ID in the package. Users do not need to deal with these details, however. They interact directly with genes and pathways, and the mapping from genes to “bc_ids” and then to pathways is done automatically by the package.
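The two-step mapping can be pictured with a small toy example. The tables below are entirely hypothetical (the IDs "bc_a" and "bc_b" and the pathway names are made up) and do not reflect how the package stores its data; they only show the idea of joining genes to pathways through node IDs.

```r
# Hypothetical gene-to-node table: two genes can share one node.
gene2bc = data.frame(
    gene  = c("1387", "2033", "4790"),
    bc_id = c("bc_a", "bc_a", "bc_b"),
    stringsAsFactors = FALSE
)

# Hypothetical node-to-pathway table: one node can occur in several pathways.
bc2pathway = data.frame(
    bc_id   = c("bc_a", "bc_b", "bc_b"),
    pathway = c("pathway_1", "pathway_1", "pathway_2"),
    stringsAsFactors = FALSE
)

# Join on the node ID, then keep the unique gene/pathway pairs.
joined = merge(gene2bc, bc2pathway, by = "bc_id")
gene2pathway = unique(joined[, c("gene", "pathway")])

# Look up all pathways that contain a given gene.
pathways_of_gene = function(g) {
    sort(gene2pathway$pathway[gene2pathway$gene == g])
}

pathways_of_gene("4790")
```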

As in many other packages which contain BioCarta gene sets, the member genes of a pathway can be obtained with genes_in_pathway(). You can provide either the pathway ID or the pathway object. Entrez IDs are used as the gene ID type.

genes_in_pathway("h_RELAPathway")
##  [1] "1387" "8772" "8841" "4792" "1147" "3551" "8517" "4790" "2033" "5970"
## [11] "8737" "7124" "7132" "7133" "8717" "7189"

4 Plot the pathway

Next, let’s move to the main functionality of this package: customizing the pathway.

First, like many other grid plotting functions, grid.biocarta() draws a pathway (the pathway image is imported as a raster object internally).

grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))

You can specify the location and how the image is aligned to the anchor point.

grid.biocarta("h_RELAPathway",
    x = unit(0.2, "npc"), y = unit(0.9, "npc"),
    just = c("left", "top"),
    color = c("1387" = "yellow"),
    width = unit(6, "cm"))

You can also first create a viewport, then draw the pathway inside it.

pushViewport(viewport(width = 0.7, height = 0.5))
grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))
popViewport()

As the aspect ratio of the image is fixed, you can set either width or height. If both are set, the size of the image is internally adjusted so that the image maximally fills the plotting region.
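The arithmetic behind such a fit can be sketched as below. The function fit_image() is a hypothetical helper written for illustration, not the package's internal code: it picks the largest scale factor at which an image of fixed aspect ratio still fits into a region.

```r
# Scale an image (img_w x img_h) to fit inside a region (region_w x region_h)
# while keeping the aspect ratio. All sizes are in the same arbitrary unit.
fit_image = function(img_w, img_h, region_w, region_h) {
    # The smaller of the two ratios is the one that does not overflow.
    scale = min(region_w / img_w, region_h / img_h)
    c(width = img_w * scale, height = img_h * scale)
}

# A 400 x 200 image in a 100 x 100 region is limited by its width.
fit_image(400, 200, 100, 100)
```

In this example the image fills the full width of the region and half of its height.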

One of the main uses of the pathway image is to highlight genes of interest. The simplest way is to set the color argument, a named vector of colors whose names are gene Entrez IDs. When colors are set, the genes are highlighted with dashed colored borders.

grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))
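In practice the named vector is often derived from analysis results. The following sketch assumes a hypothetical vector of log fold changes (the values are made up) for three Entrez IDs of this pathway and turns it into a color vector of the required form.

```r
# Hypothetical log fold changes, named by Entrez ID.
logfc = c("1387" = 1.8, "8772" = -0.7, "4790" = 2.3)

# ifelse() keeps the names of its test vector, so the result is
# already a named vector of colors.
gene_col = ifelse(logfc > 0, "red", "blue")
gene_col
```

Such a vector can then be passed as the color argument, e.g. grid.biocarta("h_RELAPathway", color = gene_col).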

As BioCarta pathway images are normally colorful, it can be quite difficult to find a color that stands out from the other genes. The package therefore offers a more flexible way, which allows adding self-defined graphics over or beside the genes.

To edit the pathway image, we create the pathway grob first (“grob” is short for “graphic object”).

grob = biocartaGrob("h_RELAPathway")

The object grob basically contains a viewport and a raster image object. Later we can add more graphics for single genes to it.

Graphics for single genes are added by the function mark_gene(). You need to provide the pathway grob, the gene Entrez ID and a self-defined graphics function. The input of this function is the coordinates of the polygon of the gene, in the form of two vectors: the x-coordinates and the y-coordinates.

There are two ways to implement the graphics function. In the first, the function directly returns a grob object, which is then inserted into the global pathway grob.

There is a helper function pos_by_polygon() which returns the position of a certain side of the polygon.
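To give an idea of what such a helper computes, here is a hypothetical stand-in called side_of_polygon(). It is not the package's implementation of pos_by_polygon(); it assumes that a side is represented by the midpoint of the corresponding edge of the polygon's bounding box, and that y increases upwards.

```r
# Return a point c(x, y) on one side of a polygon's bounding box.
side_of_polygon = function(x, y,
    where = c("left", "right", "top", "bottom")) {

    where = match.arg(where)
    switch(where,
        left   = c(min(x), mean(range(y))),
        right  = c(max(x), mean(range(y))),
        top    = c(mean(range(x)), max(y)),
        bottom = c(mean(range(x)), min(y))
    )
}

# A rectangle spanning x in [10, 30] and y in [20, 40].
side_of_polygon(c(10, 30, 30, 10), c(20, 20, 40, 40), where = "left")
```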

In the following code, we add a yellow point to the left side of gene “1387” (CBP in the image).

The graphics are drawn in the pathway image viewport, which already has a coordinate system associated with it. The “xscale” and “yscale” correspond to the numbers of pixels horizontally and vertically, so unit(1, "native") means 1 pixel in the original image.

grob2 = mark_gene(grob, "1387", function(x, y) {
    pos = pos_by_polygon(x, y, where = "left")
    pointsGrob(pos[1], pos[2], default.units = "native",
        pch = 16, gp = gpar(col = "yellow"))
})
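The pixel-based coordinate system can be reproduced with plain grid. The sketch below assumes a hypothetical image of 600 x 400 pixels and uses an off-screen device; it only illustrates how "native" units relate to the viewport scales, independent of the package.

```r
library(grid)

pdf(NULL)  # off-screen device, no file is written
grid.newpage()

# A viewport whose scales equal the (hypothetical) image size in pixels.
pushViewport(viewport(width = 0.8, height = 0.8,
    xscale = c(0, 600), yscale = c(0, 400)))

# Pixel 300 is halfway across, pixel 100 is a quarter of the way up.
x_mid = convertX(unit(300, "native"), "npc", valueOnly = TRUE)
y_quarter = convertY(unit(100, "native"), "npc", valueOnly = TRUE)

# Draw a point at pixel position (300, 100).
grid.points(unit(300, "native"), unit(100, "native"),
    pch = 16, gp = gpar(col = "yellow"))

popViewport()
invisible(dev.off())
```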

If you have complicated graphics, you can consider using gTree() and gList() to combine them.
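As a minimal sketch of this, two grobs can be bundled into a single grob with plain grid functions. The positions and the label are arbitrary choices for illustration.

```r
library(grid)

# A point and a text label combined into one graphic object.
marker = gTree(children = gList(
    pointsGrob(unit(0.5, "npc"), unit(0.5, "npc"),
        pch = 16, gp = gpar(col = "yellow")),
    textGrob("CBP", x = unit(0.5, "npc"), y = unit(0.6, "npc"))
))

# grid.draw(marker) would draw both children at once.
```

A graphics function passed to mark_gene() could return such a gTree instead of a single grob.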

If you are not familiar with gTree(), gList() or the *Grob() functions, you can directly use grid plotting functions such as grid.points() or grid.lines(). In this case, you have to set capture to TRUE, so that the graphics are captured as grobs internally.

grob3 = mark_gene(grob, "1387", function(x, y) {
    pos = pos_by_polygon(x, y, where = "left")
    grid.points(pos[1], pos[2], default.units = "native",
        pch = 16, gp = gpar(col = "yellow"))
}, capture = TRUE)
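Capturing drawing calls as a grob can also be done with grid itself, via grid.grabExpr(). Whether mark_gene() relies on this function internally is an assumption; the sketch only shows the general mechanism.

```r
library(grid)

pdf(NULL)  # off-screen device, no file is written

# The drawing code is evaluated and returned as a gTree instead of
# being left on the current page.
captured = grid.grabExpr({
    grid.points(unit(0.5, "npc"), unit(0.5, "npc"),
        pch = 16, gp = gpar(col = "yellow"))
})

invisible(dev.off())
```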

With this functionality, you can associate complicated graphics with a gene. In the following example, we create a viewport and put it to the left of the gene.

grob4 = mark_gene(grob, "1387", function(x, y) {
    pos = pos_by_polygon(x, y)
    pushViewport(viewport(x = pos[1] - 10, y = pos[2], 
        width = unit(4, "cm"), height = unit(4, "cm"), 
        default.units = "native", just = "right"))
    grid.rect(gp = gpar(fill = "red"))
    grid.text("add whatever\nyou want here")
    popViewport()
}, capture = TRUE)

