Using Bioconductor for Annotation

Bioconductor has extensive facilities for mapping between microarray probe, gene, pathway, gene ontology, homology and other annotations.

Bioconductor has built-in representations of GO, KEGG, vendor, and other annotations, and can easily access NCBI, Biomart, UCSC, and other sources.

Sample Workflow

The following pseudo-code illustrates a typical R / Bioconductor session. It continues the differential expression workflow, taking a 'top table' of differentially expressed probesets and discovering the genes probed and the Gene Ontology terms with which they are annotated.

## Affymetrix Human Genome U95 V 2.0 array IDs of interest; these might be
## obtained from
##
##   tbl <- topTable(efit, coef=2)
##   ids <- tbl[["ID"]]
##
## as part of a more extensive workflow.
> ids <- c("39730_at", "1635_at", "1674_at", "40504_at", "40202_at")

## load libraries as sources of annotation
> library("hgu95av2.db")

## To list the kinds of things that can be retrieved, use the columns method
## (called cols in older releases of AnnotationDbi).
> columns(hgu95av2.db)

## To list the kinds of things that can be used as keys 
## use the keytypes method
> keytypes(hgu95av2.db)

## To extract viable keys of a particular kind, use the keys method.
> head(keys(hgu95av2.db, keytype="ENTREZID"))

## the select method allows you to map probe ids to Entrez Gene ids...
> select(hgu95av2.db, ids, "ENTREZID", "PROBEID")
   PROBEID ENTREZID
1 39730_at       25
2  1635_at       25
3  1674_at     7525
4 40504_at     5445
5 40202_at      687

## ... and to GENENAME etc.
> select(hgu95av2.db, ids, c("ENTREZID","GENENAME"), "PROBEID")
   PROBEID ENTREZID                                           GENENAME
1 39730_at       25     c-abl oncogene 1, non-receptor tyrosine kinase
2  1635_at       25     c-abl oncogene 1, non-receptor tyrosine kinase
3  1674_at     7525 v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1
4 40504_at     5445                                      paraoxonase 2
5 40202_at      687                              Kruppel-like factor 9

## find and extract the GO ids associated with the first id
> res <- select(hgu95av2.db, ids[1], "GO", "PROBEID")
> head(res)
   PROBEID         GO EVIDENCE ONTOLOGY
1 39730_at GO:0000115      TAS       BP
2 39730_at GO:0000287      IDA       MF
3 39730_at GO:0003677      NAS       MF
4 39730_at GO:0003785      TAS       MF
5 39730_at GO:0004515      TAS       MF
6 39730_at GO:0004713      IDA       MF

## use GO.db to find the Terms associated with those GOIDs
> library("GO.db")
> head(select(GO.db, res$GO, "TERM", "GOID"))
        GOID                                                                   TERM
1 GO:0000115  regulation of transcription involved in S phase of mitotic cell cycle
2 GO:0000287                                                  magnesium ion binding
3 GO:0003677                                                            DNA binding
4 GO:0003785                                                  actin monomer binding
5 GO:0004515                     nicotinate-nucleotide adenylyltransferase activity
6 GO:0004713                                       protein tyrosine kinase activity
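
The two lookups at the end of the session can be combined into a single table. The following is a minimal sketch, not part of the original workflow: it rebuilds a few rows of the output shown above as plain data frames so that it runs without any annotation package installed, and `add_go_terms` is a hypothetical helper name. In a live session, `res` and `terms` would be the data frames returned by the two select calls.

```r
# Rows copied from the session output above; in a live session these
# would be the data frames returned by select().
res <- data.frame(
  PROBEID  = "39730_at",
  GO       = c("GO:0000115", "GO:0000287", "GO:0003677"),
  EVIDENCE = c("TAS", "IDA", "NAS"),
  ONTOLOGY = c("BP", "MF", "MF"),
  stringsAsFactors = FALSE
)
terms <- data.frame(
  GOID = c("GO:0000115", "GO:0000287", "GO:0003677"),
  TERM = c("regulation of transcription involved in S phase of mitotic cell cycle",
           "magnesium ion binding",
           "DNA binding"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-join GO annotations to their term descriptions.
# all.x = TRUE keeps every GO row even when no matching term is found.
add_go_terms <- function(go, terms) {
  merge(go, unique(terms), by.x = "GO", by.y = "GOID", all.x = TRUE)
}

annotated <- add_go_terms(res, terms)
print(annotated[, c("PROBEID", "GO", "ONTOLOGY", "TERM")])
```

Passing the term table through `unique` guards against duplicated rows when the same GO id is looked up for several probes.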

Installation and Use

Follow installation instructions to start using these packages. To install the annotations associated with the Affymetrix Human Genome U95 V 2.0, and with Gene Ontology, use

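## biocLite() has been retired in current Bioconductor releases;
## the BiocManager package from CRAN replaces it.
> if (!requireNamespace("BiocManager", quietly=TRUE))
+     install.packages("BiocManager")
> BiocManager::install(c("hgu95av2.db", "GO.db"))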

Package installation is required only once per R installation. View a full list of available software and annotation packages.
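
Because installation is needed only once, a script can check first whether the packages are already present. The following is a minimal sketch under that assumption; `missing_packages` is a hypothetical helper written for illustration and is not part of Bioconductor. It relies only on base R's `system.file`, which returns an empty string for a package that is not installed.

```r
# Hypothetical helper: return the subset of 'pkgs' that cannot be found
# in the current R library paths. It does not load any package.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                  logical(1))
  unname(pkgs[!found])
}

needed <- c("hgu95av2.db", "GO.db")
todo <- missing_packages(needed)

if (length(todo) > 0) {
  # Install these with the command shown above, then re-run the check.
  message("Not yet installed: ", paste(todo, collapse = ", "))
} else {
  message("All annotation packages are installed.")
}
```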

To use the AnnotationDbi and GO.db packages, evaluate the commands

> library("AnnotationDbi")
> library("GO.db")

These commands are required once in each R session.

Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

> help(package="GO.db")
> ?select

to obtain an overview of help on the GO.db package, and the select method. The AnnotationDbi package is used by most .db packages. View the vignettes in the AnnotationDbi package with

> browseVignettes(package="AnnotationDbi")

Vignettes provide a more comprehensive introduction to package functionality than the individual help pages. Use

> help.start()

to open a web page containing comprehensive help resources.

Annotation Resources

The following guides the user through key annotation packages. Users interested in creating custom chip packages should see the vignettes in the AnnotationForge package. The AnnotationDbi, OrganismDbi and GenomicFeatures packages contain additional information on how to use some of the extra tools provided. You can also refer to the complete list of annotation packages.

Key Packages

Types of Annotation Packages

Fred Hutchinson Cancer Research Center