Using Bioconductor for Microarray Analysis

Bioconductor has advanced facilities for analysis of microarray platforms including Affymetrix, Illumina, Nimblegen, Agilent, and other one- and two-color technologies.

Bioconductor includes extensive support for analysis of expression arrays, and well-developed support for exon, copy number, SNP, methylation, and other assays.

Major workflows in Bioconductor include pre-processing, quality assessment, differential expression, clustering and classification, gene set enrichment analysis, and genetical genomics.

Bioconductor offers extensive interfaces to community resources, including GEO, ArrayExpress, BioMart, genome browsers, GO, KEGG, and diverse annotation sources.

Sample Workflow

The following pseudo-code illustrates a typical R / Bioconductor session. It uses RMA from the affy package to pre-process Affymetrix arrays and the limma package to assess differential expression.

## Load packages
> library(affy)   # Affymetrix pre-processing
> library(limma)  # two-color pre-processing; differential expression

## import "phenotype" data describing the experimental design; the file
## is comma-separated, while read.AnnotatedDataFrame() defaults to tabs
> phenoData <- read.AnnotatedDataFrame("sample-description.csv", sep=",")

## RMA background correction, normalization, and summarization
> eset <- justRMA(celfile.path="/celfile-directory", phenoData=phenoData)

## differential expression
> design <- model.matrix(~ Disease, pData(eset))  # describe model to be fit
> fit <- lmFit(eset, design)   # fit each probeset to the model
> efit <- eBayes(fit)          # empirical Bayes moderation of variances
> topTable(efit, coef=2)       # table of differentially expressed probesets
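The `design` object is an ordinary R model matrix. The base-R sketch below uses a hypothetical six-sample data frame (`pd`, with made-up `normal` and `tumor` levels) in place of `pData(eset)` to show why `coef=2` selects the disease effect: with R's default treatment contrasts, column 1 is the intercept (the mean of the reference level) and column 2 is the difference between the two groups.

```r
## Hypothetical sample description standing in for pData(eset)
pd <- data.frame(
  Disease = factor(rep(c("normal", "tumor"), each = 3))
)

## Treatment contrasts: column 1 is the intercept (mean of the reference
## level, "normal"); column 2 is the tumor-minus-normal difference
design <- model.matrix(~ Disease, pd)
print(design)
```

In this hypothetical example the second column is named `Diseasetumor`, so `topTable(efit, coef=2)` and `topTable(efit, coef="Diseasetumor")` would report the same comparison.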

A top table resulting from a more complete analysis, described in Chapter 7 of Bioconductor Case Studies, is shown below. For each Affymetrix probeset, the table lists the log2 fold change between the two experimental groups (logFC), the average log2 expression across all samples (AveExpr), the moderated t-statistic describing differential expression (t), the unadjusted and adjusted significance of the difference (P.Value and adj.P.Val; here the adjustment controls the false discovery rate), and the log-odds that the probeset is differentially expressed (B). These results can be used in further analysis and annotation.
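The adjusted p-values come from a multiple-testing correction; limma's `topTable()` uses the Benjamini-Hochberg method by default (`adjust.method="BH"`). The base-R sketch below, using six hypothetical p-values, applies `p.adjust()` and then writes out the same step-up calculation: each p-value is scaled by m/i (m tests, ascending rank i), then forced to be monotone working down from the largest p-value. The helper name `bh_adjust()` is illustrative only.

```r
## Hypothetical unadjusted p-values for six probesets
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278)

## Benjamini-Hochberg adjustment as implemented in base R
adj <- p.adjust(p, method = "BH")

## The same step-up calculation written out explicitly (hypothetical helper)
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)     # largest p-value first
  i <- m:1                             # ascending rank of each sorted value
  q <- pmin(1, cummin(m / i * p[o]))   # scale by m/i, enforce monotonicity
  q[order(o)]                          # restore the original order
}

print(cbind(p, adj, manual = bh_adjust(p)))
```

Every adjusted value is at least as large as its unadjusted counterpart, which is why adj.P.Val exceeds P.Value in each row of the table.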

      ID logFC AveExpr    t  P.Value adj.P.Val     B
636_g_at  1.10    9.20 9.03 4.88e-14  1.23e-10 21.29
39730_at  1.15    9.00 8.59 3.88e-13  4.89e-10 19.34
 1635_at  1.20    7.90 7.34 1.23e-10  1.03e-07 13.91
 1674_at  1.43    5.00 7.05 4.55e-10  2.87e-07 12.67
40504_at  1.18    4.24 6.66 2.57e-09  1.30e-06 11.03
40202_at  1.78    8.62 6.39 8.62e-09  3.63e-06  9.89
37015_at  1.03    4.33 6.24 1.66e-08  6.00e-06  9.27
32434_at  1.68    4.47 5.97 5.38e-08  1.70e-05  8.16
37027_at  1.35    8.44 5.81 1.10e-07  3.08e-05  7.49
37403_at  1.12    5.09 5.48 4.27e-07  1.08e-04  6.21
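The t column is limma's moderated t-statistic: `eBayes()` shrinks each probeset's residual variance toward a common prior value before forming the t-ratio, which stabilizes results when there are few arrays. The base-R sketch below illustrates that calculation on simulated data. It is a simplification under stated assumptions: the prior degrees of freedom `d0` and prior variance `s0_sq` are fixed by hand here, whereas `eBayes()` estimates them from all probesets, and the simulated expression values are hypothetical.

```r
set.seed(1)

## Simulated log2 expression: 1000 hypothetical probesets on 3 normal and
## 3 tumor arrays; the first 20 probesets are shifted up in tumor
n_genes <- 1000
Disease <- factor(rep(c("normal", "tumor"), each = 3))
design <- model.matrix(~ Disease)
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.5), nrow = n_genes)
expr[1:20, Disease == "tumor"] <- expr[1:20, Disease == "tumor"] + 1.5

## Ordinary least squares for every probeset at once (one column per probeset)
ols <- lm.fit(design, t(expr))
logFC <- ols$coefficients[2, ]
df_res <- ols$df.residual                  # 6 arrays - 2 coefficients = 4
s2 <- colSums(ols$residuals^2) / df_res    # probeset-wise residual variance
v <- solve(crossprod(design))[2, 2]        # unscaled variance of coefficient 2

## Assumed prior (eBayes() would estimate these from the data)
d0 <- 4
s0_sq <- median(s2)

## Posterior variance: weighted average of prior and observed variances
s2_post <- (d0 * s0_sq + df_res * s2) / (d0 + df_res)

t_ordinary  <- logFC / sqrt(s2 * v)
t_moderated <- logFC / sqrt(s2_post * v)
p_moderated <- 2 * pt(-abs(t_moderated), df = d0 + df_res)
```

Comparing `t_ordinary` with `t_moderated` shows the effect of moderation: a probeset whose sample variance falls below the prior value gets a larger posterior variance, so its statistic is no longer inflated by a chance small variance, and the moderated test gains the extra `d0` degrees of freedom.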


Installation and Use

Follow the installation instructions to start using these packages. The affy and limma packages are part of the core Bioconductor packages, and are installed automatically with

> source("")
> biocLite()

To install additional packages, such as the annotation package for the Affymetrix Human Genome U95A 2.0 array, use

> source("")
> biocLite("hgu95av2.db")

Package installation is required only once per R installation. View the full list of available packages.
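Because installation is needed only once, scripts often check which packages are missing before installing anything. The helper below is a hypothetical sketch (`missing_packages()` is an illustrative name, not a Bioconductor function) built on base R's `requireNamespace()`. Note that current Bioconductor releases (3.8 and later) install packages with `BiocManager::install()` rather than `biocLite()`; the commented-out last line assumes that newer installer.

```r
## Hypothetical helper: return the packages that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("affy", "limma", "hgu95av2.db"))
print(needed)

## Install only what is missing (assumes BiocManager and network access)
## if (length(needed) > 0) BiocManager::install(needed)
```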

To use the affy and limma packages, evaluate the commands

> library("affy")
> library("limma")

These commands are required once in each R session.


Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

> help(package="limma")
> ?topTable

to obtain an overview of the limma package's help pages and the help page for the topTable function, and

> browseVignettes(package="limma")

to view the limma package's vignettes, which provide a more comprehensive introduction to package functionality. Use

> help.start()

to open a web page containing comprehensive help resources.


Pre-Processing Resources

The following sections briefly list packages useful for pre-processing. More comprehensive workflows can be found in package documentation (available from the package descriptions) and in Bioconductor books and monographs.

Affymetrix 3'-biased Arrays

affy, gcrma, affyPLM
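For these arrays, RMA (as run by `rma()` or `justRMA()` in affy) combines background correction, quantile normalization, and median-polish summarization of log2 probe intensities within each probeset. The base-R sketch below illustrates only the last two steps on a small hypothetical matrix of probe intensities; it omits background correction, breaks ties by order, and treats all probes as one probeset, so it is a teaching aid rather than a substitute for affy's implementation. The helper name `quantile_normalize()` is illustrative.

```r
set.seed(42)

## Hypothetical raw PM intensities: 11 probes (rows) on 4 arrays (columns),
## with array-specific scale differences that normalization should remove
probes <- matrix(2^rnorm(44, mean = 8), nrow = 11,
                 dimnames = list(paste0("probe", 1:11), paste0("array", 1:4)))
probes <- sweep(probes, 2, c(1, 1.5, 0.8, 1.2), `*`)

## Quantile normalization: give every array the same intensity distribution,
## the mean of the sorted columns (ties broken by order, a simplification)
quantile_normalize <- function(x) {
  target <- rowMeans(apply(x, 2, sort))
  ranks <- apply(x, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

normalized <- quantile_normalize(probes)

## Median polish on the log2 scale: probe effects (rows) + array effects (columns)
fit <- medpolish(log2(normalized), trace.iter = FALSE)
expression <- fit$overall + fit$col   # one summarized value per array
print(round(expression, 2))
```

Median polish is used here, as in RMA, because medians resist outlying probes better than means would.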


Affymetrix Exon ST Arrays




Affymetrix Gene ST Arrays



Affymetrix SNP Arrays


Affymetrix Tiling Arrays


Nimblegen Arrays


Illumina Expression Microarrays




Fred Hutchinson Cancer Research Center