Skip to content.

Bioconductor is an open source and open development software project
for the analysis and comprehension of genomic data.



% % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % % \VignetteIndexEntry{EMBO03 Lab 4} %\VignetteDepends{affy} %\VignetteKeywords{Microarray, Pre-processing} \documentclass[12pt]{article}

\usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref}

\textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle}


\title{EMBO03 Lab4: Introduction to Bioconductor {\tt affy} Package}

\author{Sandrine Dudoit, Robert Gentleman, and Rafael Irizarry}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{document}


In this lab, we demonstrate the main functions in the \verb+affy+ package for pre-processing Affymetrix microarray data. For a more detailed introduction, consult the package vignettes which can be listed by the command {\tt openVignette("affy")}. A demo can also be accessed by {\tt demo(affy)}. A number of sample datasets are available in the package; to list these, type {\tt data(package="affy")}. To load the package <>= library(affy) @

The function \verb+ReadAffy+ is available for reading CEL files. However, in this lab we will work mainly with the \verb+Dilution+ dataset, which is included in the package. For a description of \verb+Dilution+, type {\tt ? Dilution}. To load this dataset <>= data(Dilution) @

%%%%%%%%%%%%%%%%%%%%%%%%% %%% affy classes

One of the main classes in \verb+affy+ is the \verb+AffyBatch+ class. For details on this class consult the help file, {\tt ? AffyBatch}; methods for manipulating instances of this class are also described in the help file. Other classes include \verb+ProbeSet+ (PM and MM intensities for individual probe sets), \verb+Cdf+ (information contained in a CDF file), and \verb+Cel+ (single array cel intensity data). The object \verb+Dilution+ is an instance of the class \verb+AffyBatch+. Try the following commands to obtain information on this object <>= class(Dilution) slotNames(Dilution) Dilution annotation(Dilution) @

For a description of the target samples hybridized to the arrays <>= phenoData(Dilution) pData(Dilution) @

The \verb+exprs+ slot contains a matrix with columns corresponding to arrays and rows to individual probes on the array. To obtain the matrix of intensities for all four arrays <>= e<-exprs(Dilution) nrow(Dilution)*ncol(Dilution) dim(e) @

You can access probe-level PM and MM intensities using <>= PM<-pm(Dilution) dim(PM) PM[1:5,] @

To get the probe set names (Affy IDs) <>= gnames<-geneNames(Dilution) length(gnames) gnames[1:5] nrow(e)/length(gnames) @

As with other microarray objects in Bioconductor packages, you can use subsetting commands for {\tt AffyBatch} objects <>= dil1<-Dilution[1] class(dil1) dil1 cel1<-Dilution[[1]] class(cel1) cel1 @

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Reading in data

One of the main functions for reading in Affymetrix data is \verb+ReadAffy+. It reads in data from \verb+CEL+ and \verb+CDF+ files and creates objects of class \verb+AffyBatch+. Using \verb+ReadAffy(widget=TRUE)+ provides widgets for interactive data input.\\

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Diagnostic plots

To produce a spatial image of probe log intensities and probe raw intensities <>= # Log transformation image(Dilution[1]) @

<>= # No transformation image(cel1) @

To produce boxplots of probe log intensities <>= boxplot(Dilution,col=c(2,2,3,3)) @

To produce density plots of probe log intensities <>= hist(Dilution, type="l", col=c(2,2,3,3), lty=rep(1:2,2), lwd=3) @

The boxplots and density plots show that the Dilution data needs normalization. As described in the dataset help file and in the \texttt{phenoData} slot (\texttt{pData(Dilution)}), two concentrations of mRNA were used and, for each concentration, two scanners were used. From the plots, we note that scanner effects seem stronger than concentration effects (different colors). Arrays that should be the same are different; arrays that should be different are similar. Because different mRNA concentrations were used, we perform probe-level normalization within concentration groups. <>= Dil20 <- normalize(Dilution[1:2]) ##conc. group 20 micrograms Dil10 <- normalize(Dilution[3:4]) ##conc. group 10 micrograms normDil <- merge(Dil20,Dil10) @

Notice how the boxplot now looks better.

<>= boxplot(normDil,col=c(2,2,3,3)) @

The \verb+affy+ package provides implementations for a number of methods for background correction, probe-level normalization (e.g., quantile, curve-fitting (Bolstad et al., 2002)), and computation of expression measures (e.g., MAS 4.0, MAS 5.0, MBEI (Li \& Wong, 2001), RMA (Irizarry et al., 2003)). To list available methods for \verb+AffyBatch+ objects

<>= bgcorrect.methods normalize.AffyBatch.methods pmcorrect.methods express.summary.stat.methods @

The main normalization function is \verb+expresso+. You can select pre-processing methods interactively using widgets by typing {\tt expresso(Dilution, widget=TRUE)}. The function operates on objects of class \verb+AffyBatch+ and returns objects of class \verb+exprSet+. \verb+rma+ provides a more efficient implementation of Robust Multi-array Average (RMA). We don't normalize because we already did above. <>= rmaDil<-rma(normDil,normalize=FALSE) class(rmaDil) @

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % CDF data packages

Data packages for CDF information can be download from \url{}. These packages contain environment objects which provide mappings between AffyIDs and matrices of probe locations, with rows corresponding to probe-pairs and columns to PM and MM cels. CDF environments for HGU95Av2 and HGU133A chips are already in the package. For information on the environment object {\tt ? hgu95av2cdf} <>= annotation(Dilution) data(hgu95av2cdf) pnames<-ls(env=hgu95av2cdf) length(gnames) gnames[1:5] get(gnames[1],env=hgu95av2cdf) @

You can also use the \verb+indexProbes+, \verb+pmindex+, and \verb+mmindex+ functions to get information on probe location <>= plocs<-indexProbes(Dilution,which="both") plocs[[1]] pmindex(Dilution,genenames=gnames[1], xy=TRUE) pmindex(Dilution,genenames=gnames[1]) @

Having access to PM and MM data can be useful. Let's look at a plot of PM vs. MM

<>= plot(mm(Dilution[1]),pm(Dilution[1]),pch=".",log="xy") abline(0,1,col="red") @



BioC 2.5, consisting of 352 packages and designed to work with R 2.10.z, was released today.


R, the open source platform used by Bioconductor, featured in a series of articles in the New York Times.