



%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
% \VignetteIndexEntry{Lab 1}
%\VignetteDepends{Biobase}
%\VignetteKeywords{Microarray}
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\title{Lab 1: Bioconductor Basics}
\bibliographystyle{plainnat}



In this laboratory we will introduce some of the basic interactions with Bioconductor.


library(Biobase)
library(annotate)
library(golubEsets)


The package \texttt{golubEsets} contains three data sets that were obtained from the web and slightly massaged. They represent the data analysed in \citet{Golub99} to perform class prediction using microarray data. The data were collected on the Affymetrix Hu6800 chip, which contains probes for 7129 genes.

An \texttt{exprSet} consists of the gene expression matrix (optionally with a set of standard errors for those estimates), the related experimental metadata (who did what, when, and to what), and the phenotypic data. Here phenotype is interpreted quite broadly: it covers any recorded characteristic of the sample.







Notice that when subsetting we have arranged it so that the \textit{rows} correspond to genes and the \textit{columns} correspond to samples.
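
As a rough illustration of this layout, the following sketch uses a plain matrix rather than an \texttt{exprSet}, so it runs without any Bioconductor packages. The object \texttt{toyExprs}, its gene and sample labels, and its values are invented for illustration and are not taken from \texttt{golubTrain}.

```r
## Hypothetical toy data: 4 genes (rows) by 3 samples (columns).
## matrix() fills column by column, so each group of four values
## below is one sample.
toyExprs <- matrix(
  c(10, 20, 30, 40,
    11, 21, 31, 41,
    12, 22, 32, 42),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

## The first index selects genes, the second selects samples.
toySub <- toyExprs[1:2, c("sample1", "sample3")]

dim(toySub)       # number of genes, then number of samples
rownames(toySub)  # gene labels
colnames(toySub)  # sample labels
```

The same two-index convention (genes first, samples second) is what the subsetting of an \texttt{exprSet} is arranged to follow.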

The phenotypic data are stored in a separate, but linked, data frame. You can obtain it and interact with it using specific methods.


pD <- phenoData(golubTrain)


pd <- pData(pD)



An object of class \texttt{phenoData} is a combination of a data frame containing the various data elements and a list that explains what each variable represents. This information is usually relegated to a help page, but we felt that it was important to keep it more closely associated with the data.
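
The idea of pairing a data frame with a list of variable descriptions can be sketched in base R. This is only an analogy under our own assumptions, not how \texttt{phenoData} is implemented: the objects \texttt{toyPheno} and \texttt{varLabels}, the variable \texttt{Batch}, and all values are hypothetical.

```r
## Hypothetical sample annotations, one row per sample.
toyPheno <- data.frame(
  ALL.AML = factor(c("ALL", "ALL", "AML")),
  Batch   = c(1L, 2L, 2L),
  row.names = paste0("sample", 1:3)
)

## A list with one description per variable, kept next to the data
## instead of on a separate help page.
varLabels <- list(
  ALL.AML = "Leukemia type, ALL or AML",
  Batch   = "Processing batch (invented for this sketch)"
)

## Every variable in the data frame should have a description.
all(names(toyPheno) %in% names(varLabels))

## Look up the data and the description of one variable.
toyPheno[["ALL.AML"]]
varLabels[["ALL.AML"]]
```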

The \verb+$+ operator extracts particular variables from an object of class \texttt{phenoData}. It can also be used directly on the \texttt{exprSet}.



## different data
data(golubTest)
table(golubTest$ALL.AML)

The S4 methods package has introduced substantial new capabilities into R. To obtain the manual page for an S4 class, use the syntax \texttt{class?exprSet}. Please do that now and we will look at the help page.

Almost all R functions have a set of runnable examples that are shown at the bottom of the manual page. You can either scroll down to them and copy and paste them into your session, or use the R function \Rfunction{example} to run them. Try \texttt{example(exprSet)}.
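
The same mechanism works for any documented function. As a small sketch that does not depend on \Rpackage{Biobase}, the following uses the base function \Rfunction{median}, which is our choice and not part of the lab. It assumes that the \texttt{give.lines} argument of \Rfunction{example} is available in your version of R.

```r
## Run the examples from the help page of median().
example(median)

## Alternatively, fetch the example source as a character vector
## without running it (assumes example() supports give.lines).
exLines <- example(median, give.lines = TRUE)
head(exLines)
```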

To see what packages are currently loaded into your R session you can use \Rfunction{search}. You can list the functions in any package that is attached by using \texttt{objects("package:ts")}, for example. This will list all the objects in the time series package \Rpackage{ts}. Another useful command is \Rfunction{find} which will tell you which package contains the definition of a function.
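
These three functions can be tried together in a short session. The sketch below assumes a recent version of R, in which the time series functions are part of \Rpackage{stats}, so it lists \texttt{"package:stats"} instead of \texttt{"package:ts"}. The object name \texttt{statsObjs} is ours.

```r
## Packages and environments currently on the search path.
search()

## List the objects in an attached package. "package:stats" is used
## on the assumption that the time series code now lives in stats.
statsObjs <- objects("package:stats")
head(statsObjs)

## The time series constructor ts() should be among them.
"ts" %in% statsObjs

## Which attached package contains the definition of a function?
find("median")
```

If \Rfunction{objects} reports an error for a given name, check the output of \Rfunction{search} first: only packages that appear there can be listed this way.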


