Introduction to Bioconductor

Martin Morgan, Hervé Pagès
February 4, 2015

Background: R

S3 classes

x <- rnorm(1000)
y <- x + rnorm(1000, .5)
df <- data.frame(x=x, y=y)
fit <- lm(y ~ x, df)
class(fit)
## [1] "lm"
methods(class=class(fit))
##  [1] add1.lm*           alias.lm*          anova.lm*         
##  [4] case.names.lm*     confint.lm         cooks.distance.lm*
##  [7] deviance.lm*       dfbeta.lm*         dfbetas.lm*       
## [10] drop1.lm*          dummy.coef.lm      effects.lm*       
## [13] extractAIC.lm*     family.lm*         formula.lm*       
## [16] hatvalues.lm*      influence.lm*      kappa.lm          
## [19] labels.lm*         logLik.lm*         model.frame.lm*   
## [22] model.matrix.lm    nobs.lm*           plot.lm*          
## [25] predict.lm         print.lm*          proj.lm*          
## [28] qr.lm*             residuals.lm       rstandard.lm*     
## [31] rstudent.lm*       simulate.lm*       summary.lm        
## [34] variable.names.lm* vcov.lm*          
## 
##    Non-visible functions are asterisked
methods(anova)
## [1] anova.glm*     anova.glmlist* anova.lm*      anova.lmlist* 
## [5] anova.loess*   anova.mlm*     anova.nls*    
## 
##    Non-visible functions are asterisked
plot(y ~ x, df)
abline(fit, col="red", lwd=2)

[Figure: scatter plot of y against x with the fitted regression line in red (chunk S3)]
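
The transcript above shows S3 methods named generic.class (for example anova.lm). A minimal sketch of how that dispatch works, using only base R. The class "temperature" and the generic to_fahrenheit() are hypothetical names made up for illustration.

```r
# Hypothetical toy class "temperature"; not part of the original material.
temperature <- function(celsius) {
    # An S3 object is an ordinary R value carrying a class attribute
    structure(list(celsius = celsius), class = "temperature")
}

# A generic calls UseMethod(), which dispatches on class(x)
to_fahrenheit <- function(x, ...) UseMethod("to_fahrenheit")

# Methods are plain functions named <generic>.<class>
to_fahrenheit.temperature <- function(x, ...) x$celsius * 9 / 5 + 32

# Fallback used when no class-specific method is found
to_fahrenheit.default <- function(x, ...)
    stop("no to_fahrenheit() method for class ", class(x)[1])

# Existing generics such as print() can be extended the same way
print.temperature <- function(x, ...) {
    cat("<temperature>", format(x$celsius), "degrees C\n")
    invisible(x)
}

temp <- temperature(c(0, 37, 100))
class(temp)
to_fahrenheit(temp)
methods(class = "temperature")
```

Calling to_fahrenheit() on an object without a matching method, such as a character vector, falls through to the default method and signals an error.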

S4 classes

suppressPackageStartupMessages({
    library(IRanges)
})
start <- as.integer(runif(1000, 1, 1e4))
width <- as.integer(runif(length(start), 50, 100))
ir <- IRanges(start, width=width)
coverage(ir)
## integer-Rle of length 10092 with 1743 runs
##   Lengths:  7  8  6  9 10  4  2  1 18 13 ...  2  3 11  2 16 12 11  9 16 17
##   Values :  0  1  2  3  4  5  6  7  8  7 ...  8  9  8  7  6  5  4  3  2  1
findOverlaps(ir)
## Hits object with 15638 hits and 0 metadata columns:
##           queryHits subjectHits
##           <integer>   <integer>
##       [1]         1         693
##       [2]         1         594
##       [3]         1         814
##       [4]         1         229
##       [5]         1         178
##       ...       ...         ...
##   [15634]      1000         204
##   [15635]      1000         748
##   [15636]      1000         291
##   [15637]      1000          14
##   [15638]      1000         821
##   -------
##   queryLength: 1000
##   subjectLength: 1000
showMethods("coverage")
## Function: coverage (package IRanges)
## x="IRanges"
##     (inherited from: x="Ranges")
## x="RangedData"
## x="Ranges"
## x="RangesList"
## x="Views"
showMethods(classes=class(ir), where=search())
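
For comparison with the IRanges transcript, a minimal sketch of defining an S4 class, generic and method with the methods package. The class "ToyInterval" and the generic span() are hypothetical names chosen for illustration. They are not part of IRanges.

```r
library(methods)

# Hypothetical class with typed slots and a validity check
setClass("ToyInterval",
    representation(start = "integer", end = "integer"),
    validity = function(object) {
        if (length(object@start) != length(object@end))
            "'start' and 'end' must have the same length"
        else if (any(object@end < object@start))
            "'end' must not be less than 'start'"
        else
            TRUE
    })

# S4 generics are created explicitly and dispatch via standardGeneric()
setGeneric("span", function(x, ...) standardGeneric("span"))

# Methods are registered against a class signature, not by function name
setMethod("span", "ToyInterval", function(x, ...) x@end - x@start + 1L)

iv <- new("ToyInterval", start = c(1L, 10L), end = c(5L, 12L))
span(iv)
showMethods("span")
```

Unlike S3, object construction is checked: new("ToyInterval", start = 5L, end = 1L) fails the validity function and signals an error.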

Notes

suppressPackageStartupMessages({
    library(GenomicRanges)
})
showMethods("coverage")
## Function: coverage (package IRanges)
## x="GRangesList"
## x="GenomicRanges"
## x="RangedData"
## x="Ranges"
## x="RangesList"
## x="SummarizedExperiment"
## x="Views"
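
Loading GenomicRanges adds methods to the existing coverage generic, so the same call works on genomic ranges. A small sketch, assuming GenomicRanges is installed. The sequence names and coordinates are made up.

```r
suppressPackageStartupMessages({
    library(GenomicRanges)
})

# Made-up ranges on two hypothetical sequences
gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(1, 3, 2), width = c(4, 4, 3)),
    strand = c("+", "-", "*"))

# coverage() on a GRanges returns one integer-Rle per sequence
cvg <- coverage(gr)
names(cvg)

# Per-position depth on chr1, expanded from the run-length encoding
as.integer(cvg[["chr1"]])
```

The two chr1 ranges (1-4 and 3-6) overlap at positions 3 and 4, which is where the depth reaches 2.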

Principles

  1. Statistical
  2. Extensive
  3. Interoperable
  4. Reproducible
  5. Accessible – affordable, transparent, usable

Infrastructure

Sequences

Genomic Ranges

Integrating sample, range and assay data
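
A sketch of what integrating sample, range and assay data can look like. It assumes a current Bioconductor release in which the SummarizedExperiment class is provided by the SummarizedExperiment package (the method listing above shows the class being available after loading GenomicRanges). All gene names, sample names and counts are made up.

```r
suppressPackageStartupMessages({
    library(SummarizedExperiment)
})

# Assay data: a made-up 4 gene x 3 sample count matrix
set.seed(1)
counts <- matrix(rpois(12, 10), nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Range data: one genomic range per row of the assay
rr <- GRanges("chr1", IRanges(start = c(100, 200, 300, 400), width = 50))
names(rr) <- rownames(counts)

# Sample data: one row per column of the assay
cd <- DataFrame(treatment = c("ctrl", "trt", "trt"),
    row.names = colnames(counts))

se <- SummarizedExperiment(assays = list(counts = counts),
    rowRanges = rr, colData = cd)

# Subsetting keeps assay, ranges and sample annotation in sync
trt <- se[, se$treatment == "trt"]
dim(trt)

# Select rows by genomic position
roi <- subsetByOverlaps(se, GRanges("chr1", IRanges(150, 320)))
rownames(roi)
```

The point of the container is coordination: a single subsetting operation applies to the counts, the row ranges and the sample table together.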

Key packages

Biostrings – Sequences

GenomicRanges – Ranges

BiocParallel – Parallel processing
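
A short sketch touching each of the three packages, assuming all are installed. The sequences and ranges are made up, and SerialParam() is used so the example runs without a parallel back-end.

```r
suppressPackageStartupMessages({
    library(Biostrings)
    library(GenomicRanges)
    library(BiocParallel)
})

# Biostrings: containers and operations for biological sequences
dna <- DNAStringSet(c(a = "ACGTTGCA", b = "GGGCCCAT"))
rc <- reverseComplement(dna)
gc <- letterFrequency(dna, "GC", as.prob = TRUE)   # GC fraction per sequence

# GenomicRanges: range-based queries
query <- GRanges("chr1", IRanges(c(1, 20), width = 10))
subject <- GRanges("chr1", IRanges(c(5, 50), width = 10))
hits <- countOverlaps(query, subject)   # subject ranges hit by each query range

# BiocParallel: lapply-like iteration with a pluggable back-end
res <- bplapply(1:4, function(i) i^2, BPPARAM = SerialParam())
unlist(res)
```

Swapping SerialParam() for another back-end such as MulticoreParam() changes how the work is distributed without changing the calling code.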

Work flows

[Figure: the sequencing ecosystem]

biocViews for discovery.

RNA-seq

ChIP-seq

Variants

Copy number

Methylation

Expression and other arrays