Computational Statistics for Genome Biology (CSAMA)

Brixen-Bressanone, Italy

2014-06-22 ~ 2014-06-28


  • Martin Morgan, Fred Hutchinson Cancer Research Center (USA)
  • Robert Gentleman, Genentech (USA)
  • Vincent J. Carey, Channing Laboratory, Harvard Medical School (USA)
  • Wolfgang Huber, European Molecular Biology Laboratory (DE)
  • Simon Anders, European Molecular Biology Laboratory (DE)
  • Laurent Gatto, University of Cambridge (UK)
  • Michael Lawrence, Genentech (USA)


This one-week intensive course teaches current approaches to the statistical and computational analysis of large-scale experiments in biology. The course focuses on methods for downstream analysis of high-throughput sequencing experiments, including RNA sequencing (differential expression), DNA sequencing (variant calling) and ChIP-Seq. Lectures also cover essentials including statistical testing, linear models, machine learning, visualisation and bioinformatic annotation. Emphasis is given to practical problem-solving skills using open-source software from the Bioconductor, CRAN and other projects.

The course is intended for researchers who have basic familiarity with the experimental technologies and the biology of the genome, and who are interested in developing their own advanced data analyses using a scripting environment. The four practical sessions of the course require a basic understanding of scripts written in R. A tutorial on the more advanced features of R that are required will be provided; students are advised to familiarize themselves with the very basics of R beforehand. (Consider one of the many online resources or books, e.g. R-Intro from the R Project, Germán Rodríguez, R-Studio.)


Monday, June 22

Morning talks

  • pdf Introduction to R and Bioconductor
  • pdf Basics of high-throughput sequencing technologies and short read aligners
  • pdf Elements of statistics 1: t-test and linear model
  • pdf Elements of statistics 2: multiple testing, false discovery rates, independent filtering

Afternoon labs

  • zip R introduction/refresher: data types, reading and writing files and spreadsheets, plotting, programming, functions and packages.
  • pdf R Exploratory data analysis and visualization (pdf solutions)
  • html R Intermediate R 1: accessing resources - packages, classes, methods, and efficient code. Download IntermediateR1_1.0.0.tar.gz and install as:

    source("http://bioconductor.org/biocLite.R")  # defines biocLite()
    biocLite(c("IRanges", "GenomicRanges", "microbenchmark"))
    install.packages("IntermediateR1_1.0.0.tar.gz", repos=NULL, type="source")
  • pdf R Intermediate R 2: scalable / performant computing. (Large files needed for some of this lab are NOT available for download). Download CSAMA2014ScalableComputingLab_0.0.1.tar.gz and install as:

    biocLite(c("IRanges", "GenomicRanges", "Rsamtools", "ShortRead",
        "rtracklayer", "GenomicAlignments", "GEOquery", "microbenchmark",
        "BiocParallel", "ggbio", "Biobase", "GenomicFiles"))
    install.packages("CSAMA2014ScalableComputingLab_0.0.1.tar.gz",
        repos=NULL, type="source")


Morning talks

  • pdf RNA-Seq 1: differential expression analysis - GLMs and testing
  • RNA-Seq 2: shrinkage, empirical Bayes, FC estimation
  • pdf Visualisation
  • pdf Computing with genomic ranges, sequences and alignments

Afternoon labs


Morning talks

  • pdf DNA-Seq 1: Variant calling
  • pdf DNA-Seq 2: visualisation and quality assessment of variant calls
  • pdf Gene set enrichment analysis
  • pdf R Working with gene and genome annotations

Afternoon labs


Morning talks

  • pdf RNA-Seq 3: alternative exon usage
  • html Elements of statistics 3: Classification and clustering - basic concepts
  • pdf Elements of statistics 4: regularisation & kernels
  • pdf R ChIP-Seq

Afternoon labs

  • pdf R Working with the Ranges infrastructure: annotating and understanding regions


Morning talks

  • pdf Elements of statistics 5: experimental design
  • pdf eQTL / molecular-QTL analyses
  • pdf Proteomics
  • Emerging topic – pdf image analysis

Afternoon labs