Course Materials for 2013

Bioinformatics and Statistics for Large-Scale Data

November, Shenzhen, China

This international advanced course will provide training in bioinformatic and statistical methods for genomic research. It will give insight into how biological knowledge can be generated from high-throughput sequencing (DNA-seq, RNA-seq, ChIP-seq) experiments and will illustrate how to analyze such data. The course covers both the underlying statistical and algorithmic concepts and the practice of automating and coding such analyses in the scripting language R. The course will be a mix of lectures and hands-on training. Practicals will consist of computer exercises that enable participants to apply statistical methods to data analysis under the guidance of the lecturers and teaching assistants. The EMBO Practical Course will also teach the basics of the R/Bioconductor environment for statistical-bioinformatic data analysis. The course is aimed at PhD students, postdocs and interested faculty. The teaching language will be English. Basic experience in computer programming (writing scripts) is required.

Introduction to Statistical Computing with R and Bioconductor

October, Akron, OH

This hands-on workshop introduces the use of R and Bioconductor for the analysis and comprehension of high-throughput genomic data.

Gaining Deeper Understanding of R / Bioconductor

September, Seattle, USA

This intermediate course is directed at R / Bioconductor users who, in an effort to get the most out of high-throughput sequence and other analyses, want to understand more about how R and Bioconductor work. (1) The course begins by reviewing R data types, memory management, and other aspects of internal computation. We use this as a basis for understanding how to write, debug, and assess the performance of efficient R code, including straightforward approaches to iteration, vectorization, and parallel evaluation. (2) We then explore R objects, especially the S4 object system. We learn how to specify simple and more complicated S4 objects, and how to implement essential methods for single and multiple dispatch. We use insights from performance and the S4 class system to explore strategies for efficient representation of large structured data, especially the classes in the IRanges, GenomicRanges, VariantAnnotation, and Biostrings packages. (3) Availability of programming libraries (such as samtools) or performance needs may sometimes point to use of C or C++ code integrated into R. We develop some simple C functions, and explore use of Rcpp as a relatively painless way to incorporate C++ code. We take a brief look at R's internal data representations, and explore how to debug and profile C code. (4) Finally, we investigate how R can be used to interact with other important resources: databases, web sites, and visualization facilities like shiny. Use of some of these facilities is illustrated by packages such as AnnotationDbi and biomaRt.

BioC2013

July, Seattle, USA

Developer Day is July 17, 2013. Conference is July 18-19, 2013. This conference highlights current developments within and beyond Bioconductor, an international open source and open development software project for the analysis and comprehension of high-throughput genomic data.

useR! 2013: R / Bioconductor for Analysis and Comprehension of High-Throughput Genomic Data

July, Albacete, Spain

DNA sequence analysis generates large volumes of data presenting challenging bioinformatic and statistical problems. This tutorial introduces Bioconductor packages and work flows for the analysis of sequence data. We learn about approaches for efficiently manipulating sequences and alignments, and introduce common work flows and the unique statistical challenges associated with RNA-seq, variant annotation, and other experiments. The emphasis is on exploratory analysis and the analysis of designed experiments. The workshop emphasizes orientation within the Bioconductor milieu; we will touch on the Biostrings, ShortRead, GenomicRanges, edgeR, VariantAnnotation, and other packages, with short exercises to illustrate the functionality of each package.

Computational Statistics for Genome Biology (CSAMA)

June, Brixen-Bressanone, Italy

This one-week intensive course teaches current approaches in the statistical and computational analysis of large-scale experiments in biology. The course focuses on methods for downstream analysis of high-throughput sequencing experiments, including DNA sequencing (variant calling), RNA sequencing (differential expression), QTL analysis, and epigenetics. Lectures also cover essentials including statistical testing, machine learning, visualisation and bioinformatic metadata integration. The course is intended for researchers who have a basic familiarity with the experimental technologies and the biology of the genome. The four practical sessions of the course will require simple programming in the language R; introductory and advanced language tutorials will be provided.

Genentech R / Bioconductor for High Throughput Sequence Analysis

May, South San Francisco, USA

Genentech course on high-throughput sequence analysis.

Intermediate R / Bioconductor for High-Throughput Sequence Analysis

May, Seattle, USA

Intermediate R / Bioconductor for High-Throughput Sequence Analysis introduces users with some R experience to common Bioconductor work flows for sequence analysis. The course involves a combination of presentations and hands-on exercises. Our starting point is BAM files created by aligning short reads to a reference genome. Topics include exploratory analysis (GenomicRanges, Rsamtools); assessing differential expression of known genes (DESeq); and detection, calling, and manipulation of variants (VariantTools, VariantAnnotation). We learn how to integrate results with curated gene and genomic annotations (GenomicFeatures), and to visualize results (Gviz, ggbio).

Intermediate R / Bioconductor for High-Throughput Sequence Analysis

February, Seattle, USA