Title: Efficient string manipulation and genome-wide motif searching with Biostrings and the BSgenome data packages
Author: Herve Pages

The Biostrings package provides the infrastructure for representing and manipulating large nucleotide sequences (up to hundreds of millions of letters) in Bioconductor, as well as fast pattern-matching functions for finding all the occurrences of millions of short motifs in these large sequences. The Bioconductor project also provides a collection of "BSgenome data packages", which contain the full genomic sequences of a number of commonly studied organisms. Together, the Biostrings package and the BSgenome data packages provide an efficient and convenient framework for genome-wide sequence analysis.

This lab session is a general introduction to this framework, with some emphasis on the latest developments: the built-in masks in the BSgenome data packages; the ability to inject SNPs from a SNPlocs package into the chromosome sequences of a given species (only human is supported for now); and the matchPDict() function for efficiently finding all the occurrences in a genome of a large dictionary of short motifs (such as one typically gets from an ultra-high-throughput sequencing experiment).