Title: Efficient string manipulation and genome-wide motif searching
with Biostrings and the BSgenome data packages

Author: Herve Pages <hpages@fhcrc.org>

The Biostrings package provides the infrastructure for representing
and manipulating large nucleotide sequences (up to hundreds of millions
of letters) in Bioconductor, as well as fast pattern-matching functions
for finding all occurrences of millions of short motifs in these
large sequences.
The Bioconductor project also provides a collection of "BSgenome data
packages". These packages contain the full genomic sequence for a number
of commonly studied organisms.
Together, the Biostrings package and the BSgenome data packages provide
an efficient and convenient framework for genome-wide sequence analysis.
This lab session is a general introduction to this framework, with some
emphasis on the latest developments: the built-in masks in the BSgenome
data packages; the ability to inject SNPs from a SNPlocs package into
the chromosome sequences of a given species (only human is supported
for now); and the matchPDict() function for efficiently finding all the
occurrences in a genome of a large dictionary of short motifs (like the
one typically obtained from an ultra-high-throughput sequencing
experiment).
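
As a rough illustration of the dictionary-matching workflow mentioned
above, the following minimal sketch runs matchPDict() on a toy sequence.
The subject and the three motifs are made-up values chosen for
illustration only; in the lab, the subject would be a chromosome loaded
from a BSgenome data package. The sketch assumes the simplest case, a
dictionary in which all motifs have the same width.

```r
library(Biostrings)

# Toy subject standing in for a chromosome sequence (illustrative only).
subject <- DNAString("ACGTACGTTTGACGTA")

# Hypothetical dictionary of short motifs, all of the same width.
motifs <- DNAStringSet(c("ACGT", "TTGA", "GGGG"))

# Preprocess the dictionary once so it can be reused across subjects.
pdict <- PDict(motifs)

# Find every occurrence of every motif in the subject.
hits <- matchPDict(pdict, subject)

# Number of occurrences per motif, in dictionary order.
counts <- countPDict(pdict, subject)

# Start positions of the occurrences, one list element per motif.
starts <- startIndex(hits)
```

Preprocessing with PDict() is the step that makes repeated searches
cheap: the same preprocessed dictionary can be matched against each
chromosome in turn.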