Fun and easy things to do with Bioconductor sequences, reads, ranges,
and alignments.

* Biostrings / BSgenome?

** Exome GC content and other simple metrics
- Start with GRanges of all known genes
- Use Views() or getSeq() to retrieve sequence
- Use alphabetFrequency() or letterFrequency() to calculate GC content

** Binding motifs in 5' regions
- Start with GRanges of 5' regions, 'position weight matrix'
  describing binding motif
- Use getSeq() to get sequences of 5' regions
- Use matchPWM to identify motif binding sites

** Consequences of variants for amino acid sequence
- Use GenomicFeatures::extractTranscripts to assemble transcript
  sequences from exon coordinates
- Use translate() to obtain amino acid sequences
- Use subseq() <- ... to sequence to reflect variants
- Use translate() to obtain variant  amino acid sequences

* GRanges?

** Intronic locations
- Start with exons-by-gene GRangesList
- Use gaps() to find intron coordinates

** Regions upstream of transcription start sites (TSS)
- Start with exons-by-transcripts GRangesList
- Use range() & start() to identify TSS
- Use flank() to identify upstream (5') regions
- Use BSgenome (or FaFile) & getSeq() to retrieve DNA sequence

** ChIP-seq nearest genes
- Start with ChIP peaks GRanges, gene GRanges
- Use nearest() (or precede() / follow()) 

** Count ChIP peaks overlapping ENCODE regulatory region(s)
- Start with ChIP GRanges 'chip', ENCODE track GRanges 'encode'
- Use countOverlaps() or findOverlaps() e.g., how many peaks overlap
  ENCODE track?

    sum(countOverlaps(chip, encode) != 0)

** Mapping 'disjoint' exons to actual exons

* GAlignments?

** Coverage
- Use coverage(BamFile("my.bam"))

** Count reads overlapping genes
- Use summarizeOverlaps(BamFileList(), ...)
  - memory efficient, parallel
- Use count/findOverlaps for ad hoc schemes

** Identify reads whose mates map to different chromosomes
- Use sortBam() to sort BAM file by qname
- Use BamFile() and readGAlignments(..., obeyQname=TRUE)
  - ? and ScanBamParam(flag=scanBamFlag(isProperPair=TRUE))
- Split seqnames by rname, and determine unique length

  readnms <- split(seqnames, rname)
  mapto <- sapply(readnms, function(x) length(unique(x)))
  readnms[mapto != 1]

- Clever way to do this operation without using split?