In this vignette, we describe the use of SEESAW (Statistical Estimation of allelic Expression using Salmon and Swish), a suite of tools for allelic expression analysis.

Running SEESAW involves generating a diploid transcriptome (e.g. using g2gtools), constructing a diploid Salmon index (specifying --keepDuplicates), and then running Salmon quantification with bootstrap inferential replicates (we recommend 30). These three steps (diploid reference preparation, indexing, and quantification with bootstraps) provide the input data for the subsequent statistical analyses in R/Bioconductor. The steps shown in this vignette leverage Bioconductor infrastructure, including SummarizedExperiment for storage of input data and results, tximport for data import, and GRanges and Gviz for plotting.
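As a rough illustration of the indexing and quantification steps, the Salmon calls might look like the sketch below. It assumes the diploid transcriptome FASTA has already been produced (for example with g2gtools, whose commands are not shown here) and that the reads are paired-end. All file and directory names are hypothetical placeholders, and the `run` helper is only a convenience for this example: by default it prints each command rather than executing it, so the sketch can be inspected without Salmon installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs and outputs; replace with your own paths.
DIPLOID_FASTA="diploid_transcripts.fa"   # diploid transcriptome (assumed to exist already)
INDEX_DIR="diploid_index"                # where the Salmon index will be written
READS_1="sample1_R1.fastq.gz"            # paired-end reads, mate 1
READS_2="sample1_R2.fastq.gz"            # paired-end reads, mate 2
OUT_DIR="quants/sample1"                 # per-sample quantification directory
N_BOOT=30                                # number of bootstrap inferential replicates

# Dry-run helper (an assumption of this sketch, not part of Salmon):
# prints the command unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the diploid index, keeping transcripts whose sequence is
# identical across the two alleles.
run salmon index -t "$DIPLOID_FASTA" -i "$INDEX_DIR" --keepDuplicates

# Quantify one sample with bootstrap replicates; -l A asks Salmon to
# detect the library type automatically.
run salmon quant -i "$INDEX_DIR" -l A \
  -1 "$READS_1" -2 "$READS_2" \
  --numBootstraps "$N_BOOT" \
  -o "$OUT_DIR"
```

Setting `DRY_RUN=0` in the environment would execute the commands instead of printing them. In practice the quantification call would be repeated for each sample, giving one output directory per sample for import into R.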

In short, the SEESAW steps are listed and diagrammed below:

  1. g2gtools (diploid reference preparation)
  2. Salmon indexing with --keepDuplicates
  3. Salmon quantification with bootstraps
  4. makeTx2Tss() aggregates data to TSS-level (optional)
  5. importAllelicCounts() creates a SummarizedExperiment
  6. Swish analysis: labelKeep() and swish() (skip scaling)
  7. Plotting