Creating a Workflow Package

The main focus of a workflow package is the vignette!

What is a workflow vignette?

Workflow vignettes are documents which describe a bioinformatics workflow that involves multiple Bioconductor packages. These workflows are usually more extensive than the vignettes that accompany individual Bioconductor packages.

Existing Workflows

Workflow vignettes may deal with larger data sets and/or be more computationally intensive than typical Bioconductor package vignettes. For this reason, the automated builder that produces these vignettes does not have a time limit (in contrast to the Bioconductor package building system, which times out if package building takes too long). It is expected that the majority of vignette code chunks are evaluated.

Who should write a workflow vignette?

Anyone who is a bioinformatics domain expert.

How do I write and submit a workflow vignette?

Consistent formatting

The following example header is taken from the variants workflow package:

    
---
title: Annotating Genomic Variants
author:
- name: Valerie Obenchain
  affiliation: Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., P.O. Box 19024, Seattle, WA, USA 98109-1024
date: 11 April 2018
vignette: >
  %\VignetteIndexEntry{Annotating Genomic Variants}
  %\VignetteEngine{knitr::rmarkdown}
output:
    BiocStyle::html_document
---


# Version Info
```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({library('variants')})
```
<p>
**R version**: `r R.version.string`
<br />
**Bioconductor version**: `r BiocManager::version()`
<br />
**Package version**: `r packageVersion("variants")`
</p>


    

Tidying package loading output

Most workflows load a number of packages, and you do not want the output of loading those packages to clutter your workflow document. Here’s how to solve this in markdown; you can do something similar in LaTeX.

First, set up a code chunk that is evaluated but not echoed, and whose results are hidden. We also set warning=FALSE to be sure that no output from this chunk ends up in the document:

```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
library(GenomicRanges)
library(GenomicAlignments)
library(Biostrings)
library(Rsamtools)
library(ShortRead)
library(BiocParallel)
library(rtracklayer)
library(VariantAnnotation)
library(AnnotationHub)
library(BSgenome.Hsapiens.UCSC.hg19)
library(RNAseqData.HNRNPC.bam.chr14)
})
```

Then you can set up another code chunk that is echoed and has almost the same contents. The second invocation of library() will not produce any output because the packages have already been loaded:

```{r}
library(GenomicRanges)
library(GenomicAlignments)
library(Biostrings)
library(Rsamtools)
library(ShortRead)
library(BiocParallel)
library(rtracklayer)
library(VariantAnnotation)
library(AnnotationHub)
library(BSgenome.Hsapiens.UCSC.hg19)
library(RNAseqData.HNRNPC.bam.chr14)
```

Citations

To manage citations in your workflow document, specify the bibliography file in the document metadata header:

bibliography: references.bib

You can then use citation keys in the form of [@label] to cite an entry with an identifier “label”.

Normally, you will want to end your document with a section header “References” or similar, after which the bibliography will be appended.

For more details see the rmarkdown documentation.
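
As a rough sketch of how the pieces described above might fit together in one document: the file name references.bib and the key `label` come from the examples above, while the section heading and the sentence containing the citation are placeholders invented for illustration.

```markdown
---
title: Annotating Genomic Variants
bibliography: references.bib   # BibTeX file stored next to the vignette
output:
    BiocStyle::html_document
---

# Introduction

The annotation approach used here is described in [@label].

# References
```

The bibliography is appended after the last heading, so the "References" section is left empty in the source.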

Questions

If you have any questions, please ask on the bioc-devel mailing list.