Package Guidelines

Introduction

The Bioconductor project promotes high-quality, well-documented, and interoperable software. These guidelines help to achieve this objective; they are not meant to put undue burden on package authors, and authors having difficulty satisfying the guidelines should seek advice on the bioc-devel mailing list.

Package maintainers are urged to follow these guidelines as closely as possible when developing Bioconductor packages.

General instructions for producing packages can be found in the Writing R Extensions manual, available from within R (RShowDoc("R-exts")) or on the R web site.

Types of Packages

Most packages contributed by users are software packages that perform analytic calculations. Users also contribute annotation and experiment data packages. Annotation packages are database-like packages that provide information linking identifiers (e.g., Entrez gene names or Affymetrix probe ids) to other information (e.g., chromosomal location, Gene Ontology category). Experiment data packages provide data sets that are used, often by software packages, to illustrate particular analyses. An excellent practice is to develop a software package, and to provide or use an existing experiment data package to give a comprehensive illustration of the methods in the software package. The guidelines below apply to all packages, but annotation and experiment data packages are not required to conform to the space limitations of software packages. Developers wishing to contribute annotation or experiment data packages should seek additional support associated with package submission.

Version of Bioconductor and R

Package developers should always use the devel version of Bioconductor when developing and testing packages to be contributed.

Depending on the R release cycle, using Bioconductor devel may or may not also involve using the devel version of R. See the how-to on using the devel version of Bioconductor for up-to-date information.

Correctness, Space and Time

Bioconductor packages must pass R CMD build (or R CMD INSTALL --build) and pass R CMD check with no errors and no warnings using a recent R-devel. Authors should also try to address all notes that arise during build or check.

Do not use filenames that differ only in case, as not all file systems are case sensitive.

The source package resulting from running R CMD build should occupy less than 4MB on disk. The package should require less than 5 minutes to run R CMD check --no-build-vignettes. Using the --no-build-vignettes option ensures that the vignette is built only once.

Vignette and man page examples should not use more than 3GB of memory since R cannot allocate more than this on 32-bit Windows.

Package Name

Choose a descriptive name. An easy way to check whether your name is already in use is to confirm that the following command fails:

source("http://bioconductor.org/biocLite.R")
biocLite("MyPackage")

Avoid names that are easily confused with existing package names, or that imply a temporal (e.g., ExistingPackage2) or qualitative (e.g., ExistingPackagePlus) relationship.

License

The "License:" field in the DESCRIPTION file should preferably refer to a standard license (see opensource.org or Wikipedia) using one of R's standard specifications. Be specific about any version that applies (e.g., GPL-2). Core Bioconductor packages are typically licensed under Artistic-2.0. To specify a non-standard license, include a file named LICENSE in your package (containing the full terms of your license) and use the string "file LICENSE" (without the double quotes) in the "License:" field of your DESCRIPTION file.

Package Content

Packages must

Package Dependencies

Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources.

Reuse, rather than re-implement or duplicate, well-tested functionality from other packages. Specify package dependencies in the DESCRIPTION file, listed as follows

Occasionally, a package may offer optional functionality, e.g., visualization with rgl when that package is available. In that case, list the package in the Suggests field and use requireNamespace() (or loadNamespace()) to condition code execution. Functions from the loaded namespace should be accessed using :: notation, e.g.,

x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x,y)
if (requireNamespace("rgl", quietly=TRUE)) {
    rgl::plot3d(x, y, z, col=rainbow(1000))
} else {
    ## code when "rgl" is not available
}

This approach does not alter the user's search() path, and ensures that the intended function (plot3d(), from the rgl package) is used. Such conditional code increases the complexity of the package and frustrates users who do not understand why behavior differs between installations, so it is often best avoided.

S4 Classes and Methods

Re-use existing S4 classes and generics where possible. This encourages interoperability and simplifies your own package development. If your data requires a new representation or function, carefully design an S4 class or generic so that other package developers with similar needs will be able to re-use your hard work, and so that users of related packages will be able to seamlessly use your data structures. Do not hesitate to ask on the Bioc-devel mailing list for advice.

Implement a constructor (typically a simple function) if the user is supposed to be able to create an instance of your class. Write short accessors (functions or methods) if the user needs to extract from or assign to slots in the class. Constructors and accessors help separate the interface seen by the user from the implementation details relevant to the developer.
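
As a minimal sketch of the constructor and accessor advice above, the following defines a small class with a plain-function constructor and short accessors. The class name SampleScores, its slots, and the scores() generic are hypothetical names chosen for illustration; they are not part of any existing Bioconductor package.

```r
library(methods)

## Hypothetical class; all names here are illustrative assumptions.
setClass("SampleScores",
    representation(sampleNames = "character", scores = "numeric"),
    validity = function(object) {
        if (length(object@sampleNames) != length(object@scores))
            "'sampleNames' and 'scores' must have the same length"
        else
            TRUE
    })

## Constructor: a simple function that hides the call to new().
SampleScores <- function(sampleNames = character(), scores = numeric()) {
    new("SampleScores", sampleNames = sampleNames,
        scores = as.numeric(scores))
}

## Accessor: users extract the slot without touching '@'.
setGeneric("scores", function(x, ...) standardGeneric("scores"))
setMethod("scores", "SampleScores", function(x, ...) x@scores)

## Replacement accessor: assign to the slot, then re-validate.
setGeneric("scores<-", function(x, ..., value) standardGeneric("scores<-"))
setReplaceMethod("scores", "SampleScores", function(x, ..., value) {
    x@scores <- as.numeric(value)
    validObject(x)
    x
})

obj <- SampleScores(c("a", "b"), c(1.5, 2.5))
scores(obj) <- c(3, 4)
```

Because users only call SampleScores() and scores(), the slot layout could later change without breaking their code.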

The following layout is sometimes used to organize classes and methods; other approaches are possible and acceptable.

A Collate: field in the DESCRIPTION file may be necessary to order class and method definitions appropriately during package installation.

Vectorized Calculations

Many R operations are performed on the whole object, not just the elements of the object (e.g., sum(x), not x[1] + x[2] + ...). In particular, relatively few situations require an explicit for loop.
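
A short illustration of the difference, using made-up data: the explicit loop and the vectorized calls compute the same sum, and the last two lines show other whole-object operations that need no loop.

```r
x <- c(2, 4, 6, 8)   # illustrative data

## Explicit loop: correct, but verbose.
total <- 0
for (i in seq_along(x)) {
    total <- total + x[i]
}

## Vectorized equivalents operate on the whole object at once.
total_vec <- sum(x)
squares <- x^2                 # element-wise arithmetic
above_mean <- x[x > mean(x)]   # logical subsetting
```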

End-User Messages

Graphics Device

Use dev.new() to start a graphics device if necessary. Avoid using x11() or X11(), because they can only be called on machines that have access to an X server.
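
One possible way to apply this is sketched below. The helper name ensureDevice() is a hypothetical choice for illustration; it opens a device with dev.new() only when no device is currently active.

```r
## Hypothetical helper: open a platform-appropriate device only if needed.
ensureDevice <- function() {
    if (dev.cur() == 1L) {   # device number 1 is the "null device"
        dev.new()            # chooses the default device for this platform
    }
    invisible(dev.cur())
}
```

Plotting code in a package could call ensureDevice() before drawing, rather than calling x11() directly.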

Vignette(s)

A vignette demonstrates how to accomplish non-trivial tasks embodying the core functionality of your package. There are two common types of vignettes. A Sweave vignette is an .Rnw file that contains LaTeX and chunks of R code. Each R code chunk starts with a line <<>>= and ends with @. Each chunk is evaluated during R CMD build, prior to LaTeX compilation to a PDF document. An R markdown vignette is similar to a Sweave vignette, but uses markdown instead of LaTeX to structure text sections and produces HTML output. The knitr package can process most Sweave and all R markdown vignettes, producing pleasing output. Refer to Writing package vignettes for technical details. See the BiocStyle package for a convenient way to use common macros and a standard style.

A vignette provides reproducibility: the vignette produces the same results as copying the corresponding commands into an R session. It is therefore essential that the vignette embed R code between <<>>= and @; short-cuts (e.g., using a LaTeX verbatim environment, or using the Sweave eval=FALSE flag, or equivalent tricks in markdown) undermine the benefit of vignettes.

All packages are expected to have at least one vignette. Vignettes go in the vignettes directory of the package. Vignettes are often used as stand-alone documents, so best practices are to include an informative title, the primary author of the vignette, the last modified date of the vignette, and a link to the package landing page.

Citations

Appropriate citations must be included in help pages (e.g., in the see also section) and vignettes; this aspect of documentation is no different from any scientific endeavor. The file inst/CITATION can be used to specify how a package is to be cited.

Whether or not a CITATION file is present, an automatically-generated citation will appear on the package landing page on the Bioconductor web site. For optimal formatting of author names (if a CITATION file is not present), specify the package author and maintainer using the Authors@R field as described in Writing R Extensions.
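
As a rough sketch, an inst/CITATION file can consist of a call to bibentry() from the utils package. Every value below (package name, title, author, journal, year) is a placeholder invented for illustration and must be replaced with the real publication details. The result is assigned to cit here only so that it can be inspected; in an actual CITATION file the call normally stands on its own.

```r
## Placeholder citation; all field values are illustrative assumptions.
cit <- bibentry(
    bibtype = "Article",
    header  = "To cite MyPackage in publications, please use:",
    title   = "Title of the article describing MyPackage",
    author  = person("Given", "Family"),
    journal = "Journal Name",
    year    = "2015"
)

## The same person() syntax is what the Authors@R field of DESCRIPTION uses.
maintainer <- person("Given", "Family",
    email = "given.family@example.org",
    role  = c("aut", "cre"))
```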

Version Numbering

All Bioconductor packages use an x.y.z version scheme. The following rules apply:

When first submitted to Bioconductor, a package usually has version 0.99.0. For more details, see Version Numbering.
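
The x.y.z scheme can be inspected from R with package_version(), which compares versions component by component rather than as text. The helper isXYZ() below is a hypothetical name used only for this sketch.

```r
## Hypothetical helper: TRUE when a version string has exactly three
## dot-separated numeric components (x.y.z).
isXYZ <- function(version) {
    grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)
}

v <- package_version("0.99.0")
components <- unlist(unclass(v))   # integer components x, y, z

## Comparison is numeric per component, so 0.99.10 is newer than 0.99.9.
newer <- package_version("0.99.10") > package_version("0.99.9")
```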

C or Fortran code

If the package contains C or Fortran code, it should adhere to the standards and methods described in the System and foreign language interfaces section of the Writing R Extensions manual. In particular:

Third-party code

Use of external libraries whose functionality is redundant with libraries already supported is strongly discouraged. In cases where the external library is complex the author may need to supply pre-built binary versions for some platforms.

By including third-party code a package maintainer assumes responsibility for maintenance of that code. Part of the maintenance responsibility includes keeping the code up to date as bug fixes and updates are released for the mainline third-party project.

For guidance on including code from some specific third-party sources, see the external code sources section of the C++ Best Practices guide.

Unit Tests

Unit tests are highly recommended. We find them indispensable for both package development and maintenance. Examples and explanations are provided here.
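
To give a flavor of what a unit test checks, here is a minimal sketch that uses only base R. The function gcContent() and its test are hypothetical examples, not taken from any package; in practice a testing framework such as RUnit or testthat would normally provide the assertions and the test runner.

```r
## Hypothetical package function under test: fraction of G/C bases.
gcContent <- function(seq) {
    stopifnot(is.character(seq), length(seq) == 1L)
    if (!nzchar(seq)) return(NA_real_)
    bases <- strsplit(toupper(seq), "")[[1]]
    mean(bases %in% c("G", "C"))
}

## A unit test covers typical input, edge cases, and invalid input.
test_gcContent <- function() {
    stopifnot(
        isTRUE(all.equal(gcContent("GGCC"), 1)),
        isTRUE(all.equal(gcContent("atgc"), 0.5)),
        is.na(gcContent("")),
        inherits(tryCatch(gcContent(42), error = identity), "error")
    )
    invisible(TRUE)
}

test_gcContent()
```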

Videos

You can submit an instructional video along with your package. In the DESCRIPTION file of your package, add a "Video:" line which contains the link to your video. We will then feature your video on our Bioconductor YouTube Channel.

Duplication of Packages in CRAN and Bioconductor

Authors are strongly discouraged from placing their package into both CRAN and Bioconductor. Keeping the package in a single repository avoids burdening the author with extra work and confusing the user.

Package Author and Maintainer Responsibilities

Acceptance of packages into Bioconductor brings with it ongoing responsibility for package maintenance. These responsibilities include:

All authors mentioned in the package DESCRIPTION file are entitled to modify package source code. Changes to package authorship require consent of all authors.
