bioconductor.org

What is Bioconductor?

Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.

The project was started in the fall of 2001. The Bioconductor core team is based primarily at the Fred Hutchinson Cancer Research Center; other members come from various US and international institutions.

Bioconductor is primarily based on the R programming language, but we do accept contributions in any programming language. There are two releases of Bioconductor every year, each appearing shortly after the corresponding R release. At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition, a large number of metadata packages are available. They are mainly, but not solely, oriented towards different types of microarrays.

You can read the annual reports for further project details.

Bioconductor Packages

Although initial efforts focused primarily on DNA microarray data analysis, many of the software tools are general and can be used broadly for the analysis of genomic data, such as SAGE, sequence, or SNP data.

Goals of the Bioconductor Project

The broad goals of the project are to

  • provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data;
  • facilitate the integration of biological metadata in the analysis of experimental data, e.g. literature data from PubMed and annotation data from LocusLink;
  • allow the rapid development of extensible, scalable, and interoperable software;
  • promote high-quality documentation and reproducible research;
  • provide training in computational and statistical methods for the analysis of genomic data.

Main Features of the Bioconductor Project

  • Use of R. R and the R package system are the main vehicles for designing and releasing software. R (www.r-project.org) is a widely used open source language and environment for statistical computing and graphics, and can be thought of as GNU's S-Plus. It provides a high-level programming environment together with a sophisticated packaging and testing paradigm. It also has a number of mechanisms that allow it to interact directly with software written in many other languages (see the Omega Project), so users can incorporate modules based on other work. Viewed in that context, adopting R as a vehicle does not exclude other development environments and paradigms; in those cases R can provide the glue that connects what might otherwise be separate products. Finally, R is under very active development by a dedicated team of researchers with a strong commitment to good documentation and software design.
  • Documentation and reproducible research. One of the goals of the project is to provide high-quality documentation and encourage reproducible research.

    Each package contains at least one vignette, a document that provides a textual, task-oriented description of the package's functionality and that can be used interactively. Package vignettes come in several forms. Many are simple "HowTo"s, designed to demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package, or discuss general issues related to the package. In the future, we hope to provide vignettes that are not tied to a specific package but instead demonstrate more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort.

    The vignettes are generated using the Sweave function from the R package tools. They are documents that intermix text, code, and output (textual and graphical), and they can be regenerated automatically whenever the data or analyses change. Additional supporting software for vignettes (the reposTools package) will aid users in obtaining data and sample code, stepping through specific analyses, and applying these analyses to their own data.

  • Statistical and graphical methods. The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for: pre-processing Affymetrix and cDNA array data; identifying differentially expressed genes; graph theoretical analyses; plotting genomic data. In addition, the R package system itself provides implementations for a broad range of state-of-the-art statistical and graphical techniques, including linear and non-linear modeling, cluster analysis, prediction, resampling, survival analysis, and time-series analysis.
  • Annotation. The Bioconductor project provides software for associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, LocusLink and PubMed (annotate package). Functions are also provided for incorporating the results of statistical analyses in HTML reports with links to annotation resources on the WWW.
    Software tools are available for assembling and processing genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, LocusLink, UniGene, and the UCSC Human Genome Project (AnnBuilder package).
    Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, PubMed). Customized annotation libraries can also be assembled.
  • Bioconductor short courses. The Bioconductor project has developed a program of short courses on software and statistical methods for the analysis of genomic data. Courses have been given for audiences with backgrounds in either biology or statistics. All course materials (lectures and computer labs) are available on the WWW. Customized short courses may also be designed for interested parties.
  • Open source. Bioconductor has a commitment to full open source discipline, with distribution via a SourceForge-like platform. All contributions are expected to exist under an open source license such as GPL2 or BSD. There are many reasons why open-source software is beneficial to the analysis of microarray data and to computational biology in general. The reasons include:
    • full access to algorithms and their implementation
    • the ability to fix bugs and extend and improve the supplied software
    • to encourage good scientific computing and statistical practice by providing appropriate tools and instruction
    • to provide a workbench of tools that allow researchers to explore and expand the methods used to analyze biological data
    • to ensure that the international scientific community is the owner of the software tools needed to carry out research
    • to lead and encourage commercial support and development of those tools that are successful
    • to promote reproducible research by providing open and accessible tools with which to carry out that research [reproducible research is distinct from independent verification]
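
To make the vignette mechanism described under "Documentation and reproducible research" more concrete, here is a minimal sketch of the "weave" idea: a document mixing prose with code chunks is processed so that each chunk is executed and its output appears next to it. Sweave itself is an R tool; the Python below only illustrates the concept and is not how Sweave is implemented. The `<<name>>=` and `@` chunk delimiters follow the noweb convention, while the function name `weave` and the sample document are hypothetical.

```python
import contextlib
import io


def weave(source: str) -> str:
    """Run each code chunk in `source`; return prose, code, and output interleaved."""
    env = {}          # shared namespace, so later chunks can see earlier results
    out_lines = []
    chunk = None      # None while reading prose, a list of lines inside a chunk
    for line in source.splitlines():
        if chunk is None and line.startswith("<<") and line.endswith(">>="):
            chunk = []                        # a code chunk opens
        elif chunk is not None and line.strip() == "@":
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                # Illustration only: never exec untrusted input.
                exec("\n".join(chunk), env)
            out_lines.extend("> " + code_line for code_line in chunk)
            out_lines.extend(buffer.getvalue().splitlines())
            chunk = None                      # back to prose
        elif chunk is not None:
            chunk.append(line)                # collect code
        else:
            out_lines.append(line)            # copy prose through unchanged
    return "\n".join(out_lines)


# Hypothetical sample document with made-up values.
document = """Mean of three made-up expression values:
<<mean>>=
values = [2.0, 4.0, 9.0]
print(sum(values) / len(values))
@
Editing the values and calling weave() again regenerates the report."""

print(weave(document))
```

Because the output is computed from the code in the document, editing the data and re-running the weave step brings the report back in sync. That is the property the vignettes rely on for reproducibility.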

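The identifier mappings mentioned under "Annotation" are, conceptually, lookup tables keyed by probe identifier. The rough Python sketch below shows that idea only. All identifiers are invented placeholders, and `PROBE_TO_GENE`, `GENE_TO_CITATIONS`, and `annotate_probes` are hypothetical names, not part of any Bioconductor package.

```python
from typing import Dict, List, Optional

# Invented placeholder tables; real annotation data packages hold the actual mappings.
PROBE_TO_GENE: Dict[str, str] = {
    "probe_001": "gene_A",
    "probe_002": "gene_B",
    "probe_003": "gene_A",   # several probes may map to the same gene
}
GENE_TO_CITATIONS: Dict[str, List[str]] = {
    "gene_A": ["citation_10", "citation_11"],
    "gene_B": [],
}


def annotate_probes(probes: List[str]) -> Dict[str, Optional[dict]]:
    """Look up gene and citation identifiers for each probe; None if unmapped."""
    result: Dict[str, Optional[dict]] = {}
    for probe in probes:
        gene = PROBE_TO_GENE.get(probe)
        if gene is None:
            result[probe] = None              # keep unmapped probes visible
            continue
        result[probe] = {
            "gene": gene,
            "citations": GENE_TO_CITATIONS.get(gene, []),
        }
    return result


annotated = annotate_probes(["probe_001", "probe_003", "probe_999"])
for probe, info in annotated.items():
    print(probe, info)
```

Returning `None` for unmapped probes, rather than dropping them, is a design choice in this sketch: it keeps gaps in the annotation explicit in any downstream report.
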
News
2008-05-01

BioC 2.2, consisting of 260 packages and designed to work with R 2.7.0, was released today.

2008-03-04

Bioconductor release scheduled for 30 April 2008.