1 Introduction

The microbiome R package facilitates exploration and analysis of microbiome profiling data, in particular 16S taxonomic profiling.

This vignette provides a brief overview with example data sets from published microbiome profiling studies (Lahti et al. 2013; Lahti et al. 2014; O’Keefe et al. 2015). A more comprehensive tutorial is available online.

Tools are provided for the manipulation, statistical analysis, and visualization of taxonomic profiling data. In addition to targeted case-control studies, the package facilitates scalable exploration of large population cohorts (Lahti et al. 2014). Whereas sample collections are rapidly accumulating for the human body and other environments, few general-purpose tools for targeted microbiome analysis are available in R. This package supports the independent phyloseq data format and expands the available toolkit in order to facilitate the standardization of the analyses and the development of best practices. See also the related PathoStat pipeline, mare pipeline, phylofactor, and structSSI for additional 16S rRNA amplicon analysis tools in R. The aim is to complement the other available packages, but in some cases alternative solutions have been necessary in order to streamline the tools and to improve complementarity.

We welcome feedback, bug reports, and suggestions for new features from the user community via the issue tracker and pull requests. See the GitHub site for source code and other details. These R tools have been utilized in recent publications and in introductory courses (Salonen et al. 2014; Faust et al. 2015; Shetty et al. 2017), and they are released under the Two-clause FreeBSD license.

Kindly cite the work as follows: “Leo Lahti et al. (Bioconductor, 2017). Tools for microbiome analysis in R. Microbiome package version . URL: (

2 Installation

To install the microbiome package in R (Bioconductor development version), use
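The install command is not reproduced in this copy of the vignette. A minimal sketch, assuming the BiocManager workflow that Bioconductor currently recommends (the original vignette may have used a different installer), could look like this:

```r
# Assumption: BiocManager is the installer in use; this is not
# necessarily the command the vignette originally showed.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Optional: switch to the Bioconductor development branch first
# BiocManager::install(version = "devel")

# Install the microbiome package from Bioconductor
BiocManager::install("microbiome")
```

The `version = "devel"` line is left commented out because switching branches affects every installed Bioconductor package, not only microbiome.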


Then load the package in R

library(microbiome)

3 Data

The microbiome package relies on the independent phyloseq data format. A phyloseq object contains an OTU table (taxa abundances), sample metadata (age, BMI, sex, …), a taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and a phylogenetic tree (relations between the taxa).
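To make these components concrete, the following sketch builds toy versions of the three tabular pieces in base R and, if phyloseq happens to be installed, combines them into a phyloseq object. All values and names (TaxonA, S1, and so on) are hypothetical and serve only as illustration; the phylogenetic tree is omitted.

```r
# Toy OTU table: taxa in rows, samples in columns (hypothetical counts)
otu <- matrix(
  c(10, 0, 5,
     3, 7, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB"), c("S1", "S2", "S3"))
)

# Toy sample metadata: one row per sample
meta <- data.frame(
  age = c(34, 52, 41),
  sex = c("female", "male", "female"),
  row.names = c("S1", "S2", "S3")
)

# Toy taxonomy table: one row per taxon, one column per rank
tax <- matrix(
  c("Firmicutes",    "GenusA",
    "Bacteroidetes", "GenusB"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB"), c("Phylum", "Genus"))
)

# The components are linked through shared taxon and sample names
stopifnot(identical(rownames(otu), rownames(tax)))
stopifnot(identical(colnames(otu), rownames(meta)))

# Assemble a phyloseq object only when phyloseq is available
if (requireNamespace("phyloseq", quietly = TRUE)) {
  ps <- phyloseq::phyloseq(
    phyloseq::otu_table(otu, taxa_are_rows = TRUE),
    phyloseq::sample_data(meta),
    phyloseq::tax_table(tax)
  )
  print(ps)
}
```

The name checks matter in practice: phyloseq matches the components by taxon and sample names, so mismatched names are a common source of errors.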

3.1 Example data sets

Example data sets are provided to facilitate reproducible examples and further methods development.

The HITChip Atlas data set (Lahti et al. Nat. Comm. 5:4344, 2014) contains 130 genus-level taxonomic groups across 1006 western adults. Load the example data in R with

data(atlas1006) # Data from 
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1172 samples ]
## sample_data() Sample Data:       [ 1172 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 2 taxonomic ranks ]

The dietswap data set comes from the two-week diet swap study between western (USA) and traditional (rural Africa) diets, reported in O’Keefe et al. Nat. Comm. 6:6342, 2015.

data(dietswap) # Data from
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
## sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 2 taxonomic ranks ]

The peerj32 data set is a parallel profiling of gut microbiota and blood metabolites from Lahti et al. PeerJ 1:e32, 2013, collected to characterize associations between human intestinal microbiota and blood serum lipids.

data(peerj32) # Data from

3.2 Data import

You can import 16S profiling data from standard formats (mothur, BIOM, CSV, etc.). See the tutorial for details.

3.3 Data manipulation

A phyloseq object can be subsetted, filtered, aggregated, transformed, and otherwise manipulated. For a comprehensive list of tools, see the online tutorial.

To convert absolute counts to compositional (relative) abundances, for instance, use

# dietswap is a phyloseq object; see above
dietswap.compositional <- transform(dietswap, "compositional")
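As an illustration of what a compositional transformation does, the base R sketch below divides each sample's counts by that sample's total. It uses a hypothetical toy count matrix and assumes that relative abundances are expressed as proportions summing to 1 per sample; it is not the package's own implementation, which may differ in details such as scaling.

```r
# Toy count matrix: taxa in rows, samples in columns (hypothetical values)
counts <- matrix(
  c(10,  0,  5,
    30, 20, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB"), c("S1", "S2", "S3"))
)

# Divide every column (sample) by its total count
compositional <- sweep(counts, 2, colSums(counts), "/")

print(round(compositional, 2))

# Each sample now sums to 1
print(colSums(compositional))
```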

4 Ecosystem indices

4.1 Alpha diversity, richness, evenness, dominance, and rarity

Commonly used ecosystem state variables include various indices to quantify alpha diversities, richness, evenness, dominance, and rarity (see functions with similar names). We provide a comprehensive set of such indices via a standardized interface.

The function global calls these indicators with default parameters. For further options, see the tutorial.

g <- global(atlas1006, index = "gini")
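To show what such indices measure, the sketch below computes two of them by hand in base R for a hypothetical toy count matrix: richness (the number of taxa observed) and Shannon diversity, H = -sum(p * log(p)) over the relative abundances p. The helper functions `richness` and `shannon` are defined here for illustration only and are not the package's implementation.

```r
# Shannon diversity of one sample, using the natural logarithm
shannon <- function(x) {
  p <- x[x > 0] / sum(x)   # relative abundances of observed taxa
  -sum(p * log(p))
}

# Richness: number of taxa with a non-zero count
richness <- function(x) sum(x > 0)

# Toy counts: taxa in rows, samples in columns (hypothetical values)
counts <- matrix(
  c(25, 25, 25, 25,    # "even": four equally abundant taxa
    97,  1,  1,  1,    # "dominated": one taxon dominates
    50, 50,  0,  0),   # "two_taxa": only two taxa observed
  nrow = 4,
  dimnames = list(paste0("Taxon", 1:4), c("even", "dominated", "two_taxa"))
)

indices <- data.frame(
  richness = apply(counts, 2, richness),
  shannon  = apply(counts, 2, shannon)
)
print(indices)
```

The even and dominated samples have the same richness but very different Shannon diversity, which is why richness, evenness, and dominance are reported as separate indices.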

A visually weighted regression curve with smoothed error bars (1) can be used to visualize the relation between two sample variables, here age and diversity. The plotting function operates on standard data frames.

# Estimate Shannon diversity and add it to the phyloseq object
sample_data(atlas1006)$diversity <- global(atlas1006, index = "shannon")[,1]

# Compare age and microbiome diversity
plot_regression(diversity ~ age, meta(atlas1006))
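For readers who want to see the general idea of a smoothed curve with an uncertainty band on a plain data frame, the sketch below uses base R's `loess` as a stand-in. This is a different and simpler technique than the visually weighted regression drawn by plot_regression, and the data are simulated purely for illustration.

```r
# Simulated data frame (not real study data)
set.seed(1)
df <- data.frame(age = runif(200, 20, 80))
df$diversity <- 5 + 0.01 * df$age + rnorm(200, sd = 0.3)

# Local regression of diversity on age
fit <- loess(diversity ~ age, data = df)

# Predict on a regular grid within the observed age range
grid <- data.frame(age = seq(min(df$age), max(df$age), length.out = 50))
pred <- predict(fit, newdata = grid, se = TRUE)

# Approximate 95% pointwise band around the smoothed curve
grid$fit   <- pred$fit
grid$lower <- pred$fit - 1.96 * pred$se.fit
grid$upper <- pred$fit + 1.96 * pred$se.fit

head(grid)
```

With real data, the data frame would come from the sample metadata, for example via meta(atlas1006) as in the call above.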