This is a list of the last 100 packages added to Bioconductor and available in the development version of Bioconductor. The list is also available as an RSS Feed.

consensusSeekeR Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges

This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable, as is the number of experiments in which a feature must be present in a potential region for that region to be tagged as a consensus region.
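
The idea of an adjustable region size and a minimum number of experiments can be sketched in a few lines of Python. This is only an illustrative greedy scan under simplifying assumptions (a single chromosome, point positions only), not the algorithm implemented by consensusSeekeR; the function name `consensus_regions`, its parameters and the example positions are all hypothetical.

```python
def consensus_regions(experiments, window, min_experiments):
    """Greedy scan for windows supported by enough distinct experiments.

    experiments: dict mapping an experiment name to a list of integer
    positions on one chromosome (hypothetical input layout).
    Returns a list of (start, end) tuples.
    """
    tagged = sorted(
        (pos, name)
        for name, positions in experiments.items()
        for pos in positions
    )
    regions = []
    i = 0
    while i < len(tagged):
        start = tagged[i][0]
        j = i
        names = set()
        # Collect every position that lies within `window` of the start.
        while j < len(tagged) and tagged[j][0] - start <= window:
            names.add(tagged[j][1])
            j += 1
        if len(names) >= min_experiments:
            regions.append((start, tagged[j - 1][0]))
            i = j  # skip the positions already assigned to this region
        else:
            i += 1
    return regions


# Made-up positions, for illustration only.
peaks = {"exp1": [100, 500, 900], "exp2": [110, 520], "exp3": [95, 2000]}
regions = consensus_regions(peaks, window=30, min_experiments=2)
```

Raising `min_experiments` to 3 keeps only the first region in this example, since the second one is supported by two experiments.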

globalSeq Testing for association between RNA-Seq and high-dimensional data

The method may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size.

scde Single Cell Differential Expression

The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following publication:

R4RNA An R package for RNA visualization and analysis

A package for RNA basepair analysis, including the visualization of basepairs as arc diagrams for easy comparison and annotation of sequence and structure. Arc diagrams can additionally be projected onto multiple sequence alignments to assess basepair conservation and covariation, with numerical methods for computing statistics for each.

CNPBayes Bayesian mixture models for copy number polymorphisms

Bayesian hierarchical mixture models for batch effects and copy number.

subSeq Subsampling of high-throughput sequencing count data

Subsampling of high throughput sequencing count data for use in experiment design and analysis.

biobroom Turn Bioconductor objects into tidy data frames

This package contains methods for converting standard objects constructed by bioinformatics packages, especially those in Bioconductor, to tidy data. It thus serves as a complement to the broom package, and follows the same tidy, augment, glance division of tidying methods. Tidying data makes it easy to recombine, reshape and visualize bioinformatics analyses.

miRcomp Tools to assess and compare miRNA expression estimation methods

Based on a large miRNA dilution study, this package provides tools to read in the raw amplification data and use these data to assess the performance of methods that estimate expression from the amplification curves.

SNPhood SNPhood: Investigate, quantify and visualise the epigenomic neighbourhood of SNPs using NGS data

To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome, and are likely to affect regulatory elements, such as enhancers and promoters, rather than function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly *Bioconductor* R package to investigate and visualize the local neighborhood of a set of SNPs of interest for NGS data such as chromatin marks or transcription factor binding sites from ChIP-Seq or RNA-Seq experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and summarize reads for a genomic region, perform various data quality checks, normalize read counts using additional input files, and to cluster and visualize the regions according to the binding pattern. The regions around each SNP can be binned in a user-defined fashion to allow for analysis of very broad patterns as well as a detailed investigation of specific binding shapes. Furthermore, SNPhood supports the integration with genotype information to investigate and visualize genotype-specific binding patterns. Finally, SNPhood can be employed for determining, investigating, and visualizing allele-specific binding patterns around the SNPs of interest.

RiboProfiling Ribosome Profiling Data Analysis: from BAM to Data Representation and Interpretation

Starting with a BAM file, this package provides the necessary functions for quality assessment, read start position recalibration, the counting of reads on CDS, 3'UTR, and 5'UTR, plotting of count data: pairs, log fold-change, codon frequency and coverage assessment, principal component analysis on codon coverage.

GeneBreak Gene Break Detection

Recurrent breakpoint gene detection on copy number aberration profiles.

DChIPRep DChIPRep - Analysis of chromatin modification ChIP-Seq data with replication

The DChIPRep package implements a methodology to assess differences between chromatin modification profiles in replicated ChIP-Seq studies as described in Chabbert et al. -

GUIDEseq GUIDE-seq analysis pipeline

The package implements GUIDE-seq analysis workflow including functions for obtaining unique insertion sites (proxy of cleavage sites), estimating the locations of the insertion sites, aka, peaks, merging estimated insertion sites from plus and minus strand, and performing off target search of the extended regions around insertion sites.

SWATH2stats Transform and Filter SWATH Data for Statistical Packages

This package is intended to transform SWATH data from the OpenSWATH software into a format readable by other statistics packages while performing filtering, annotation and FDR estimation.

SISPA SISPA: Method for Sample Integrated Set Profile Analysis

Sample Integrated Gene Set Analysis (SISPA) is a method designed to define sample groups with similar gene set enrichment profiles.

SICtools Find SNV/Indel differences between two bam files with near relationship

This package finds SNV/Indel differences between two closely related BAM files by pairwise comparison at each base position across the genomic region of interest. The difference is inferred by Fisher test and Euclidean distance, the inputs of which are the base counts (A, T, G, C) at a given position and the read counts for indels that span no less than 2 bp on both sides of the indel region.

Prostar Provides a GUI for DAPAR

This package provides a graphical user interface (GUI) for DAPAR.

pathVar Methods to Find Pathways with Significantly Different Variability

This package contains the functions to find the pathways that have significantly different variability than a reference gene set. It also finds the categories from this pathway that are significant where each category is a cluster of genes. The genes are separated into clusters by their level of variability.

metagenomeFeatures Exploration of marker-gene sequence taxonomic annotations

metagenomeFeatures was developed for use in exploring the taxonomic annotations for a marker-gene metagenomic sequence dataset. The package can be used to explore the taxonomic composition of a marker-gene database or annotated sequences from a marker-gene metagenome experiment.

MEAL Perform methylation analysis

Package to integrate methylation and expression data. It can also perform methylation or expression analysis alone. Several plotting functionalities are included as well as a new region analysis based on redundancy analysis. Effect of SNPs on a region can also be estimated.

lfa Logistic Factor Analysis for Categorical Data

LFA is a method for a PCA analogue on Binomial data via estimation of latent structure in the natural parameter.

Imetagene A graphical interface for the metagene package

This package provides a graphical user interface to the metagene package. This allows people with minimal R experience to easily complete a metagene analysis.

iCheck QC Pipeline and Data Analysis Tools for High-Dimensional Illumina mRNA Expression Data

QC pipeline and data analysis tools for high-dimensional Illumina mRNA expression data.

gcatest Genotype Conditional Association TEST

GCAT is an association test for genome-wide association studies that controls for population structure under a general class of trait models.

DAPAR Tools for the Differential Analysis of Proteins Abundance with R

This package contains a collection of functions for the visualisation and the statistical analysis of proteomic data.

TarSeqQC TARgeted SEQuencing Experiment Quality Control

The package allows the representation of targeted sequencing experiments in R. It builds on existing packages and adds functions for quality control of this kind of experiment and for fast exploration of the sequenced regions. An xlsx file is generated as output.

Guitar Guitar

The package is designed for visualization of RNA-related genomic features with respect to the landmarks of RNA transcripts, i.e., transcription starting site, start codon, stop codon and transcription ending site.

FindMyFriends Microbial Comparative Genomics in R

A framework for doing microbial comparative genomics in R. The main purpose of the package is assisting in the creation of pangenome matrices where genes from related organisms are grouped by similarity, as well as the analysis of these data. FindMyFriends provides many novel approaches to doing pangenome analysis and supports a gene grouping algorithm that scales linearly, thus making the creation of huge pangenomes feasible.

EnrichedHeatmap Making Enriched Heatmaps

An enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement the enriched heatmap using the ComplexHeatmap package. Since this type of heatmap is just a normal heatmap with some special settings, the functionality of ComplexHeatmap makes it much easier to customize the heatmap and to concatenate it to a list of heatmaps to show correspondence between different data sources.

dupRadar Assessment of duplication rates in RNA-Seq datasets

Duplication rate quality control for RNA-Seq datasets.

DNABarcodes A tool for creating and analysing DNA barcodes used in Next Generation Sequencing multiplexing experiments

The package offers a function to create DNA barcode sets capable of correcting insertion, deletion, and substitution errors. Existing barcodes can be analysed regarding their minimal, maximal and average distances between barcodes. Finally, reads that start with a (possibly mutated) barcode can be demultiplexed, i.e., assigned to their original reference barcode.
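
As a rough illustration of the distance analysis described above, the following Python sketch computes the minimal, maximal and average pairwise distance of a small barcode set. It assumes Hamming distance (substitutions only) and equal-length barcodes; the function names and the example barcodes are hypothetical and are not part of the DNABarcodes API.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_summary(barcodes):
    """Return (min, max, mean) Hamming distance over all barcode pairs."""
    dists = [hamming(a, b) for a, b in combinations(barcodes, 2)]
    return min(dists), max(dists), sum(dists) / len(dists)


# Hypothetical example set, not taken from the package.
example = ["ACGT", "TGCA", "ACCA", "TTTT"]
summary = distance_summary(example)  # (2, 4, 3.0)
```

A set whose minimal Hamming distance is d can correct up to floor((d - 1) / 2) substitution errors, so the example set above (d = 2) can detect a single substitution but cannot correct it.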

DiffLogo DiffLogo: A comparative visualisation of sequence motifs

DiffLogo is an easy-to-use tool to visualize motif differences.

RTCGA The Cancer Genome Atlas Data Integration

The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care. The RTCGA package offers download and integration of the variety and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This may have a beneficial influence on the development of science and the improvement of patients' treatment. Furthermore, the RTCGA package transforms TCGA data into a tidy form which is convenient to use.

ProteomicsAnnotationHubData Transform public proteomics data resources into Bioconductor Data Structures

These recipes convert a variety and a growing number of public proteomics data sets into easily-used standard Bioconductor data structures.

motifbreakR A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings, giving a choice of algorithms for interrogation of genomes with motifs from public sources: 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 22).

LOLA Location overlap analysis for enrichment of genomic ranges

Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This makes it possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data.

iGC An integrated analysis package of Gene expression and Copy number alteration

This package is intended to identify differentially expressed genes driven by Copy Number Alterations from samples with both gene expression and CNA data.

AnnotationHubData Transform public data resources into Bioconductor Data Structures

These recipes convert a wide variety and a growing number of public bioinformatic data sets into easily-used standard Bioconductor data structures.

sbgr R Client for Seven Bridges Genomics API

R client for Seven Bridges Genomics API.

GEOsearch GEOsearch

GEOsearch is an extendable search engine for NCBI GEO (Gene Expression Omnibus). Instead of directly searching the term, GEOsearch can find all the gene names contained in the search term and search all the aliases of the gene names simultaneously in the GEO database. GEOsearch also provides other functions such as summarizing common biology keywords in the search results.

ldblock data structures for linkage disequilibrium measures in populations

Define data structures for linkage disequilibrium measures in populations.

Path2PPI Prediction of pathway-related protein-protein interaction networks

Package to predict protein-protein interaction (PPI) networks in target organisms for which only a little information about PPIs is available. Path2PPI predicts PPI networks based on sets of proteins which can belong to a certain pathway from well-established model organisms. It helps to combine and transfer information of a certain pathway or biological process from several reference organisms to one target organism. Path2PPI only depends on the sequence similarity of the involved proteins.

myvariant Accesses MyVariant.info variant query and annotation services

MyVariant.info is a comprehensive aggregation of variant annotation resources. myvariant is a wrapper for querying MyVariant.info services.

ChIPComp Quantitative comparison of multiple ChIP-seq datasets

ChIPComp detects differentially bound sharp binding sites across multiple conditions considering matching control.

TCGAbiolinks TCGAbiolinks: An R/Bioconductor package for integrative analysis with TCGA data

The aims of TCGAbiolinks are to: i) facilitate retrieval of TCGA open-access data, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) allow the user to download a specific version of the data and thus to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

ABAEnrichment Gene expression enrichment in human brain regions

The package ABAEnrichment is designed to test for enrichment of user defined candidate genes in the set of expressed genes in different human brain regions. The core function 'aba_enrich' integrates the expression of the candidate gene set (averaged across donors) and the structural information of the brain using an ontology, both provided by the Allen Brain Atlas project. 'aba_enrich' interfaces the ontology enrichment software FUNC to perform the statistical analyses. Additional functions provided in this package like 'get_expression' and 'plot_expression' facilitate exploring the expression data.

synlet Hits Selection for Synthetic Lethal RNAi Screen Data

Select hits from synthetic lethal RNAi screen data. For example, there are two identical cell lines except that one gene is knocked down in one of them. The interest is to find genes that lead to a stronger lethal effect when they are knocked down further by siRNA. Quality control and various visualisation tools are implemented. Four different algorithms can be used to pick the interesting hits. This package is designed for 384-well plates, but may apply to other platforms with proper configuration.

NanoStringDiff Differential Expression Analysis of NanoString nCounter Data

This package utilizes a generalized linear model (GLM) of the negative binomial family to characterize count data and allows for multi-factor design. NanoStringDiff incorporates size factors, calculated from positive controls and housekeeping controls, and background level, obtained from negative controls, in the model framework so that all the normalization information provided by the NanoString nCounter Analyzer is fully utilized.

metaX An R package for metabolomic data analysis

The package provides an integrated pipeline for mass spectrometry-based metabolomic data analysis. It includes the following stages: peak detection, data preprocessing, normalization, missing value imputation, univariate statistical analysis, multivariate statistical analysis such as PCA and PLS-DA, metabolite identification, pathway analysis, power analysis, feature selection and modeling, and data quality assessment.

eudysbiome pseudo-cartesian plot and contingency test on 16S Microbial data

eudysbiome is a package that annotates differential genera as harmful/harmless based on their ability to contribute to host diseases (as indicated in the literature), or as unknown based on their ambiguous genus classification. Further, the package statistically measures the eubiotic (harmless genera increase or harmful genera decrease) or dysbiotic (harmless genera decrease or harmful genera increase) impact of a given treatment or environmental change on the (gut-intestinal, GI) microbiome in comparison to the microbiome of the reference condition.

Oscope Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq

Oscope is a statistical pipeline developed to identify and recover the base cycle profiles of oscillating genes in an unsynchronized single cell RNA-seq experiment. The Oscope pipeline includes three modules: a sine model module to search for candidate oscillator pairs; a K-medoids clustering module to cluster candidate oscillators into groups; and an extended nearest insertion module to recover the base cycle order for each oscillator group.

variancePartition Quantify and interpret drivers of variation in multilevel gene expression experiments

Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses a linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables.

destiny Creates diffusion maps

Create and plot diffusion maps

HilbertCurve Making 2D Hilbert Curve

A Hilbert curve is a type of space-filling curve that folds a one-dimensional axis into a two-dimensional space while still preserving locality. This package aims to provide an easy and flexible way to visualize data through a Hilbert curve.
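
To make the folding concrete, here is a small Python sketch of the classic iterative conversion from a one-dimensional index to a cell on a Hilbert curve. It is a textbook formulation and is not taken from the HilbertCurve package; the function name `hilbert_d2xy` and the grid size are assumptions for illustration.

```python
def hilbert_d2xy(n, d):
    """Map index d (0 <= d < n*n) to cell (x, y) on an n x n Hilbert curve.

    n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so that sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk an 8 x 8 curve: neighbouring indices land in neighbouring cells.
path = [hilbert_d2xy(8, d) for d in range(64)]
```

Consecutive indices always map to cells that share an edge, which is the locality property that makes the curve useful for laying out a long one-dimensional axis, such as a genome, in a compact square.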

LedPred Learning from DNA to Predict enhancers

This package aims at creating a predictive model of regulatory sequences used to score unknown sequences based on the content of DNA motifs, next-generation sequencing (NGS) peaks and signals and other numerical scores of the sequences using supervised classification. The package contains a workflow based on the support vector machine (SVM) algorithm that maps features to sequences, optimizes SVM parameters and feature number, and creates a model that can be stored and used to score the regulatory potential of unknown sequences.

traseR GWAS trait-associated SNP enrichment analyses in genomic intervals

traseR performs GWAS trait-associated SNP enrichment analyses in genomic intervals using different hypothesis testing approaches, and also provides various functionalities to explore and visualize the results.

OGSA Outlier Gene Set Analysis

OGSA provides a global estimate of pathway deregulation in cancer subtypes by integrating the estimates of significance for individual pathway members that have been identified by outlier analysis.

miRLAB Dry lab for exploring miRNA-mRNA relationships

Provides tools for exploring miRNA-mRNA relationships, including popular miRNA target prediction methods, ensemble methods that integrate individual methods, functions to get data from online resources, functions to validate the results, and functions to conduct enrichment analyses.

genotypeeval QA/QC of a gVCF or VCF file

Takes in a gVCF or VCF and reports metrics to assess quality of calls.

ropls PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data

Latent variable modeling with Principal Component Analysis (PCA) and Partial Least Squares (PLS) are powerful methods for visualization, regression, classification, and feature selection of omics data where the number of variables exceeds the number of samples and with multicollinearity among variables. Orthogonal Partial Least Squares (OPLS) makes it possible to separately model the variation correlated (predictive) to the factor of interest and the uncorrelated (orthogonal) variation. While performing similarly to PLS, OPLS facilitates interpretation. Successful applications of these chemometrics techniques include spectroscopic data such as Raman spectroscopy, nuclear magnetic resonance (NMR), mass spectrometry (MS) in metabolomics and proteomics, but also transcriptomics data. In addition to scores, loadings and weights plots, the package provides metrics and graphics to determine the optimal number of components (e.g. with the R2 and Q2 coefficients), check the validity of the model by permutation testing, detect outliers, and perform feature selection (e.g. with Variable Importance in Projection or regression coefficients). The package can be accessed via a user interface on the online resource for computational metabolomics (built upon the Galaxy environment).

rCGH Comprehensive Pipeline for Analyzing and Visualizing Array-Based CGH Data

A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data is in a suitable format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to segmenting and annotating genes. This package provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.


DEMAND predicts Drug MoA by interrogating a cell context specific regulatory network with a small number (N >= 6) of compound-induced gene expression signatures, to elucidate specific proteins whose interactions in the network are dysregulated by the compound.

rnaseqcomp Benchmark for RNA-seq Quantification Pipelines

Several quantitative and visualized benchmarks for RNA-seq quantification pipelines. Two-replicate quantifications for genes, transcripts, junctions or exons by each pipeline, with the necessary meta information, should be organized into a numeric matrix in order to proceed with the evaluation.

INSPEcT Analysis of 4sU-seq and RNA-seq time-course data

INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-Course experiments) analyses 4sU-seq and RNA-seq time-course data in order to evaluate synthesis, processing and degradation rates and assess via modeling the rates that determine changes in mature mRNA levels.

Prize Prize: an R package for prioritization estimation based on analytic hierarchy process

High-throughput studies often produce long lists of genes and proteins of interest, and it is difficult to study and validate all of them. Analytic Hierarchy Process (AHP) offers a novel approach to narrowing down long lists of candidates by prioritizing them based on how well they meet the research goal. AHP is a mathematical technique for organizing and analyzing complex decisions where multiple criteria are involved. The technique structures problems into a hierarchy of elements, and helps to specify numerical weights representing the relative importance of each element. The numerical weight or priority derived for each element allows users to find alternatives that best suit their goal and their understanding of the problem.
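
The step of turning pairwise judgements into numerical weights can be sketched as follows. This minimal Python example uses the row geometric mean, a common approximation to the principal-eigenvector weights in AHP; whether Prize uses this exact estimator is not stated here, and the function name `ahp_weights` and the comparison matrix are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise comparison matrix.

    pairwise[i][j] states how much more important element i is than
    element j. The row geometric means are normalised to sum to 1.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgements for three criteria: A vs B = 3, A vs C = 5, B vs C = 3.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_weights(comparisons)
```

For a perfectly consistent matrix, where every entry equals the ratio of two underlying weights, this procedure recovers those weights exactly; for inconsistent judgements it gives a compromise ranking.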

XBSeq Test for differential expression for RNA-seq data

We developed a novel algorithm, XBSeq, in which a statistical model is established based on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The reads mapped to non-exonic regions are considered sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signals, assumed to be governed by the negative binomial distribution, can be delineated, and differentially expressed genes can thus be detected accurately.

CNVPanelizer Reliable CNV detection in targeted sequencing applications

A method that allows for the use of a collection of non-matched normal tissue samples. Our approach uses a non-parametric bootstrap subsampling of the available reference samples to estimate the distribution of read counts from targeted sequencing. As inspired by random forest, this is combined with a procedure that subsamples the amplicons associated with each of the targeted genes. The obtained information allows us to reliably classify the copy number aberrations on the gene level.

fCI f-divergence Cutoff Index

fCI (f-divergence Cutoff Index) finds DEGs in transcriptomic and proteomic data, identifying them by computing the difference between the distribution of fold-changes for the control-control and remaining (non-differential) case-control gene expression ratio data. fCI provides several advantages compared to existing methods.

IONiseR Quality Assessment Tools for Oxford Nanopore MinION data

IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell.

erma epigenomic road map adventures

Software and data to support epigenomic road map adventures.

PGA A package for identification of novel peptides by customized database derived from RNA-Seq

This package provides functions for construction of customized protein databases based on RNA-Seq data, database searching, post-processing and report generation. This kind of customized protein database includes both the reference database (such as Refseq or ENSEMBL) and the novel peptide sequences from RNA-Seq data.

mirIntegrator Integrating microRNA expression into signaling pathways for pathway analysis

Tools for augmenting signaling pathways to perform pathway analysis of microRNA and mRNA expression levels.


Given single-cell RNA-seq data and true experiment time of cells or pseudo-time cell ordering, SEPA provides convenient functions for users to assign genes to different gene expression patterns such as constant, monotone increasing, and increasing then decreasing. SEPA then performs GO enrichment analysis to analyze the functional roles of genes with the same or similar patterns.

hierGWAS Assessing statistical significance in predictive GWA studies

Testing individual SNPs, as well as arbitrarily large groups of SNPs in GWA studies, using a joint model of all SNPs. The method controls the FWER, and provides an automatic, data-driven refinement of the SNP clusters to smaller groups or single markers.

caOmicsV Visualization of multi-dimensional cancer genomics data

The caOmicsV package provides methods to visualize multi-dimensional cancer genomics data, including patient information, gene expression, DNA methylation, DNA copy number variation, and SNPs/mutations, in a matrix layout or network layout.

RareVariantVis Visualization of rare variants in whole genome sequencing data

Genomic variants can be analyzed and visualized using many tools. Unfortunately, the number of tools for global interrogation of variants is limited. The RareVariantVis package aims to present genomic variants (especially rare ones) in a global, per-chromosome way. Visualization is performed in two ways: a standard mode that outputs png figures and an interactive mode that uses the JavaScript d3 package. Interactive visualization allows analysis of trio/family data, for example in the search for causative variants in rare Mendelian diseases.

OperaMate An R package of Data Importing, Processing and Analysis for Opera High Content Screening System

OperaMate is a flexible R package dealing with the data generated by PerkinElmer's Opera High Content Screening System. The functions include the data importing, normalization and quality control, hit detection and function analysis.

acde Artificial Components Detection of Differentially Expressed Genes

This package provides a multivariate inferential analysis method for detecting differentially expressed genes in gene expression data. It uses artificial components, close to the data's principal components but with an exact interpretation in terms of differential genetic expression, to identify differentially expressed genes while controlling the false discovery rate (FDR). The methods in this package are described in the vignette or in the article 'Multivariate Method for Inferential Identification of Differentially Expressed Genes in Gene Expression Experiments' by J. P. Acosta, L. Lopez-Kleine and S. Restrepo (2015, pending publication).

RTCGAToolbox A new tool for exporting TCGA Firehose data

Managing data from large-scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time-consuming step for research projects. Several efforts, such as the Firehose project, make pre-processed TCGA data publicly available via web services and data portals, but using them still requires managing, downloading and preparing the data for subsequent steps. We developed an open source and extensible R-based data client for Firehose pre-processed data and demonstrated its use with sample case studies. Results showed that RTCGAToolbox could improve data management for researchers who are interested in TCGA data. In addition, it can be integrated with other analysis pipelines for subsequent data analysis.

SummarizedExperiment SummarizedExperiment container

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

BEclear Correct for batch effects in DNA methylation data

Provides some functions to detect and correct for batch effects in DNA methylation data. The core function "BEclear" is based on latent factor models and can also be used to predict missing values in any other matrix containing real numbers.

EMDomics Earth Mover's Distance for Differential Analysis of Genomics Data

The EMDomics algorithm is used to perform a supervised multi-class analysis to measure the magnitude and statistical significance of differences in observed continuous genomics data between groups. Usually the data will be gene expression values from array-based or sequence-based experiments, but data from other types of experiments can also be analyzed (e.g. copy number variation). Traditional methods like Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA) use significance tests based on summary statistics (mean and standard deviation) of the distributions. This approach lacks power to identify expression differences between groups that show high levels of intra-group heterogeneity. The Earth Mover's Distance (EMD) algorithm instead computes the "work" needed to transform one distribution into another, thus providing a metric of the overall difference in shape between two distributions. Permutation of sample labels is used to generate q-values for the observed EMD scores. This package also incorporates the Kolmogorov-Smirnov (K-S) test and the Cramer von Mises test (CVM), which are both common distribution comparison tests.
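The "work" idea can be illustrated with a minimal Python sketch. It is not the EMDomics implementation: the function name emd_1d is hypothetical, and the sketch assumes one-dimensional samples, for which the Earth Mover's Distance equals the area between the two empirical cumulative distribution functions.

```python
from bisect import bisect_right


def emd_1d(a, b):
    """Earth Mover's Distance between two 1-D samples (illustrative only).

    For one-dimensional data the minimal "work" equals the integral of
    |F_a(x) - F_b(x)|, where F_a and F_b are the empirical CDFs.
    """
    a = sorted(a)
    b = sorted(b)
    # Both empirical CDFs are constant between consecutive observed values.
    points = sorted(set(a) | set(b))
    work = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)  # fraction of a <= left
        cdf_b = bisect_right(b, left) / len(b)  # fraction of b <= left
        work += abs(cdf_a - cdf_b) * (right - left)
    return work
```

For example, emd_1d([0, 1], [1, 2]) gives 1.0, because every unit of mass has to move one unit to the right, while two identical samples give 0.0. A label-permutation scheme like the one described above would recompute this score after shuffling which sample belongs to which group.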

edge Extraction of Differential Gene Expression

The edge package implements methods for carrying out differential expression analyses of genome-wide gene expression studies. Significance testing using the optimal discovery procedure and generalized likelihood ratio tests (equivalent to F-tests and t-tests) are implemented for general study designs. Special functions are available to facilitate the analysis of common study designs, including time course experiments. Other packages such as snm, sva, and qvalue are integrated in edge to provide a wide range of tools for gene expression analysis.

pwOmics Pathway-based data integration of omics data

pwOmics performs pathway-based level-specific data comparison of matching omics data sets based on pre-analysed user-specified lists of differential genes/transcripts and proteins. A separate downstream analysis of proteomic data, including pathway identification and enrichment analysis, transcription factor identification and target gene identification, is contrasted with the upstream analysis, which starts from gene or transcript information as the basis for identifying upstream transcription factors and regulators. The cross-platform comparative analysis allows for comprehensive analysis of single time point experiments and time-series experiments by providing static and dynamic analysis tools for data integration.

similaRpeak Metrics to estimate a level of similarity between two ChIP-Seq profiles

This package calculates metrics which assign a level of similarity between ChIP-Seq profiles.

msa Multiple Sequence Alignment

This package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, so they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.

RnBeads RnBeads

RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.

flowVS Variance stabilization in flow cytometry (and microarrays)

Per-channel variance stabilization from a collection of flow cytometry samples by Bartlett's test for homogeneity of variances. The approach is applicable to microarray data as well.
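As a rough illustration of the homogeneity-of-variances test named above, the following Python sketch computes Bartlett's statistic for a list of groups. It is not code from flowVS: the function name bartlett_statistic is hypothetical, and how the package uses the statistic to choose a transformation is not shown here.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's test statistic for equal variances across groups.

    Each group needs at least two values and a non-zero sample variance.
    Larger values indicate stronger evidence of unequal variances.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # unbiased sample variances
    n_total = sum(sizes)
    # Pooled variance, weighted by each group's degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction
```

Under the null hypothesis of equal variances the statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom. A variance-stabilizing transformation should bring the statistic closer to zero when it is applied to all samples of a channel.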

ENCODExplorer A compilation of ENCODE metadata

This package allows users to quickly access ENCODE project file metadata and provides helper functions to query the ENCODE REST API, download ENCODE datasets and save the database in SQLite format.

CAnD Perform Chromosomal Ancestry Differences (CAnD) Analyses

Functions to perform the CAnD test on a set of ancestry proportions. For a particular ancestral subpopulation, a user will supply the estimated ancestry proportion for each sample, and each chromosome or chromosomal segment of interest. A p-value for each chromosome as well as an overall CAnD p-value will be returned for each test. Plotting functions are also available.

diffHic Differential analysis of Hi-C data

Detects differential interactions across biological conditions in a Hi-C experiment. Methods are provided for read alignment and data pre-processing into interaction counts. Statistical analysis is based on edgeR and supports normalization and filtering. Several visualization options are also available.

FlowRepositoryR FlowRepository R Interface

This package provides an interface to search and download data and annotations from FlowRepository. It uses the FlowRepository programming interface to communicate with a FlowRepository server.

R3CPET 3CPET: Finding Co-factor Complexes in ChIA-PET experiment using a Hierarchical Dirichlet Process

The package provides a method to infer the set of proteins that are most likely to work together to maintain chromatin interactions, given the results of a ChIA-PET experiment.

pandaR PANDA algorithm

Runs PANDA, an algorithm for discovering novel network structure by combining information from multiple complementary data sources.

ENmix Data preprocessing and quality control for Illumina HumanMethylation450 BeadChip

Illumina HumanMethylation450 BeadChip array measurements have intrinsic levels of background noise that degrade methylation measurement. The ENmix package provides an efficient data pre-processing tool designed to reduce background noise and improve signal for DNA methylation estimation. The package utilizes a novel model-based background correction method, ENmix, that significantly improves the accuracy and reproducibility of methylation measures. The data structure used by the ENmix package is compatible with several other related R packages, such as minfi, wateRmelon and ChAMP, providing straightforward integration of ENmix-corrected datasets for subsequent data analysis. The software is designed to support large-scale data analysis, and provides multi-processor parallel computing wrappers for commonly used data preprocessing methods, including BMIQ probe design type bias correction and ComBat batch effect correction. In addition, the ENmix package has selectable complementary functions for efficient data visualization (such as data distribution plotting), quality control (identification and filtering of low quality data points, samples, probes, and outliers, along with imputation of missing values), inter-array normalization (3 different quantile normalizations), identification of probes with multimodal distributions due to SNPs and other factors, and exploration of data variance structure using principal component regression analysis plots. Together these provide a set of flexible and transparent tools for preprocessing of EWAS data in a computationally efficient and user-friendly package.

soGGi Visualise ChIP-seq, MNase-seq and motif occurrence as aggregate plots Summarised Over Grouped Genomic Intervals

The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurrence from BAM and bigWig files as well as PWM, rlelist, GRanges and GAlignments Bioconductor objects. soGGi allows for normalisation, transformation and arithmetic operation on and between summary plot objects as well as grouping and subsetting of plots by GRanges objects and user supplied metadata. Plots are created using the ggplot2 library to allow user defined manipulation of the returned plot object. Coupled together, soGGi features a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, superenhancers and transcription factor binding events.

MethTargetedNGS Perform Methylation Analysis on Next Generation Sequencing Data

Perform step by step methylation analysis of Next Generation Sequencing data.

conumee Enhanced copy-number variation analysis using Illumina 450k methylation arrays

This package contains a set of processing and plotting methods for performing copy-number variation (CNV) analysis using Illumina 450k methylation arrays.
