This is a list of the last 100 packages added to Bioconductor and available in the development version of Bioconductor. The list is also available as an RSS Feed.

OperaMate An R Package for Data Importing, Processing and Analysis for the Opera High Content Screening System

OperaMate is a flexible R package for handling the data generated by PerkinElmer's Opera High Content Screening System. Its functions cover data import, normalization and quality control, hit detection, and functional analysis.

acde Artificial Components Detection of Differentially Expressed Genes

This package provides a multivariate inferential analysis method for detecting differentially expressed genes in gene expression data. It uses artificial components, which are close to the data's principal components but have an exact interpretation in terms of differential genetic expression, to identify differentially expressed genes while controlling the false discovery rate (FDR). The methods in this package are described in the vignette and in the article 'Multivariate Method for Inferential Identification of Differentially Expressed Genes in Gene Expression Experiments' by J. P. Acosta, L. Lopez-Kleine and S. Restrepo (2015, pending publication).

CausalR Causal Reasoning on Biological Networks

Causal Reasoning algorithms for biological networks, including predictions, scoring, p-value calculation and ranking

RTCGAToolbox A new tool for exporting TCGA Firehose data

Managing data from large-scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time-consuming step in research projects. Several efforts, such as the Firehose project, make pre-processed TCGA data publicly available via web services and data portals, but users still have to manage, download and prepare the data for subsequent steps. We developed an open-source, extensible R-based data client for Firehose pre-processed data and demonstrated its use with sample case studies. The results showed that RTCGAToolbox can improve data management for researchers who are interested in TCGA data. In addition, it can be integrated with other pipelines for downstream data analysis.

SummarizedExperiment SummarizedExperiment container

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

BEclear Correct for batch effects in DNA methylation data

Provides functions to detect and correct for batch effects in DNA methylation data. The core function "BEclear" is based on latent factor models and can also be used to predict missing values in any other matrix containing real numbers.

EMDomics Earth Mover's Distance for Differential Analysis of Genomics Data

The EMDomics algorithm performs a supervised two-class analysis to measure the magnitude and statistical significance of differences in continuous genomics data between two groups. Usually the data are gene expression values from array-based or sequence-based experiments, but data from other types of experiments (e.g. copy number variation) can also be analyzed. Traditional methods like Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA) use significance tests based on summary statistics (mean and standard deviation) of the two distributions. This approach lacks power to identify expression differences between groups that show high levels of intra-group heterogeneity. The Earth Mover's Distance (EMD) algorithm instead computes the "work" needed to transform one distribution into the other, thus providing a metric of the overall difference in shape between two distributions. Permutation of sample labels is used to generate q-values for the observed EMD scores.
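To make the EMD idea concrete, here is a minimal Python sketch for a single gene. It is not EMDomics itself, which is an R package. For two 1-D samples, the "work" equals the area between their empirical cumulative distribution functions. A label-permutation p-value is added on top. EMDomics bins the data into histograms and reports q-values across many genes, so the function names, the unbinned computation and the per-gene p-value here are illustrative assumptions.

```python
import random


def emd_1d(a, b):
    """1-D Earth Mover's (Wasserstein-1) distance between two samples.

    Equals the area between the two empirical CDFs: the sum of
    |F_a(x) - F_b(x)| * width over intervals between consecutive observed values.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    ia = ib = 0  # counts of values <= the current left endpoint in a and in b
    for left, right in zip(points, points[1:]):
        while ia < len(a) and a[ia] <= left:
            ia += 1
        while ib < len(b) and b[ib] <= left:
            ib += 1
        # Both CDFs are constant on [left, right), so the area is height * width.
        total += abs(ia / len(a) - ib / len(b)) * (right - left)
    return total


def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Shuffle group labels and count how often a random split gives an EMD >= the observed one."""
    rng = random.Random(seed)
    observed = emd_1d(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if emd_1d(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)


# Both groups have mean 0 but different shapes, so a mean-based test sees no difference.
bimodal = [-1.0, 1.0] * 5
unimodal = [0.0] * 10
print(emd_1d(bimodal, unimodal))  # 1.0
```

The bimodal-versus-unimodal example shows the heterogeneity case described above. The group means are identical, yet half of the mass must move one unit in each direction, which gives an EMD of 1.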

edge Extraction of Differential Gene Expression

The edge package implements methods for carrying out differential expression analyses of genome-wide gene expression studies. Significance testing using the optimal discovery procedure and generalized likelihood ratio tests (equivalent to F-tests and t-tests) are implemented for general study designs. Special functions are available to facilitate the analysis of common study designs, including time course experiments. Other packages such as snm, sva, and qvalue are integrated in edge to provide a wide range of tools for gene expression analysis.

pwOmics Pathway-based data integration of omics data

pwOmics performs pathway-based, level-specific comparison of matching omics data sets, based on pre-analysed, user-specified lists of differential genes/transcripts and proteins. A separate downstream analysis of proteomic data, including pathway identification and enrichment analysis, transcription factor identification and target gene identification, is set against an upstream analysis that starts from gene or transcript information to identify upstream transcription factors and regulators. The cross-platform comparative analysis supports comprehensive analysis of single-time-point and time-series experiments by providing static and dynamic analysis tools for data integration.

similaRpeak similaRpeak: Metrics to estimate a level of similarity between two ChIP-Seq profiles

This package calculates metrics that quantify the level of similarity between two ChIP-Seq profiles.

msa Multiple Sequence Alignment

This package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package; therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.

RnBeads RnBeads

RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.

flowVS Variance stabilization in flow cytometry (and microarrays)

Per-channel variance stabilization from a collection of flow cytometry samples by Bartlett's test for homogeneity of variances. The approach is applicable to microarray data as well.
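The statistic behind this approach can be sketched in a few lines of Python. Bartlett's statistic is near 0 when the groups have equal variances and grows as the variances diverge. The second function assumes the stabilizing transform is asinh(x / c) and chooses the cofactor c that makes the group variances most homogeneous. That transform family, the candidate cofactors and both function names are assumptions made for illustration, not flowVS's documented interface. Each group needs at least two values and a non-zero variance.

```python
import math


def bartlett_statistic(groups):
    """Bartlett's test statistic for homogeneity of variances across k groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((x - m) ** 2 for x in g) / (len(g) - 1))  # sample variance
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    # Small-sample correction factor from Bartlett's original test.
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


def best_asinh_cofactor(groups, cofactors):
    """Hypothetical helper: return the c whose asinh(x / c) transform minimizes Bartlett's statistic."""
    return min(cofactors, key=lambda c: bartlett_statistic(
        [[math.asinh(x / c) for x in g] for g in groups]))
```

Minimizing the statistic rather than thresholding its p-value lets the same routine rank any list of candidate transforms for one channel.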

ENCODExplorer A compilation of ENCODE metadata

This package allows users to quickly access ENCODE project file metadata and provides helper functions to query the ENCODE REST API, download ENCODE datasets and save the database in SQLite format.

CAnD Perform Chromosomal Ancestry Differences (CAnD) Analyses

Functions to perform the non-parametric and parametric CAnD tests on a set of ancestry proportions. For a particular ancestral subpopulation, a user will supply the estimated ancestry proportion for each sample, and each chromosome or chromosomal segment of interest. A p-value for each chromosome as well as an overall CAnD p-value will be returned for each test. Plotting functions are also available.

diffHic Differential analysis of Hi-C data

Detects differential interactions across biological conditions in a Hi-C experiment. Methods are provided for read alignment and data pre-processing into interaction counts. Statistical analysis is based on edgeR and supports normalization and filtering. Several visualization options are also available.

FlowRepositoryR FlowRepository R Interface

This package provides an interface to search and download data and annotations from FlowRepository (flowrepository.org). It uses the FlowRepository programming interface to communicate with a FlowRepository server.

R3CPET 3CPET: Finding Co-factor Complexes in ChIA-PET Experiments using a Hierarchical Dirichlet Process

The package provides a method to infer the set of proteins that are most likely to work together to maintain chromatin interactions, given the results of a ChIA-PET experiment.

pandaR PANDA algorithm

Runs PANDA, an algorithm for discovering novel network structure by combining information from multiple complementary data sources.

ENmix Data preprocessing and quality control for Illumina HumanMethylation450 BeadChip

The Illumina HumanMethylation450 BeadChip has a complex array design, and its measurements are subject to experimental variation. The ENmix R package provides tools for low-level data preprocessing to improve data quality. It incorporates a model-based background correction method, ENmix, and provides functions for inter-array quantile normalization, data quality checking, and exploration of multimodally distributed CpGs and of sources of data variation. To support large-scale data analysis, the package also provides multi-processor parallel computing wrappers for some commonly used data preprocessing methods, such as BMIQ probe design type bias correction and ComBat batch effect correction.

soGGi Visualise ChIP-seq, MNase-seq and motif occurrence as aggregate plots Summarised Over Grouped Genomic Intervals

The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurrence from BAM and bigWig files as well as PWM, RleList, GRanges and GAlignments Bioconductor objects. soGGi allows for normalisation, transformation and arithmetic operations on and between summary plot objects, as well as grouping and subsetting of plots by GRanges objects and user-supplied metadata. Plots are created using the ggplot2 library to allow user-defined manipulation of the returned plot object. Taken together, soGGi offers a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, superenhancers and transcription factor binding events.

MethTargetedNGS Perform Methylation Analysis on Next Generation Sequencing Data

Perform step by step methylation analysis of Next Generation Sequencing data.

conumee Enhanced copy-number variation analysis using Illumina 450k methylation arrays

This package contains a set of processing and plotting methods for performing copy-number variation (CNV) analysis using Illumina 450k methylation arrays.

RCyjs Display and manipulate graphs in Cytoscape.js

Interactive viewing and exploration of graphs, connecting R to Cytoscape.js

TPP Analyze thermal proteome profiling (TPP) experiments

Analyze thermal proteome profiling (TPP) experiments with varying temperatures (TR) or compound concentrations (CCR).

NanoStringQCPro Quality metrics and data processing methods for NanoString mRNA gene expression data

NanoStringQCPro provides a set of quality metrics that can be used to assess the quality of NanoString mRNA gene expression data, i.e. to identify outlier probes and outlier samples. It also provides different background subtraction and normalization approaches for this data. It outputs suggestions for flagging samples/probes and an easily shareable HTML quality control report.

GoogleGenomics R Client for Google Genomics API

Provides an R package to interact with the Google Genomics API.

CopywriteR Copy number information from targeted sequencing using off-target reads

CopywriteR extracts DNA copy number information from targeted sequencing by utilizing off-target sequence reads. It allows for extracting uniformly distributed copy number information, can be used without a reference, and can be applied to sequencing data obtained from various techniques, including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR therefore constitutes a widely applicable alternative to available tools.

cogena co-expressed gene-set enrichment analysis

Gene set enrichment analysis is a valuable tool for the study of molecular mechanisms that underpin complex biological traits. Because the method is conventionally used on entire omic datasets, such as transcriptomes, it may be dominated by pathways and processes that are substantially represented in a dataset; as a result, the approach may overlook smaller-scale but highly correlated cellular events that may be of great biological relevance. To detect these discrete molecular triggers, we developed co-expressed gene-set enrichment analysis (cogena), a tool for clustering differentially expressed genes and identifying highly correlated molecular expression clusters. Cogena offers the user a range of clustering methods, including hierarchical clustering, model-based clustering and self-organising maps, based on different distance metrics such as correlation and mutual information. After obtaining and visualising clusters, cogena performs gene set enrichment. These gene sets can be sourced from the Molecular Signatures Database (MSigDB) or defined by the user. By performing gene set enrichment across expression clusters, we find considerable enhancement in the resolution of molecular signatures in omic data at the cluster level compared to the whole.

BrowserVizDemo BrowserVizDemo: How to subclass BrowserViz

A BrowserViz subclassing example, xy plotting in the browser using d3

SVM2CRM SVM2CRM: support vector machine for cis-regulatory element detection

Detection of cis-regulatory elements using the support vector machine (SVM) implementation in LiblineaR.

RUVcorr Removal of unwanted variation for gene-gene correlations and related analysis

RUVcorr applies global removal of unwanted variation (ridged version) to real and simulated gene expression data.

OmicsMarkeR Classification and Feature Selection for 'Omics' Datasets

Tools for classification and feature selection for 'omics' level datasets. The package provides multiple multivariate classification and feature selection techniques, complete with multiple stability metrics and aggregation techniques. It is primarily designed for the analysis of metabolomics datasets but is potentially extendable to proteomics and transcriptomics applications.
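A stability metric asks how consistent the selected features are when selection is repeated, for example on resampled data. The Python sketch below uses the Jaccard index averaged over all pairs of selected feature sets, one common choice. It does not reproduce OmicsMarkeR's own metrics or function names, so both functions here are hypothetical illustrations.

```python
from itertools import combinations


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets are treated as identical (1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_stability(feature_sets):
    """Average Jaccard index over every pair of feature sets (e.g. one per resampling run)."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value of 1.0 means every run selected exactly the same features. Values near 0 mean the selections barely overlap.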

mogsa Multiple omics data integration and gene set analysis

This package provides a method for gene set analysis based on multiple omics data.

FISHalyseR FISHalyseR: a package for automated FISH quantification

FISHalyseR provides functionality to process and analyse digital cell culture images, in particular to quantify FISH probes within nuclei. Furthermore, it extracts the spatial location of each nucleus as well as each probe, enabling spatial co-localisation analysis.

DMRcaller Differentially Methylated Regions caller

Uses Bisulfite sequencing data in two conditions and identifies differentially methylated regions between the conditions in CG and non-CG context. The input is the CX report files produced by Bismark and the output is a list of DMRs stored as GRanges objects.

regioneR Association analysis of genomic regions based on permutation tests

regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.
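The logic of such a permutation test can be sketched in Python for one chromosome with half-open intervals. It counts how many regions in set A overlap set B, repeatedly places A's regions at random positions while keeping their widths, and compares the observed count with that null distribution. regioneR itself works on Bioconductor genomic-range objects in R and supports customizable randomization and evaluation functions. The single-chromosome genome, the uniform placement and the function names below are simplifying assumptions.

```python
import random


def count_overlaps(a, b):
    """Number of regions in a that overlap at least one region in b (half-open [start, end))."""
    return sum(any(s < be and bs < e for bs, be in b) for s, e in a)


def randomize(regions, genome_length, rng):
    """Place each region uniformly at random on [0, genome_length), keeping its width."""
    out = []
    for s, e in regions:
        width = e - s
        start = rng.randrange(0, genome_length - width + 1)
        out.append((start, start + width))
    return out


def permutation_test(a, b, genome_length, n_perm=1000, seed=1):
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    perms = [count_overlaps(randomize(a, genome_length, rng), b) for _ in range(n_perm)]
    mean = sum(perms) / n_perm
    sd = (sum((x - mean) ** 2 for x in perms) / (n_perm - 1)) ** 0.5
    # One-sided empirical p-value for enrichment; +1 includes the observed arrangement.
    p = (sum(x >= observed for x in perms) + 1) / (n_perm + 1)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, p, z
```

Keeping region widths fixed during randomization matters. Wider regions overlap more by chance, and the null distribution should reflect that.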

pmm Parallel Mixed Model

The Parallel Mixed Model (PMM) approach is suitable for hit selection and cross-comparison of RNAi screens generated in experiments that are performed in parallel under several conditions. For example, the readouts could come from RNAi knock-down cells that are infected with several pathogens or that are grown from different cell lines.

ComplexHeatmap Making Complex Heatmaps

Complex heatmaps are an efficient way to visualize associations between different sources of data and to reveal potential features. The ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports self-defined annotation graphics.

RBM RBM: an R package for microarray and RNA-Seq data analysis

Uses a resampling-based empirical Bayes approach to assess differential expression in two-color microarray and RNA-Seq data sets.

podkat Position-Dependent Kernel Association Test

This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.

LowMACA LowMACA - Low frequency Mutation Analysis via Consensus Alignment

The LowMACA package is a simple suite of tools to investigate and analyze the mutation profile of several proteins or Pfam domains via consensus alignment. You can conduct a hypothesis-driven exploratory analysis with the package simply by providing a set of genes or Pfam domains of interest.

rcellminer rcellminer: Molecular Profiles and Drug Response for the NCI-60 Cell Lines

The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which have been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.

gtrellis Genome Level Trellis Layout

Genome level Trellis graph visualizes genomic data conditioned by genomic categories (e.g. chromosomes). For each genomic category, multiple dimensional data which are represented as tracks describe different features from different aspects. This package provides high flexibility to arrange genomic categories and add self-defined graphics in the plot.

ensembldb Utilities to create and use an Ensembl based annotation database

The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using its Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, the ensembldb package also provides a filter framework for retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes.

TIN Transcriptome instability analysis

The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets.

InPAS Identification of Novel alternative PolyAdenylation Sites (PAS)

Alternative polyadenylation (APA) is an important post-transcriptional regulatory mechanism that occurs in most human genes. InPAS facilitates the discovery of novel APA sites from RNA-seq data. It leverages cleanUpdTSeq to fine-tune identified APA sites.

GENESIS GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015): a Principal Components Analysis with genome-wide SNP genotype data for robust population structure inference in samples with related individuals (known or cryptic).

bamsignals Extract read count signals from bam files

This package efficiently obtains count vectors from indexed BAM files. It counts the number of reads in given genomic ranges and computes read profiles and coverage profiles. It also handles paired-end data.

RNAprobR An R package for analysis of massively parallel sequencing based RNA structure probing data

This package facilitates analysis of Next Generation Sequencing data for which positional information at single-nucleotide resolution is key. It allows for applying different types of relevant normalizations, data visualization, and export as a table or a UCSC-compatible bedGraph file.

netbenchmark Benchmarking of several gene network inference methods

This package implements benchmarking of several algorithms that infer gene networks from gene expression data.

MatrixRider Obtain total affinity and occupancies for binding site matrices on a given sequence

Calculates a single number for a whole sequence that reflects the propensity of a DNA-binding protein to interact with it. The DNA-binding protein has to be described by a position frequency matrix (PFM), for example one obtained from JASPAR.

LEA LEA: an R package for Landscape and Ecological Association Studies

LEA is an R package dedicated to landscape genomics and ecological association tests. LEA can run analyses of population structure and genome scans for local adaptation. It includes statistical methods for estimating ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (snmf, pca); and identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (lfmm), and controlling the false discovery rate. LEA is mainly based on optimized C programs that can scale with the dimension of very large data sets.

immunoClust immunoClust - Automated Pipeline for Population Detection in Flow Cytometry

Model based clustering and meta-clustering of Flow Cytometry Data

diggit Inference of Genetic Variants Driving Cellular Phenotypes

Inference of Genetic Variants Driving Cellular Phenotypes by the DIGGIT algorithm

canceR A Graphical User Interface for accessing and modeling the Cancer Genomics Data of MSKCC.

The package is a user-friendly interface, based on cgdsr and other modeling packages, to explore, compare, and analyse all available cancer data (clinical data, gene mutation, gene methylation, gene expression, protein phosphorylation, copy number alteration) hosted by the Computational Biology Center at Memorial Sloan Kettering Cancer Center (MSKCC).

muscle Multiple Sequence Alignment with MUSCLE

MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.

BrowserViz BrowserViz: interactive R/browser graphics using websockets and JSON

Interactive graphics in a web browser from R, using websockets and JSON

Rhtslib HTSlib high-throughput sequencing library as an R package

This package provides version 1.1 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").

skewr Visualize Intensities Produced by Illumina's Human Methylation 450k BeadChip

The skewr package is a tool for visualizing the output of the Illumina Human Methylation 450k BeadChip to aid in quality control. It creates a panel of nine plots. Six of the plots represent the density of either the methylated intensity or the unmethylated intensity given by one of three subsets of the 485,577 total probes. These subsets are Type I-red, Type I-green, and Type II. The remaining three plots give the density of the Beta-values for these same three subsets. Each of the nine plots optionally displays the distributions of the "rs" SNP probes and of the probes associated with imprinted genes as a series of 'tick' marks located above the x-axis.

sigsquared Gene signature generation for functionally validated signaling pathways

By leveraging statistical properties (log-rank test for survival) of patient cohorts defined by binary thresholds, poor-prognosis patients are identified by the sigsquared package via optimization over a cost function reducing type I and II error.

SELEX Functions for analyzing SELEX-seq data

Tools for quantifying DNA binding specificities based on SELEX-seq data

ProtGenerics S4 generic functions for Bioconductor proteomics infrastructure

S4 generic functions needed by Bioconductor proteomics packages.

BubbleTree A method to elucidate purity and clonality in tumors using copy number ratio and allele frequency

BubbleTree utilizes homogeneous pertinent somatic copy number alterations (SCNAs) as markers of tumor clones to extract estimates of tumor ploidy, purity and clonality.

rGREAT Client for GREAT Analysis

This package automates GREAT (Genomic Regions Enrichment of Annotations Tool) analysis by constructing an HTTP POST request from the user's input and automatically retrieving the results from the GREAT web server.

birte Bayesian Inference of Regulatory Influence on Expression (biRte)

Expression levels of mRNA molecules are regulated by different processes, comprising inhibition or activation by transcription factors and post-transcriptional degradation by microRNAs. biRte uses regulatory networks of TFs, miRNAs and possibly other factors, together with mRNA, miRNA and other available expression data, to predict the relative influence of a regulator on the expression of its target genes. Inference is done in a Bayesian modeling framework using Markov chain Monte Carlo. A special feature is the possibility of follow-up network reverse engineering between active regulators.

HIBAG HLA Genotype Imputation with Attribute Bagging

HIBAG is a software package for imputing HLA types using SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique that improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
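The two ingredients of attribute bagging named above, bootstrap aggregating and random variable selection, can be sketched generically in Python. Each ensemble member sees a bootstrap resample of the training individuals and a random subset of SNPs, and members vote on the label. HIBAG's members are haplotype-based models. Here a 1-nearest-neighbour classifier using Hamming distance on genotype codes stands in for them, so this is a sketch of the general technique, not HIBAG's classifier. All names are hypothetical.

```python
import random
from collections import Counter


def hamming(x, y, features):
    """Number of selected features (SNPs) at which two genotype vectors differ."""
    return sum(x[f] != y[f] for f in features)


def train_attribute_bagging(X, y, n_classifiers=25, n_features=3, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_classifiers):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample of individuals
        feats = rng.sample(range(len(X[0])), n_features)      # random subset of SNPs
        models.append(([X[i] for i in idx], [y[i] for i in idx], feats))
    return models


def predict(models, x):
    """Majority vote of 1-nearest-neighbour members, each restricted to its own SNP subset."""
    votes = Counter()
    for Xb, yb, feats in models:
        best = min(range(len(Xb)), key=lambda k: hamming(Xb[k], x, feats))
        votes[yb[best]] += 1
    return votes.most_common(1)[0][0]


# Toy data: genotype codes 0/1/2 at six SNPs, two labels standing in for HLA types.
X = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0],
     [2, 2, 2, 2, 2, 2], [2, 2, 2, 2, 2, 1], [2, 2, 2, 2, 1, 2]]
y = ["A", "A", "A", "B", "B", "B"]
models = train_attribute_bagging(X, y)
print(predict(models, [0, 0, 0, 0, 0, 1]))
```

Because each member sees different individuals and different SNPs, the members make partly independent errors, and the vote averages those errors out.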

sincell R package for the statistical assessment of cell state hierarchies from single-cell RNA-seq data

Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell-states that might be captured by single-cell RNA seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data might be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cells-clustering step). Sincell R package implements a methodological toolbox allowing flexible workflows under such framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA seq. Graphical representations and functional association tests are provided to interpret hierarchies.

Cardinal A mass spectrometry imaging toolbox for statistical analysis

Implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.

GreyListChIP Grey Lists -- Mask Artefact Regions Based on ChIP Inputs

Identify regions of ChIP experiments with high signal in the input that lead to spurious peaks during peak calling. Remove reads aligning to these regions prior to peak calling, for cleaner ChIP analysis.

IVAS Identification of genetic Variants affecting Alternative Splicing

Identification of genetic variants affecting alternative splicing.

cytofkit cytofkit: an integrated analysis pipeline for mass cytometry data

An integrated mass cytometry data analysis pipeline that enables simultaneous illustration of cellular diversity and progression.

seq2pathway a novel tool for functional gene-set (or termed as pathway) analysis of next-generation sequencing data

Seq2pathway is a novel tool for functional gene-set (or pathway) analysis of next-generation sequencing data, consisting of "seq2gene" and "gene2path" components. The seq2gene component links sequence-level measurements of genomic regions (including SNPs or point mutation coordinates) to gene-level scores, and the gene2pathway component summarizes gene scores to pathway scores for each sample. seq2gene can assign both coding and non-exon regions to a broader range of neighboring genes than only the nearest one, thus facilitating the study of functional non-coding regions. gene2pathway takes into account the significance of gene members within a pathway compared with those outside it. The output of seq2pathway is a general structure of quantitative pathway-level scores, which allows functional interpretation of datasets such as RNA-seq, ChIP-seq and GWAS, as well as data derived from other next-generation sequencing experiments.

ggtree a phylogenetic tree viewer for different types of tree annotations

ggtree extends the ggplot2 plotting system, which implements the grammar of graphics. ggtree is designed for visualizing phylogenetic trees and different types of associated annotation data.

parglms support for parallelized estimation of GLMs/GEEs

support for parallelized estimation of GLMs/GEEs, catering for dispersed data

seqPattern Visualising oligonucleotide patterns and motif occurrences across a set of sorted sequences

Visualising oligonucleotide patterns and sequence motif occurrences across a large set of sequences centred at a common reference point and sorted by a user-defined feature.

MeSHSim MeSH(Medical Subject Headings) Semantic Similarity Measures

Provides measures of semantic similarity over MeSH headings and MEDLINE documents.

mAPKL A Hybrid Feature Selection method for gene expression data

We propose a hybrid FS method (mAP-KL), which combines multiple hypothesis testing and affinity propagation (AP)-clustering algorithm along with the Krzanowski & Lai cluster quality index, to select a small yet informative subset of genes.

gdsfmt R Interface to CoreArray Genomic Data Structure (GDS) Files

This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and include a hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited to large-scale datasets, especially data that are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers with fewer than 8 bits, since a single genetic/genomic variant, such as a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are also supported, with relatively efficient random access. A GDS file can be read in parallel by multiple R processes using the parallel package.
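The point about sub-byte integers can be shown with a small Python sketch. A biallelic SNP genotype, coded as 0, 1 or 2 copies of an allele plus 3 for missing, fits in 2 bits, so four genotypes fit in one byte instead of four. The bit order used here, with the lowest bits first, is an assumption for illustration. This is not the GDS on-disk format, and the function names are hypothetical.

```python
def pack_2bit(genotypes):
    """Pack genotype codes 0, 1, 2 (3 = missing) four per byte, lowest bits first."""
    out = bytearray((len(genotypes) + 3) // 4)  # round up to whole bytes
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"genotype code out of range: {g}")
        out[i // 4] |= g << (2 * (i % 4))  # slot i % 4 within byte i // 4
    return bytes(out)


def unpack_2bit(packed, n):
    """Recover the first n 2-bit codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


codes = [0, 1, 2, 3, 2, 1]
packed = pack_2bit(codes)
print(len(packed), unpack_2bit(packed, len(codes)))
```

At this density, a matrix of one million SNPs by one thousand samples needs about 250 MB rather than about 1 GB at one byte per genotype.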

TRONCO TRONCO, a package for TRanslational ONCOlogy

Genotype-level cancer progression models describe the ordering of accumulating mutations, e.g., somatic mutations / copy number variations, during cancer development. These graphical models help to understand the causal structure of events that promote cancer progression, possibly predicting complex patterns that characterise the genomic progression of a cancer. Reconstructed models can be used to better characterise genotype-phenotype relations and to suggest novel targets for therapy design. TRONCO (TRanslational ONCOlogy) is an R package aimed at collecting state-of-the-art algorithms to infer progression models from cross-sectional data, i.e., data collected from independent patients that do not necessarily include any evident temporal information. These algorithms require a binary input matrix in which (i) each row represents a patient genome, (ii) each column represents an event relevant to the progression (selected a priori), and (iii) a 0/1 value models the absence/presence of a certain mutation in a certain patient. The current, first version of TRONCO implements the CAPRESE algorithm (Cancer PRogression Extraction with Single Edges) to infer possible progression models arranged as trees; see Inferring tree causal models of cancer progression with probability raising, L. Olde Loohuis, G. Caravagna, A. Graudenzi, D. Ramazzotti, G. Mauri, M. Antoniotti and B. Mishra. PLoS One, to appear. This vignette shows how to use TRONCO to infer a tree model of ovarian cancer progression from CGH data of copy number alterations (classified as gains or losses over chromosome arms). The dataset used is available in the SKY/M-FISH database.
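The binary patient-by-event matrix and the probability-raising idea cited above can be illustrated with a short Python sketch. It lists candidate edges i → j that satisfy two pairwise conditions. The first is that event i is more frequent than event j, a rough stand-in for temporal priority in cross-sectional data. The second is that observing i raises the probability of j, i.e. P(j | i) > P(j | not i). This is a simplified illustration that omits CAPRESE's scoring and tree-building steps, so it is not TRONCO's implementation. All function names are hypothetical.

```python
def marginal(matrix, j):
    """Fraction of patients (rows) in which event j is present."""
    return sum(row[j] for row in matrix) / len(matrix)


def conditional(matrix, j, i, value):
    """P(event j present | event i == value); None if no patient has event i == value."""
    rows = [row for row in matrix if row[i] == value]
    if not rows:
        return None
    return sum(row[j] for row in rows) / len(rows)


def candidate_edges(matrix):
    """Pairs (i, j) with P(i) > P(j) and P(j | i) > P(j | not i)."""
    n_events = len(matrix[0])
    edges = []
    for i in range(n_events):
        for j in range(n_events):
            if i == j:
                continue
            p_j_given_i = conditional(matrix, j, i, 1)
            p_j_given_not_i = conditional(matrix, j, i, 0)
            if p_j_given_i is None or p_j_given_not_i is None:
                continue  # the condition cannot be evaluated without both groups
            if marginal(matrix, i) > marginal(matrix, j) and p_j_given_i > p_j_given_not_i:
                edges.append((i, j))
    return edges


# Rows are patients, columns are events; event 1 only occurs alongside event 0.
m = [[1, 1], [1, 0], [1, 0], [1, 1], [0, 0]]
print(candidate_edges(m))  # [(0, 1)]
```

In the toy matrix, event 0 occurs in 80% of patients, and event 1 occurs only when event 0 is present. The sketch therefore proposes the single edge 0 → 1.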

RnaSeqSampleSize RnaSeqSampleSize

The RnaSeqSampleSize package provides a sample size calculation method for differential expression analysis of RNA-seq data, based on the negative binomial model and the exact test.

gespeR Gene-Specific Phenotype EstimatoR

Estimates gene-specific phenotypes from off-target confounded RNAi screens. The phenotype of each siRNA is modeled based on its on-target and off-target genes, using a regularized linear regression model.

coMET coMET: visualisation of regional epigenome-wide association scan (EWAS) results and DNA co-methylation patterns.

Visualisation of EWAS results in a genomic region. In addition to phenotype-association P-values, coMET also generates plots of co-methylation patterns and provides a series of annotation tracks. It can also be used for other omic-wide association scans, in any species, as long as the data can be mapped to genomic positions.

CODEX A Normalization and Copy Number Variation Detection Method for Whole Exome Sequencing

A normalization and copy number variation calling procedure for whole exome DNA sequencing data. CODEX relies on the availability of multiple samples processed using the same sequencing pipeline for normalization, and does not require matched controls. The normalization model in CODEX includes terms that specifically remove biases due to GC content, exon length and targeting and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data.

chromDraw chromDraw: an R package for visualization of linear and circular karyotypes.

chromDraw is a simple package for visualizing karyotypes in linear and circular form. The linear visualization is usually used to show the structure of the chromosomes in a karyotype, while the circular visualization is used to compare karyotypes with each other. The tool accepts either its own input data format or a GenomicRanges structure as input; each chromosome definition contains its blocks and centromere position. Output file formats are *.eps and *.svg.

AnalysisPageServer A framework for sharing interactive data and plots from R through the web.

AnalysisPageServer is a modular system that enables sharing of customizable R analyses via the web.

rgsepd Gene Set Enrichment / Projection Displays

R/GSEPD is a bioinformatics package for R that helps disambiguate transcriptome samples (a matrix of RNA-Seq counts at RefSeq IDs) by automating differential expression (with DESeq2), then gene set enrichment (with GOSeq), and finally an N-dimensional projection to quantify how closely each sample resembles either treatment group.
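
One way to picture "quantifying how closely a sample resembles either group" is to project the sample onto the axis joining the two group centroids. The Python sketch below is a hypothetical illustration of that geometric idea, not R/GSEPD's algorithm: it returns a position alpha (0 at group A's centroid, 1 at group B's) and the sample's distance from that axis. The toy two-dimensional data and the function names are assumptions; in practice the dimensions would be genes within a gene set.

```python
import math


def centroid(samples):
    """Mean vector of a list of equal-length expression vectors."""
    n = len(samples)
    return [sum(values) / n for values in zip(*samples)]


def project(sample, group_a, group_b):
    """Position of sample along the A->B centroid axis (alpha) and its distance from that axis."""
    a, b = centroid(group_a), centroid(group_b)
    axis = [bj - aj for aj, bj in zip(a, b)]
    offset = [sj - aj for aj, sj in zip(a, sample)]
    norm_sq = sum(x * x for x in axis)
    if norm_sq == 0:
        raise ValueError("group centroids coincide; projection is undefined")
    # alpha = 0 at A's centroid, 1 at B's centroid; values outside [0, 1] lie beyond them.
    alpha = sum(o * x for o, x in zip(offset, axis)) / norm_sq
    foot = [aj + alpha * x for aj, x in zip(a, axis)]
    return alpha, math.dist(sample, foot)


group_a = [[1.0, 0.0], [-1.0, 0.0]]  # centroid (0, 0)
group_b = [[4.0, 1.0], [4.0, -1.0]]  # centroid (4, 0)
print(project([1.0, 3.0], group_a, group_b))  # (0.25, 3.0)
```

A small alpha says the sample sits closer to group A along the discriminating direction, while a large distance flags a sample that resembles neither group well.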

mdgsa Multi Dimensional Gene Set Analysis.

Functions to perform gene set analysis in several genomic dimensions, including methods for miRNAs.

FlowSOM Using self-organizing maps for visualization and interpretation of cytometry data

FlowSOM offers visualization options for cytometry data using Self-Organizing Map clustering and Minimum Spanning Trees.

gQTLstats gQTLstats: computationally efficient analysis for eQTL and allied studies

Computationally efficient analysis of eQTL, mQTL, dsQTL, etc.

gQTLBase gQTLBase: infrastructure for eQTL, mQTL and similar studies

Infrastructure for eQTL, mQTL and similar studies.

PROPER PROspective Power Evaluation for RNAseq

This package provides simulation-based methods for evaluating the statistical power of differential expression analysis of RNA-seq data.

nethet A Bioconductor package for high-dimensional exploration of biological network heterogeneity

Package nethet is an implementation of statistically solid methodology enabling the analysis of network heterogeneity from high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).

cpvSNP Gene set analysis methods for SNP association p-values that lie in genes in given gene sets

Gene set analysis methods exist to combine SNP-level association p-values into gene sets, calculating a single association p-value for each gene set. This package implements two such methods that require only the calculated SNP p-values, the gene set(s) of interest, and a correlation matrix (if desired). One method (GLOSSI) requires independent SNPs and the other (VEGAS) can take into account correlation (LD) among the SNPs. Built-in plotting functions are available to help users visualize results.
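
As a hedged illustration of how independent SNP p-values can be combined into one gene-set p-value, the Python sketch below uses Fisher's method: under the null hypothesis and independence, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom for k p-values. We do not claim this is exactly GLOSSI's statistic, and the LD-aware VEGAS approach, which needs the correlation matrix, is not shown. The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Return Fisher's statistic and combined p-value for independent p-values."""
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return stat, chi2_sf_even_df(stat, 2 * len(pvalues))


# One p-value in gives the same p-value out (up to rounding); several small ones reinforce each other.
print(fisher_combine([0.05]))
print(fisher_combine([0.01, 0.02, 0.03]))
```

The independence assumption is what makes the chi-square reference distribution valid; correlated SNPs in LD inflate X and call for a method like VEGAS instead.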

QuartPAC Identification of mutational clusters in protein quaternary structures.

Identifies clustering of somatic mutations in proteins over the entire quaternary structure.

saps Significance Analysis of Prognostic Signatures

Functions implementing the Significance Analysis of Prognostic Signatures method (SAPS). SAPS provides a robust method for identifying biologically significant gene sets associated with patient survival. Three basic statistics are computed. First, patients are clustered into two survival groups based on differential expression of a candidate gene set. P_pure is the p-value for the null hypothesis of no survival difference between the two groups. Next, the same procedure is applied to randomly generated gene sets, and P_random is calculated as the proportion of random gene sets achieving a P_pure at least as significant as that of the candidate gene set. Finally, a pre-ranked Gene Set Enrichment Analysis (GSEA) is performed by ranking all genes by concordance index, and P_enrich is computed to indicate the degree to which the candidate gene set is enriched for genes with univariate prognostic significance. A SAPS_score is calculated to summarize the three statistics, and optionally a Q-value is computed to estimate the significance of the SAPS_score by calculating SAPS_scores for random gene sets.
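
The P_random step is essentially an empirical permutation-style p-value. A minimal Python sketch, assuming P_pure has already been computed for the candidate set and for each random set (the survival clustering that produces P_pure is not shown; function names and numbers are hypothetical, not the saps API):

```python
import random


def random_gene_sets(universe, size, n_sets, seed=0):
    """Draw n_sets random gene sets of the given size from a list of genes."""
    rng = random.Random(seed)
    return [rng.sample(universe, size) for _ in range(n_sets)]


def p_random(candidate_p_pure, random_p_pures):
    """Proportion of random gene sets with a P_pure at least as small as the candidate's."""
    if not random_p_pures:
        raise ValueError("need at least one random gene set")
    hits = sum(1 for p in random_p_pures if p <= candidate_p_pure)
    return hits / len(random_p_pures)


# P_pure values for random sets would come from applying the same clustering and
# survival test to each set from random_gene_sets(...); here they are made up.
print(p_random(0.01, [0.2, 0.005, 0.5, 0.01, 0.9]))  # 0.4
```

Drawing random sets of the same size as the candidate keeps the comparison fair, since larger gene sets can separate patients more easily by chance.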

genomation Summary, annotation and visualization of genomic data

A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.
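
The kind of summary described above, quantifying intervals over functional regions, can be sketched in plain Python. The sketch below is not genomation's API: it assigns each interval to the first feature class it overlaps, checking promoter, then exon, then intron, and calls everything else intergenic. The precedence order, half-open coordinates, and tuple-based data layout are assumptions for illustration.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def annotate(intervals, features, precedence=("promoter", "exon", "intron")):
    """Count intervals by the first feature class (in precedence order) they overlap."""
    counts = {name: 0 for name in precedence}
    counts["intergenic"] = 0
    for chrom, start, end in intervals:
        for name in precedence:
            if any(c == chrom and overlaps(start, end, s, e)
                   for c, s, e in features.get(name, [])):
                counts[name] += 1
                break
        else:  # no feature class matched
            counts["intergenic"] += 1
    return counts


features = {
    "promoter": [("chr1", 900, 1100)],
    "exon": [("chr1", 1100, 1300), ("chr1", 2000, 2200)],
    "intron": [("chr1", 1300, 2000)],
}
intervals = [("chr1", 1000, 1050), ("chr1", 1050, 1150), ("chr1", 1500, 1600),
             ("chr1", 2100, 2150), ("chr2", 10, 20), ("chr1", 5000, 5100)]
print(annotate(intervals, features))
# {'promoter': 2, 'exon': 1, 'intron': 1, 'intergenic': 2}
```

The linear scan is fine for a toy example; real genomic data would need an interval index (for example, sorted starts with binary search) to stay fast.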

AIMS AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype

This package contains the AIMS implementation. It contains the functions necessary to assign the five intrinsic molecular subtypes (Luminal A, Luminal B, Her2-enriched, Basal-like, Normal-like). Assignments can be made on individual samples as well as on datasets of gene expression data.

Metab Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS.

Metab is an R package for high-throughput processing of metabolomics data analysed by the Automated Mass Spectral Deconvolution and Identification System (AMDIS) (http://chemdata.nist.gov/mass-spc/amdis/downloads/). In addition, it performs statistical hypothesis tests (t-test) and analysis of variance (ANOVA). In doing so, Metab considerably speeds up the data mining process in metabolomics and produces better-quality results. Metab was developed with interactive features, allowing users with little R knowledge to use its functionality.

pepXMLTab Parsing pepXML files and filtering based on peptide FDR.

Parses pepXML files using the XML package. The package tries to handle pepXML files generated by different software tools. The output is a tabular file of peptide-spectrum matches (PSMs). The package also provides a function to filter the PSMs based on FDR.
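
The description does not say how pepXMLTab estimates FDR. A common approach is target-decoy: rank PSMs by score and, at each cutoff, estimate FDR as the number of decoy hits divided by the number of target hits above it. The Python sketch below assumes that approach, higher-is-better scores, and a hypothetical `is_decoy` flag on each PSM; it is not pepXMLTab's API, and the peptides and scores are made up.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float    # assumed: higher means a better match
    is_decoy: bool  # assumed: True for hits against the decoy database


def filter_by_fdr(psms, max_fdr=0.01):
    """Return target PSMs above the lowest score cutoff whose estimated FDR is <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff_index = -1
    for idx, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        # Keep the deepest position in the ranking where the estimate is still acceptable.
        if fdr <= max_fdr:
            cutoff_index = idx
    return [p for p in ranked[: cutoff_index + 1] if not p.is_decoy]


psms = [PSM("PEPA", 10, False), PSM("PEPB", 9, False), PSM("DEC1", 8, True),
        PSM("PEPC", 7, False), PSM("PEPD", 6, False), PSM("DEC2", 5, True),
        PSM("PEPE", 4, False)]
print(len(filter_by_fdr(psms, max_fdr=0.34)))  # 4
print(len(filter_by_fdr(psms, max_fdr=0.01)))  # 2
```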

Fred Hutchinson Cancer Research Center