This is a list of the last 100 packages added to Bioconductor and available in the development version of Bioconductor. The list is also available as an RSS Feed.

TIN Transcriptome instability analysis

The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets.

InPAS Identification of Novel alternative PolyAdenylation Sites (PAS)

Alternative polyadenylation (APA) is an important post-transcriptional regulation mechanism that occurs in most human genes. InPAS, developed from the DaPars algorithm, predicts and estimates APA and cleavage sites for mRNA-seq data. It uses the power of cleanUpdTSeq to adjust cleavage sites.

GENESIS GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015): a Principal Components Analysis with genome-wide SNP genotype data for robust population structure inference in samples with related individuals (known or cryptic).

bamsignals Extract read count signals from bam files

This package efficiently obtains count vectors from indexed bam files. It counts the number of reads in given genomic ranges and computes read profiles and coverage profiles. It also handles paired-end data.
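
As a rough illustration of what counting reads in genomic ranges means, the Python sketch below counts read start positions that fall inside half-open intervals. It does not use bamsignals and does not read bam files; the function name `count_reads_in_ranges` and the toy coordinates are assumptions made for this example only.

```python
from bisect import bisect_left

def count_reads_in_ranges(read_starts, ranges):
    """Count read start positions inside each half-open range [start, end)."""
    starts = sorted(read_starts)
    counts = []
    for start, end in ranges:
        # Number of sorted positions p with start <= p < end.
        counts.append(bisect_left(starts, end) - bisect_left(starts, start))
    return counts

# Hypothetical read start positions on a single chromosome.
reads = [5, 12, 12, 30, 47, 51]
print(count_reads_in_ranges(reads, [(0, 20), (20, 50), (50, 60)]))  # prints [3, 2, 1]
```

Sorting once and using binary search keeps each range query logarithmic in the number of reads, which is the kind of cost an indexed file is meant to deliver.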

SIMAT GC-SIM-MS data processing and analysis tool

This package provides a pipeline for analysis of GC-MS data acquired in selected ion monitoring (SIM) mode. The tool also provides guidance in choosing appropriate fragments for the targets of interest by using an optimization algorithm. This is done by considering overlapping peaks from a library provided by the user.

RNAprobR An R package for analysis of massive parallel sequencing based RNA structure probing data

This package facilitates analysis of Next Generation Sequencing data for which positional information with single nucleotide resolution is key. It allows for applying different types of relevant normalizations, data visualization and export in a table or UCSC-compatible bedgraph file.

netbenchmark Benchmarking of several gene network inference methods

This package implements a benchmarking of several gene network inference algorithms from gene expression data.

MatrixRider Obtain total affinity and occupancies for binding site matrices on a given sequence

Calculates a single number for a whole sequence that reflects the propensity of a DNA binding protein to interact with it. The DNA binding protein has to be described with a position frequency matrix (PFM), for example one obtained from JASPAR.

LEA LEA: an R package for Landscape and Ecological Association Studies

LEA is an R package dedicated to landscape genomics and ecological association tests. LEA can run analyses of population structure and genome scans for local adaptation. It includes statistical methods for estimating ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (snmf, pca); and identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (lfmm), and controlling the false discovery rate. LEA is mainly based on optimized C programs that can scale with the dimension of very large data sets.

immunoClust immunoClust - Automated Pipeline for Population Detection in Flow Cytometry

Model based clustering and meta-clustering of Flow Cytometry Data

diggit Inference of Genetic Variants Driving Cellular Phenotypes

Inference of Genetic Variants Driving Cellular Phenotypes by the DIGGIT algorithm

canceR A Graphical User Interface for accessing and modeling the Cancer Genomics Data of MSKCC.

The package is a user-friendly interface based on cgdsr and other modeling packages to explore, compare, and analyse all available Cancer Data (Clinical data, Gene Mutation, Gene Methylation, Gene Expression, Protein Phosphorylation, Copy Number Alteration) hosted by the Computational Biology Center at Memorial-Sloan-Kettering Cancer Center (MSKCC).

muscle Multiple Sequence Alignment with MUSCLE

MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.

BrowserViz BrowserViz: interactive R/browser graphics using websockets and JSON

Interactive graphics in a web browser from R, using websockets and JSON

Rhtslib HTSlib high-throughput sequencing library as an R package

This package provides version 1.1 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").

skewr Visualize Intensities Produced by Illumina's Human Methylation 450k BeadChip

The skewr package is a tool for visualizing the output of the Illumina Human Methylation 450k BeadChip to aid in quality control. It creates a panel of nine plots. Six of the plots represent the density of either the methylated intensity or the unmethylated intensity given by one of three subsets of the 485,577 total probes. These subsets include Type I-red, Type I-green, and Type II. The remaining three distributions give the density of the Beta-values for these same three subsets. Each of the nine plots optionally displays the distributions of the "rs" SNP probes and the probes associated with imprinted genes as a series of 'tick' marks located above the x-axis.

sigsquared Gene signature generation for functionally validated signaling pathways

By leveraging statistical properties (log-rank test for survival) of patient cohorts defined by binary thresholds, poor-prognosis patients are identified by the sigsquared package via optimization over a cost function reducing type I and II error.

SELEX Functions for analyzing SELEX-seq data

Tools for quantifying DNA binding specificities based on SELEX-seq data

ProtGenerics S4 generic functions for Bioconductor proteomics infrastructure

S4 generic functions needed by Bioconductor proteomics packages.

BubbleTree A method to elucidate purity and clonality in tumors using copy number ratio and allele frequency

BubbleTree utilizes homogenous pertinent somatic copy number alterations (SCNAs) as markers of tumor clones to extract estimates of tumor ploidy, purity and clonality.

rGREAT Client for GREAT Analysis

This package automates GREAT (Genomic Regions Enrichment of Annotations Tool) analysis by constructing an HTTP POST request according to the user's input and automatically retrieving results from the GREAT web server.

birte Bayesian Inference of Regulatory Influence on Expression (biRte)

Expression levels of mRNA molecules are regulated by different processes, comprising inhibition or activation by transcription factors and post-transcriptional degradation by microRNAs. biRte uses regulatory networks of TFs, miRNAs and possibly other factors, together with mRNA, miRNA and other available expression data to predict the relative influence of a regulator on the expression of its target genes. Inference is done in a Bayesian modeling framework using Markov-Chain-Monte-Carlo. A special feature is the possibility for follow-up network reverse engineering between active regulators.

HIBAG HLA Genotype Imputation with Attribute Bagging

HIBAG is a software package for imputing HLA types using SNP data, and relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.

sincell R package for the statistical assessment of cell state hierarchies from single-cell RNA-seq data

Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell-states that might be captured by single-cell RNA seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data might be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cells-clustering step). Sincell R package implements a methodological toolbox allowing flexible workflows under such framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA seq. Graphical representations and functional association tests are provided to interpret hierarchies.

Cardinal A mass spectrometry imaging toolbox for statistical analysis

Implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.

GreyListChIP Grey Lists -- Mask Artefact Regions Based on ChIP Inputs

Identify regions of ChIP experiments with high signal in the input, that lead to spurious peaks during peak calling. Remove reads aligning to these regions prior to peak calling, for cleaner ChIP analysis.
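
The filtering step described above can be sketched in a few lines of Python. This is an illustration of interval masking only, not GreyListChIP's implementation; the function name `remove_grey_reads`, the tuple layout and the coordinates are hypothetical.

```python
def remove_grey_reads(reads, grey_regions):
    """Drop reads (chrom, start, end) that overlap any grey region.

    All intervals are treated as half-open: [start, end).
    """
    kept = []
    for chrom, start, end in reads:
        overlaps = any(
            chrom == g_chrom and start < g_end and g_start < end
            for g_chrom, g_start, g_end in grey_regions
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept

# Hypothetical aligned reads and one grey region on chr1.
reads = [("chr1", 100, 150), ("chr1", 480, 530), ("chr2", 480, 530)]
grey = [("chr1", 500, 600)]
print(remove_grey_reads(reads, grey))  # the chr1:480-530 read is removed
```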

IVAS Identification of genetic Variants affecting Alternative Splicing

Identification of genetic variants affecting alternative splicing.

cytofkit cytofkit: an integrated analysis pipeline for mass cytometry data

An integrated mass cytometry data analysis pipeline that enables simultaneous illustration of cellular diversity and progression.

seq2pathway a novel tool for functional gene-set (or termed as pathway) analysis of next-generation sequencing data

Seq2pathway is a novel tool for functional gene-set (also termed pathway) analysis of next-generation sequencing data, consisting of "seq2gene" and "gene2path" components. The seq2gene component links sequence-level measurements of genomic regions (including SNPs or point mutation coordinates) to gene-level scores, and the gene2pathway component summarizes gene scores to pathway scores for each sample. The seq2gene component can assign both coding and non-exon regions to a broader range of neighboring genes than only the nearest one, thus facilitating the study of functional non-coding regions. The gene2pathway component takes into account the quantity of significance for gene members within a pathway compared with those outside a pathway. The output of seq2pathway is a general structure of quantitative pathway-level scores, thus allowing one to functionally interpret datasets such as RNA-seq, ChIP-seq and GWAS, as well as data derived from other next-generation sequencing experiments.

ggtree a phylogenetic tree viewer for different types of tree annotations

ggtree extends the ggplot2 plotting system, which implements the grammar of graphics. ggtree is designed for visualizing phylogenetic trees and different types of associated annotation data.

parglms support for parallelized estimation of GLMs/GEEs

support for parallelized estimation of GLMs/GEEs, catering for dispersed data

seqPattern Visualising oligonucleotide patterns and motif occurrences across a set of sorted sequences

Visualising oligonucleotide patterns and sequence motif occurrences across a large set of sequences centred at a common reference point and sorted by a user-defined feature.

MeSHSim MeSH(Medical Subject Headings) Semantic Similarity Measures

Provides methods for measuring semantic similarity over MeSH headings and MEDLINE documents.

mAPKL A Hybrid Feature Selection method for gene expression data

We propose a hybrid FS method (mAP-KL), which combines multiple hypothesis testing and affinity propagation (AP)-clustering algorithm along with the Krzanowski & Lai cluster quality index, to select a small yet informative subset of genes.

gdsfmt R Interface to CoreArray Genomic Data Structure (GDS) Files

This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and include hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers with less than 8 bits, since a single genetic/genomic variant, like a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are also supported with relatively efficient random access. A GDS file can be read in parallel by multiple R processes, as supported by the parallel package.
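
To make the point about sub-byte integers concrete, here is a minimal Python sketch that stores four 2-bit values per byte. The bit layout (lowest bits first), the function names and the genotype coding are assumptions chosen for illustration; they are not a description of the GDS on-disk format or of the gdsfmt API.

```python
def pack_2bit(values):
    """Pack integers in 0..3 into bytes, four values per byte, lowest bits first."""
    if any(v < 0 or v > 3 for v in values):
        raise ValueError("each value must fit in 2 bits (0..3)")
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        packed[i // 4] |= v << (2 * (i % 4))
    return bytes(packed)

def unpack_2bit(packed, n):
    """Recover the first n 2-bit values from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

# Hypothetical SNP genotype codes; 3 might stand for "missing".
genotypes = [0, 1, 2, 3, 2, 1]
packed = pack_2bit(genotypes)
print(len(packed))                      # prints 2: six values fit in two bytes
print(unpack_2bit(packed, len(genotypes)) == genotypes)  # prints True
```

Storing one value per byte would need six bytes here; the packed form needs two, which is where the saving for large genotype matrices comes from.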

TRONCO TRONCO, a package for TRanslational ONCOlogy

Genotype-level cancer progression models describe the ordering of accumulating mutations, e.g., somatic mutations / copy number variations, during cancer development. These graphical models help understand the causal structure involving events promoting cancer progression, possibly predicting complex patterns characterising genomic progression of a cancer. Reconstructed models can be used to better characterise genotype-phenotype relation, and suggest novel targets for therapy design. TRONCO (TRanslational ONCOlogy) is an R package aimed at collecting state-of-the-art algorithms to infer progression models from cross-sectional data, i.e., data collected from independent patients which does not necessarily incorporate any evident temporal information. These algorithms require a binary input matrix where: (i) each row represents a patient genome, (ii) each column an event relevant to the progression (a priori selected) and a 0/1 value models the absence/presence of a certain mutation in a certain patient. The current first version of TRONCO implements the CAPRESE algorithm (Cancer PRogression Extraction with Single Edges) to infer possible progression models arranged as trees; cf. Inferring tree causal models of cancer progression with probability raising, L. Olde Loohuis, G. Caravagna, A. Graudenzi, D. Ramazzotti, G. Mauri, M. Antoniotti and B. Mishra. PLoS One, to appear. This vignette shows how to use TRONCO to infer a tree model of ovarian cancer progression from CGH data of copy number alterations (classified as gains or losses over chromosome arms). The dataset used is available in the SKY/M-FISH database.
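
The binary patient-by-event matrix and the idea of probability raising mentioned in the cited paper title can be sketched as follows. This Python snippet only compares two empirical conditional frequencies; it is not the CAPRESE algorithm, and the function name, the event labels and the toy matrix are hypothetical.

```python
def probability_raising(matrix, a, b):
    """Return (P(b | a), P(b | not a)) estimated from a 0/1 patient-by-event matrix."""
    with_a = [row for row in matrix if row[a] == 1]
    without_a = [row for row in matrix if row[a] == 0]
    if not with_a or not without_a:
        raise ValueError("event a must be present in some patients and absent in others")
    p_b_given_a = sum(row[b] for row in with_a) / len(with_a)
    p_b_given_not_a = sum(row[b] for row in without_a) / len(without_a)
    return p_b_given_a, p_b_given_not_a

# Rows are patients; columns are two hypothetical events (column 0 and column 1).
patients = [
    [1, 1],
    [1, 1],
    [1, 0],
    [0, 0],
    [0, 0],
    [0, 1],
]
raised, baseline = probability_raising(patients, a=0, b=1)
print(raised > baseline)  # prints True: event 1 is more frequent when event 0 is present
```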

RnaSeqSampleSize RnaSeqSampleSize

The RnaSeqSampleSize package provides a sample size calculation method based on the negative binomial model and the exact test for assessing differential expression analysis of RNA-seq data.

gespeR Gene-Specific Phenotype EstimatoR

Estimates gene-specific phenotypes from off-target confounded RNAi screens. The phenotype of each siRNA is modeled based on on-targeted and off-targeted genes, using a regularized linear regression model.

coMET coMET: visualisation of regional epigenome-wide association scan (EWAS) results and DNA co-methylation patterns.

Visualisation of EWAS results in a genomic region. In addition to phenotype-association P-values, coMET also generates plots of co-methylation patterns and provides a series of annotation tracks. It can be applied to other omic-wide association scans, as long as the data can be translated to the genomic level, and to any species.

CODEX A Normalization and Copy Number Variation Detection Method for Whole Exome Sequencing

A normalization and copy number variation calling procedure for whole exome DNA sequencing data. CODEX relies on the availability of multiple samples processed using the same sequencing pipeline for normalization, and does not require matched controls. The normalization model in CODEX includes terms that specifically remove biases due to GC content, exon length and targeting and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data.

chromDraw chromDraw an R package for visualization of linear and circular karyotypes.

Package chromDraw is a simple package for linear and circular karyotype visualization. The linear type of visualization is usually used for demonstrating chromosome structures in a karyotype, and the circular type is used for comparing karyotypes with each other. The tool has its own input data format, or a GenomicRanges structure can be used as input. Each chromosome contains definitions of blocks and the centromere position. Output file formats are *.eps and *.svg.

AnalysisPageServer A framework for sharing interactive data and plots from R through the web.

AnalysisPageServer is a modular system that enables sharing of customizable R analyses via the web.

rgsepd Gene Set Enrichment / Projection Displays

R/GSEPD is a bioinformatics package for R to help disambiguate transcriptome samples (a matrix of RNA-Seq counts at RefSeq IDs) by automating differential expression (with DESeq2), then gene set enrichment (with GOSeq), and finally an N-dimensional projection to quantify in which ways each sample is like either treatment group.

mdgsa Multi Dimensional Gene Set Analysis.

Functions to perform a Gene Set Analysis in several genomic dimensions, including methods for miRNAs.

FlowSOM Using self-organizing maps for visualization and interpretation of cytometry data

FlowSOM offers visualization options for cytometry data, by using Self-Organizing Map clustering and Minimal Spanning Trees

gQTLstats gQTLstats: computationally efficient analysis for eQTL and allied studies

computationally efficient analysis of eQTL, mQTL, dsQTL, etc.

gQTLBase gQTLBase: infrastructure for eQTL, mQTL and similar studies

Infrastructure for eQTL, mQTL and similar studies.

PROPER PROspective Power Evaluation for RNAseq

This package provides simulation-based methods for evaluating the statistical power in differential expression analysis from RNA-seq data.

nethet A bioconductor package for high-dimensional exploration of biological network heterogeneity

Package nethet is an implementation of statistically sound methodology enabling the analysis of network heterogeneity from high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).

cpvSNP Gene set analysis methods for SNP association p-values that lie in genes in given gene sets

Gene set analysis methods exist to combine SNP-level association p-values into gene sets, calculating a single association p-value for each gene set. This package implements two such methods that require only the calculated SNP p-values, the gene set(s) of interest, and a correlation matrix (if desired). One method (GLOSSI) requires independent SNPs and the other (VEGAS) can take into account correlation (LD) among the SNPs. Built-in plotting functions are available to help users visualize results.
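
As a sketch of how independent SNP-level p-values can be combined into one gene-set p-value, the Python snippet below implements Fisher's method, a classic combination rule for independent tests. It is shown only to illustrate the idea; it is not claimed to be the package's GLOSSI or VEGAS implementation, and the function name is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, -2 * sum(log p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even
    degrees of freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # the statistic divided by 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        tail += term
    return math.exp(-half_stat) * tail

# Hypothetical p-values for the SNPs mapped to one gene set.
print(fisher_combined_p([0.05, 0.05]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail. Correlated SNPs violate the independence assumption, which is the situation the description says the second method is meant to handle.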

QuartPAC Identification of mutational clusters in protein quaternary structures.

Identifies clustering of somatic mutations in proteins over the entire quaternary structure.

saps Significance Analysis of Prognostic Signatures

Functions implementing the Significance Analysis of Prognostic Signatures method (SAPS). SAPS provides a robust method for identifying biologically significant gene sets associated with patient survival. Three basic statistics are computed. First, patients are clustered into two survival groups based on differential expression of a candidate gene set. P_pure is calculated as the probability of no survival difference between the two groups. Next, the same procedure is applied to randomly generated gene sets, and P_random is calculated as the proportion achieving a P_pure as significant as the candidate gene set. Finally, a pre-ranked Gene Set Enrichment Analysis (GSEA) is performed by ranking all genes by concordance index, and P_enrich is computed to indicate the degree to which the candidate gene set is enriched for genes with univariate prognostic significance. A SAPS_score is calculated to summarize the three statistics, and optionally a Q-value is computed to estimate the significance of the SAPS_score by calculating SAPS_scores for random gene sets.
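
The P_random step described above is an empirical proportion, which the following Python sketch makes explicit. The function name `p_random` and the example values are hypothetical, and the P_pure values are assumed to have been computed already; this is not the saps package code.

```python
def p_random(candidate_p_pure, random_p_pures):
    """Proportion of random gene sets whose P_pure is at least as significant
    (that is, no larger) than the candidate gene set's P_pure."""
    if not random_p_pures:
        raise ValueError("need at least one random gene set")
    hits = sum(1 for p in random_p_pures if p <= candidate_p_pure)
    return hits / len(random_p_pures)

# Hypothetical P_pure values for five random gene sets.
print(p_random(0.01, [0.5, 0.2, 0.005, 0.01, 0.8]))  # prints 0.4
```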

genomation Summary, annotation and visualization of genomic data

A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.

AIMS AIMS : Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype

This package contains the AIMS implementation. It contains the functions necessary to assign the five intrinsic molecular subtypes (Luminal A, Luminal B, Her2-enriched, Basal-like, Normal-like). Assignments can be done on individual samples as well as on datasets of gene expression data.

Metab Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS.

Metab is an R package for high-throughput processing of metabolomics data analysed by the Automated Mass Spectral Deconvolution and Identification System (AMDIS). In addition, it performs statistical hypothesis tests (t-test) and analysis of variance (ANOVA). In doing so, Metab considerably speeds up the data mining process in metabolomics and produces better quality results. Metab was developed using interactive features, allowing users with little R knowledge to use its functionality.

pepXMLTab Parsing pepXML files and filter based on peptide FDR.

Parses pepXML files based on the XML package. The package tries to handle pepXML files generated by different software tools. The output is a peptide-spectrum-matching tabular file. The package also provides a function to filter the PSMs based on FDR.

seqTools Analysis of nucleotide, sequence and quality content on fastq files.

Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.

mygene Access MyGene.Info services

MyGene.Info provides simple-to-use REST web services to query/retrieve gene annotation data. It's designed with simplicity and performance emphasized. mygene is an easy-to-use R wrapper to access MyGene.Info services.

EBSeqHMM Bayesian analysis for identifying gene or isoform expression changes in ordered RNA-seq experiments

The EBSeqHMM package implements an auto-regressive hidden Markov model for statistical analysis in ordered RNA-seq experiments (e.g. time course or spatial course data). The EBSeqHMM package provides functions to identify genes and isoforms that have non-constant expression profiles over the time points/positions, and to cluster them into expression paths.

specL specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics

specL provides a function for generating spectra libraries which can be used for MRM SRM MS workflows in proteomics. The package provides a BiblioSpec reader, a function which can add the protein information using a FASTA formatted amino acid file, and an export method for using the created library in the Spectronaut software.

COSNet COSNet: Cost-Sensitive network for node label prediction

Package that implements the COSNet classification algorithm. The algorithm predicts node labels in partially labeled graphs.

CFAssay Statistical analysis for the Colony Formation Assay

The package provides functions for calculation of linear-quadratic cell survival curves and for ANOVA of experimental 2-way designs along with the colony formation assay.
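
For readers unfamiliar with linear-quadratic survival curves, the model expresses the surviving fraction at dose D as exp(-(alpha*D + beta*D^2)). The Python sketch below evaluates that formula; the function name and the parameter values are hypothetical, and no fitting is performed, so it is not a substitute for the package's functions.

```python
import math

def lq_survival(dose, alpha, beta):
    """Surviving fraction under the linear-quadratic model: exp(-(alpha*D + beta*D^2))."""
    return math.exp(-(alpha * dose + beta * dose ** 2))

# Hypothetical parameters: alpha in 1/Gy, beta in 1/Gy^2.
for dose in (0, 2, 4, 6):
    print(dose, round(lq_survival(dose, alpha=0.2, beta=0.02), 4))
```

On a log scale the curve is a quadratic in dose, with alpha governing the initial slope and beta the curvature at higher doses.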

Rqc Quality Control Tool for High-Throughput Sequencing Data

Rqc is an optimised tool designed for quality control and assessment of high-throughput sequencing data. It performs parallel processing of entire files and produces a report which contains a set of high-resolution graphics.

MBAmethyl Model-based analysis of DNA methylation data

This package provides a function for reconstructing DNA methylation values from raw measurements. It iteratively implements the group fused lars to smooth related-by-location methylation values and the constrained least squares to remove probe affinity effect across multiple sequences.

BridgeDbR Code for using BridgeDb identifier mapping framework from within R

Use BridgeDb functions and load identifier mapping databases in R

tracktables Build IGV tracks and HTML reports

Methods to create complex IGV genome browser sessions and dynamic IGV reports in HTML pages.

ToPASeq Package for Topology-based Pathway Analysis of RNASeq data

Implementation of seven methods for topology-based pathway analysis of both RNASeq and microarray data: SPIA, DEGraph, TopologyGSA, TAPPA, TBS, PWEA and a visualization tool for a single pathway.

kebabs Kernel-Based Analysis Of Biological Sequences

The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs implements the following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of standard position-independent functionality, the kernels are extended in a novel way to take the position of patterns into account for the similarity measure. Because of the flexibility of the kernel formulation, other kernels like the weighted degree kernel or the shifted weighted degree kernel with constant weighting of positions are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence. The package allows for the generation of a kernel matrix or an explicit feature representation in dense or sparse format for all available kernels which can be used with methods implemented in other R packages. With focus on SVM-based methods, kebabs provides a framework which simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multi-class classification as well as regression tasks can be used in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. As support for choosing hyperparameters, the package provides cross validation - including grouped cross validation, grid search and model selection functions. For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles which show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.
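
Of the kernels listed, the position-independent spectrum kernel is the simplest to write down: it is the dot product of the two sequences' k-mer count vectors. The Python sketch below is a plain, unnormalized version for illustration; the function names are hypothetical and this is not the kebabs API.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k):
    """Unnormalized spectrum kernel: dot product of the k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for k-mers that never occur in y.
    return sum(count * cy[kmer] for kmer, count in cx.items())

print(spectrum_kernel("ACGTAC", "TACGT", k=2))  # prints 5
```

In the example, "ACGTAC" contains AC twice and CG, GT and TA once each, while "TACGT" contains each of those four 2-mers once, giving 2 + 1 + 1 + 1 = 5.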

IMPCdata Retrieves data from IMPC database

Package contains methods for data retrieval from IMPC Database.

facopy Feature-based association and gene-set enrichment for copy number alteration analysis in cancer

facopy is an R package for fine-tuned cancer CNA association modeling. Association is measured directly at the genomic features of interest and, in the case of genes, downstream gene-set enrichment analysis can be performed thanks to novel internal processing of the data. The software opens a way to systematically scrutinize the differences in CNA distribution across tumoral phenotypes, such as those that relate to tumor type, location and progression. Currently, the output format from 11 different methods that analyze data from whole-genome/exome sequencing and SNP microarrays, is supported. Multiple genomes, alteration types and variable types are also supported.

groHMM GRO-seq Analysis Pipeline.

A pipeline for the analysis of GRO-seq data.

mvGST Multivariate and directional gene set testing

mvGST provides platform-independent tools to identify GO terms (gene sets) that are differentially active (up or down) in multiple contrasts of interest. Given a matrix of one-sided p-values (rows for genes, columns for contrasts), mvGST uses meta-analytic methods to combine p-values for all genes annotated to each gene set, and then classify each gene set as being significantly more active (1), less active (-1), or not significantly differentially active (0) in each contrast of interest. With multiple contrasts of interest, each gene set is assigned to a profile (across contrasts) of differential activity. Tools are also provided for visualizing (in a GO graph) the gene sets classified to a given profile.
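
The classification into 1, -1 or 0 and the resulting profile across contrasts can be sketched as below. The Python snippet assumes that the combined, multiplicity-adjusted one-sided p-values for each gene set are already available; the function names, the 0.05 threshold and the example values are hypothetical and do not reproduce mvGST's meta-analytic combination step.

```python
def classify_gene_set(p_up, p_down, alpha=0.05):
    """Return 1 (more active), -1 (less active) or 0 (not significantly different)."""
    if p_up < alpha:
        return 1
    if p_down < alpha:
        return -1
    return 0

def activity_profile(p_up_by_contrast, p_down_by_contrast, alpha=0.05):
    """Profile of one gene set across contrasts, one entry per contrast."""
    return tuple(
        classify_gene_set(p_up, p_down, alpha)
        for p_up, p_down in zip(p_up_by_contrast, p_down_by_contrast)
    )

# Hypothetical combined p-values for one GO term in three contrasts.
print(activity_profile([0.001, 0.6, 0.4], [0.999, 0.01, 0.7]))  # prints (1, -1, 0)
```

Gene sets that share the same tuple share a profile, which is the grouping the description refers to.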

CoRegNet CoRegNet : reconstruction and integrated analysis of co-regulatory networks

This package provides methods to identify active transcriptional programs. Methods and classes are provided to import or infer large-scale co-regulatory networks from transcriptomic data. The specificity of the encoded networks is to model Transcription Factor cooperation. External regulatory evidence (TFBS, ChIP, ...) can be integrated to assess the inferred network and refine it if necessary. Transcriptional activity of the regulators in the network can be estimated using a measure of their influence in a given sample. Finally, an interactive UI can be used to navigate through the network of cooperative regulators and to visualize their activity in a specific sample or subgroup of samples. The proposed visualization tool can be used to integrate gene expression, transcriptional activity, copy number status, sample classification and a transcriptional network including co-regulation information.

SGSeq Prediction, quantification and visualization of splice events from RNA-seq data

RNA-seq data are informative for the analysis of known and novel transcript isoforms. While the short length of RNA-seq reads limits the ability to predict and quantify full-length transcripts, short read data are well suited for the analysis of individual splice events (e.g. inclusion or skipping of a cassette exon). The SGSeq package enables the prediction, quantification and visualization of splice events from BAM files.

STATegRa Classes and methods for multi-omics data integration

Classes and tools for multi-omics data integration.

csaw ChIP-seq analysis with windows

Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.

SigCheck Check a gene signature's prognostic performance against random signatures, known signatures, and permuted data/metadata.

While gene signatures are frequently used to predict phenotypes (e.g. predict prognosis of cancer patients), it is not always clear how optimal or meaningful they are (cf. David Venet, Jacques E. Dumont, and Vincent Detours' paper "Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome"). Based on suggestions in that paper, SigCheck accepts a data set (as an ExpressionSet) and a gene signature, and compares its performance on survival and/or classification tasks against a) random gene signatures of the same length; b) known, related and unrelated gene signatures; and c) permuted data and/or metadata.

flowCHIC Analyze flow cytometric data using histogram information

A package to analyze flow cytometric data of complex microbial communities based on histogram images

MAIT Statistical Analysis of Metabolomic Data

The MAIT package contains functions to perform end-to-end statistical analysis of LC/MS Metabolomic Data. Special emphasis is put on peak annotation and on the modular design of the functions.

GenomicTuples Representation and Manipulation of Genomic Tuples

GenomicTuples defines general-purpose containers for storing genomic tuples. It aims to provide functionality for tuples of genomic co-ordinates that is analogous to that available for genomic ranges in the GenomicRanges Bioconductor package.

focalCall Detection of focal aberrations in DNA copy number data

Detection of genomic focal aberrations in high-resolution DNA copy number data

flowcatchR Tools to analyze in vivo microscopy imaging data focused on tracking flowing blood cells

flowcatchR is a set of tools to analyze in vivo microscopy imaging data, focused on tracking flowing blood cells. It guides the user through the steps from segmentation to calculation of features and filtering out of particles not of interest, and also provides a set of utilities to help check the quality of the performed operations (e.g. how good the segmentation was). It allows investigating the tracking of flowing cells, such as those in blood vessels, and categorizing the particles as flowing, rolling or adherent. This classification is applied in the study of phenomena such as hemostasis and thrombosis development. Moreover, flowcatchR presents an integrated workflow solution, based on the integration with a Shiny App and Jupyter notebooks, which is delivered alongside the package and can enable fully reproducible bioimage analysis in the R environment.

paxtoolsr PaxtoolsR: Access Pathways from Multiple Databases through BioPAX and Pathway Commons

The package provides a set of R functions for interacting with BioPAX OWL files using Paxtools and for querying the Pathway Commons (PC) molecular interaction database, which is hosted by the Computational Biology Center at Memorial Sloan-Kettering Cancer Center (MSKCC). Pathway Commons databases include: BIND, BioGRID, CORUM, CTD, DIP, DrugBank, HPRD, HumanCyc, IntAct, KEGG, MirTarBase, Panther, PhosphoSitePlus, Reactome, RECON, TRANSFAC.

PAA PAA (Protein Array Analyzer)

PAA imports single color (protein) microarray data that has been saved in gpr file format, especially ProtoArray data. After pre-processing (background correction, batch filtering, normalization), univariate feature pre-selection is performed (e.g., using the "minimum M statistic" approach, hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this purpose, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools including several different plots for results examination and evaluation.

ASGSCA Association Studies for multiple SNPs and multiple traits using Generalized Structured Equation Models

The package provides tools to model and test the association between multiple genotypes and multiple traits, taking into account prior biological knowledge. Genes and clinical pathways are incorporated in the model as latent variables. The method is based on Generalized Structured Component Analysis (GSCA).

proBAMr Generating SAM file for PSMs in shotgun proteomics data.

Mapping PSMs back to the genome. The package builds a SAM file from shotgun proteomics data. The package also provides a function to prepare annotation from a GTF file.

IdeoViz Plots data (continuous/discrete) along chromosomal ideogram

Plots data associated with arbitrary genomic intervals along chromosomal ideogram.

MSGFgui A shiny GUI for MSGFplus

This package makes it possible to perform analyses using the MSGFplus package in a GUI environment. Furthermore, it enables the user to investigate the results using interactive plots, summary statistics and filtering. Lastly, it exposes the current results to another R session so the user can seamlessly integrate the GUI into other workflows.

FEM Identification of Functional Epigenetic Modules

The FEM package performs a systems-level integrative analysis of DNA methylation and gene expression data. It seeks modules of functionally related genes which exhibit differential promoter DNA methylation and differential expression, where an inverse association between promoter DNA methylation and gene expression is assumed. For full details, see Jiao et al., Bioinformatics 2014.

regionReport Generate HTML reports for exploring a set of regions

Generate HTML reports to explore a set of regions such as the results from annotation-agnostic expression analysis of RNA-seq data at base-pair resolution performed by derfinder.

derfinderHelper derfinder helper package

Helper package for speeding up the derfinder package when using multiple cores.

derfinderPlot Plotting functions for derfinder

Plotting functions for derfinder

derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution

Annotation-agnostic differential expression analysis of RNA-seq data by calculating F-statistics at base-pair resolution

MSGFplus An interface between R and MS-GF+

This package contains functions to perform peptide identification using MS-GF+.

polyester Simulate RNA-seq reads

This package can be used to simulate RNA-seq reads from differential expression experiments with replicates. The reads can then be aligned and used to perform comparisons of methods for differential expression.

ALDEx2 Analysis of differential abundance taking sample variation into account

A differential abundance analysis for the comparison of two or more conditions, for example in single-organism and meta-RNA-seq high-throughput sequencing assays, or of selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts; the model has been optimized for three or more experimental replicates. Infers sampling variation and calculates the expected false discovery rate given the biological and sampling variation using the Wilcoxon rank test or Welch's t-test (aldex.ttest) or the glm and Kruskal-Wallis tests (aldex.glm). Reports both P and FDR values calculated by the Benjamini-Hochberg correction.
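For readers unfamiliar with the Benjamini-Hochberg correction named in this description, the following Python sketch shows the standard step-up adjustment of a list of p-values. It illustrates the general procedure only and makes no claim about how ALDEx2 implements it internally; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone: each is the minimum of p * m / rank seen so far.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Features whose adjusted value falls below a chosen threshold (for example 0.05) are then reported as significant at that false discovery rate.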

PSEA Population-Specific Expression Analysis.

Deconvolution of gene expression data by Population-Specific Expression Analysis (PSEA).

GenomicInteractions R package for handling genomic interaction data

R package for handling Genomic interaction data, such as ChIA-PET/Hi-C, annotating genomic features with interaction information and producing various plots / statistics

rRDP Interface to the RDP Classifier

Seamlessly interfaces RDP classifier (version 2.9).

MGFM Marker Gene Finder in Microarray gene expression data

The package is designed to detect marker genes from microarray gene expression data sets.

Source Code & Build Reports »

Source code is stored in svn (user: readonly, pass: readonly).

Software packages are built and checked nightly.