TRONCO TRONCO, a package for TRanslational ONCOlogy
Genotype-level cancer progression models describe the ordering of accumulating mutations, e.g., somatic mutations or copy number variations, during cancer development. These graphical models help to understand the causal structure of the events that promote cancer progression, and may predict complex patterns characterising the genomic progression of a cancer. Reconstructed models can be used to better characterise the genotype-phenotype relation and to suggest novel targets for therapy design. TRONCO (TRanslational ONCOlogy) is an R package aimed at collecting state-of-the-art algorithms to infer progression models from cross-sectional data, i.e., data collected from independent patients that do not necessarily carry any evident temporal information. These algorithms require a binary input matrix in which (i) each row represents a patient genome, (ii) each column represents an event relevant to the progression (selected a priori), and (iii) a 0/1 value models the absence/presence of a given mutation in a given patient. This first version of TRONCO implements the CAPRESE algorithm (Cancer PRogression Extraction with Single Edges) to infer possible progression models arranged as trees; cf. Inferring tree causal models of cancer progression with probability raising, L. Olde Loohuis, G. Caravagna, A. Graudenzi, D. Ramazzotti, G. Mauri, M. Antoniotti and B. Mishra. PLoS One, to appear. This vignette shows how to use TRONCO to infer a tree model of ovarian cancer progression from CGH data of copy number alterations (classified as gains or losses over chromosome arms). The dataset used is available in the SKY/M-FISH database.
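The probability-raising idea behind CAPRESE can be illustrated on such a 0/1 matrix. The sketch below is a simplified illustration and not TRONCO's API: it treats event i as a candidate cause of event j when i is more frequent than j (temporal priority) and P(j | i) > P(j | not i). The function name candidate_edge is hypothetical, and the published algorithm additionally scores candidates with a shrinkage estimator to keep a single parent per event, which is omitted here.

```python
from typing import List


def candidate_edge(matrix: List[List[int]], i: int, j: int) -> bool:
    """Return True if event i is a candidate cause of event j in a
    patients-by-events 0/1 matrix (hypothetical helper, not TRONCO's API)."""
    n = len(matrix)
    p_i = sum(row[i] for row in matrix) / n
    p_j = sum(row[j] for row in matrix) / n
    with_i = [row for row in matrix if row[i] == 1]
    without_i = [row for row in matrix if row[i] == 0]
    if not with_i or not without_i:
        # P(j | i) or P(j | not i) is undefined
        return False
    p_j_given_i = sum(row[j] for row in with_i) / len(with_i)
    p_j_given_not_i = sum(row[j] for row in without_i) / len(without_i)
    # temporal priority (i more frequent than j) and probability raising
    return p_i > p_j and p_j_given_i > p_j_given_not_i


# rows = patients, columns = events A (index 0) and B (index 1)
patients = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 0], [0, 0]]
print(candidate_edge(patients, 0, 1))  # True: A -> B is a candidate edge
print(candidate_edge(patients, 1, 0))  # False: B is rarer than A
```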
The RnaSeqSampleSize package provides a sample size calculation method, based on the negative binomial model and the exact test, for differential expression analysis of RNA-seq data.
gespeR Gene-Specific Phenotype EstimatoR
Estimates gene-specific phenotypes from off-target confounded RNAi screens. The phenotype of each siRNA is modeled based on its on-target and off-target genes, using a regularized linear regression model.
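The regression idea can be sketched as follows: each siRNA phenotype is modeled as a weighted sum of the phenotypes of the genes it targets (on- or off-target), and gene-specific phenotypes are recovered by penalized least squares. The ridge (L2) penalty, the dense target matrix and the function names are assumptions for illustration; gespeR's own regularization and data structures may differ.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gene_phenotypes(targets: Matrix, sirna_pheno: List[float], lam: float) -> List[float]:
    """Ridge estimate (X^T X + lam*I)^-1 X^T y, where targets[s][g] is the
    (hypothetical) targeting strength of siRNA s on gene g."""
    genes = range(len(targets[0]))
    xtx = [[sum(row[a] * row[b] for row in targets) + (lam if a == b else 0.0)
            for b in genes] for a in genes]
    xty = [sum(row[a] * y for row, y in zip(targets, sirna_pheno)) for a in genes]
    return solve(xtx, xty)


# three siRNAs, two genes; the second siRNA hits both genes (off-target effect)
X = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
y = [1.0, 3.0, 2.0]
print(gene_phenotypes(X, y, lam=0.0))  # approximately [1.0, 2.0]
print(gene_phenotypes(X, y, lam=1.0))  # shrunk towards zero
```

The penalty term lam trades fit for stability: with many overlapping off-targets, X^T X becomes ill-conditioned, and the penalty keeps the estimates finite.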
CODEX A Normalization and Copy Number Variation Detection Method for Whole Exome Sequencing
A normalization and copy number variation calling procedure for whole exome DNA sequencing data. CODEX relies on the availability of multiple samples processed using the same sequencing pipeline for normalization, and does not require matched controls. The normalization model in CODEX includes terms that specifically remove biases due to GC content, exon length and targeting and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data.
chromDraw chromDraw is a simple package for drawing karyotype(s) or comparing karyotype schemes.
chromDraw is a simple package for drawing karyotype(s) or comparing karyotype schemes (linear and circular). The tool has its own input data format, in which each chromosome is defined by its blocks and centromere position. Output file formats are *.eps and *.svg.
AnalysisPageServer A framework for sharing interactive data and plots from R through the web.
AnalysisPageServer is a modular system that enables sharing of customizable R analyses via the web.
rgsepd Gene Set Enrichment / Projection Displays
R/GSEPD is a bioinformatics package for R to help disambiguate transcriptome samples (a matrix of RNA-Seq counts at RefSeq IDs) by automating differential expression (with DESeq2), then gene set enrichment (with GOSeq), and finally an N-dimensional projection to quantify in which ways each sample resembles either treatment group.
mdgsa Multi Dimensional Gene Set Analysis.
Functions to perform a Gene Set Analysis in several genomic dimensions, including methods for miRNAs.
FlowSOM Using self-organizing maps for visualization and interpretation of cytometry data
FlowSOM offers visualization options for cytometry data by using Self-Organizing Map clustering and Minimum Spanning Trees.
gQTLstats gQTLstats: computationally efficient analysis for eQTL and allied studies
Computationally efficient analysis of eQTL, mQTL, dsQTL, etc.
gQTLBase gQTLBase: infrastructure for eQTL, mQTL and similar studies
Infrastructure for eQTL, mQTL and similar studies
PROPER PROspective Power Evaluation for RNAseq
This package provides simulation-based methods for evaluating the statistical power of differential expression analysis of RNA-seq data.
nethet A bioconductor package for high-dimensional exploration of biological network heterogeneity
Package nethet is an implementation of statistically sound methodology enabling the analysis of network heterogeneity from high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).
cpvSNP Gene set analysis methods for SNP association p-values that lie in genes in given gene sets
Gene set analysis methods exist to combine SNP-level association p-values into gene sets, calculating a single association p-value for each gene set. This package implements two such methods that require only the calculated SNP p-values, the gene set(s) of interest, and a correlation matrix (if desired). One method (GLOSSI) requires independent SNPs and the other (VEGAS) can take into account correlation (LD) among the SNPs. Built-in plotting functions are available to help users visualize results.
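The simplest way to turn independent SNP p-values into a single set-level p-value is Fisher's method: under the null hypothesis, -2 times the sum of ln p follows a chi-squared distribution with 2k degrees of freedom. The sketch below works under the independence assumption that the description attributes to GLOSSI; whether GLOSSI uses exactly this statistic is an assumption, and the function name is hypothetical. For even degrees of freedom the chi-squared tail has a closed form, so no statistics library is needed.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Fisher's method for k independent p-values.

    Uses the closed-form chi-squared survival function for 2k degrees of
    freedom: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals (-2 * sum ln p) / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


print(fisher_combined_pvalue([0.04]))              # a single p-value comes back (up to rounding)
print(fisher_combined_pvalue([0.01, 0.20, 0.03]))  # set-level p-value for three SNPs
```

With correlated SNPs (LD) this statistic is anti-conservative, which is why a method such as VEGAS, which accounts for the correlation, is offered as the alternative.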
QuartPAC Identification of mutational clusters in protein quaternary structures.
Identifies clustering of somatic mutations in proteins over the entire quaternary structure.
saps Significance Analysis of Prognostic Signatures
Functions implementing the Significance Analysis of Prognostic Signatures method (SAPS). SAPS provides a robust method for identifying biologically significant gene sets associated with patient survival. Three basic statistics are computed. First, patients are clustered into two survival groups based on differential expression of a candidate gene set. P_pure is calculated as the probability of no survival difference between the two groups. Next, the same procedure is applied to randomly generated gene sets, and P_random is calculated as the proportion achieving a P_pure as significant as the candidate gene set. Finally, a pre-ranked Gene Set Enrichment Analysis (GSEA) is performed by ranking all genes by concordance index, and P_enrich is computed to indicate the degree to which the candidate gene set is enriched for genes with univariate prognostic significance. A SAPS_score is calculated to summarize the three statistics, and optionally a Q-value is computed to estimate the significance of the SAPS_score by calculating SAPS_scores for random gene sets.
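The P_random step can be written directly from the description above. This is a minimal sketch, assuming the P_pure values for the random gene sets have already been computed and that "as significant as" means less than or equal to. The function name is hypothetical and not part of the saps API.

```python
from typing import Sequence


def p_random(p_pure_candidate: float, p_pure_random: Sequence[float]) -> float:
    """Fraction of random gene sets whose P_pure is at least as significant
    (i.e. as small) as the candidate gene set's P_pure."""
    if not p_pure_random:
        raise ValueError("need at least one random gene set")
    hits = sum(1 for p in p_pure_random if p <= p_pure_candidate)
    return hits / len(p_pure_random)


print(p_random(0.01, [0.50, 0.005, 0.20, 0.01]))  # 0.5
```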
genomation Summary, annotation and visualization of genomic data
A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.
AIMS AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype
This package contains the AIMS implementation. It contains the functions necessary to assign the five intrinsic molecular subtypes (Luminal A, Luminal B, Her2-enriched, Basal-like, Normal-like). Assignments can be made on individual samples as well as on a dataset of gene expression data.
Metab Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS.
Metab is an R package for high-throughput processing of metabolomics data analysed by the Automated Mass Spectral Deconvolution and Identification System (AMDIS) (http://chemdata.nist.gov/mass-spc/amdis/downloads/). In addition, it performs statistical hypothesis tests (t-test) and analysis of variance (ANOVA). In doing so, Metab considerably speeds up the data mining process in metabolomics and produces better quality results. Metab was developed with interactive features, allowing users with little knowledge of R to benefit from its functionality.
pepXMLTab Parse pepXML files and filter based on peptide FDR.
Parses pepXML files using the XML package. The package tries to handle pepXML files generated by different software. The output is a tabular file of peptide-spectrum matches (PSMs). The package also provides a function to filter the PSMs based on FDR.
seqTools Analysis of nucleotide, sequence and quality content on fastq files.
Analyzes read length, phred scores, alphabet frequency and DNA k-mers in uncompressed and compressed fastq files.
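Two of these quantities are easy to illustrate for a single FASTQ record: the phred scores decoded from the quality line and the base (alphabet) frequencies of the read. The sketch assumes the Phred+33 (Sanger) quality encoding and uses hypothetical function names; it does not reproduce seqTools' own functions.

```python
from collections import Counter
from typing import Dict, List


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; Phred+33 is assumed by default."""
    return [ord(ch) - offset for ch in quality_line]


def base_frequencies(sequence: str) -> Dict[str, int]:
    """Count each character of the read (the 'alphabet frequency')."""
    return dict(Counter(sequence))


# one FASTQ record: header, sequence, separator, quality
record = ["@read1", "ACGTNA", "+", "II#IIA"]
print(phred_scores(record[3]))      # [40, 40, 2, 40, 40, 32]
print(base_frequencies(record[1]))  # {'A': 2, 'C': 1, 'G': 1, 'T': 1, 'N': 1}
```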
mygene Access MyGene.Info services
MyGene.Info provides simple-to-use REST web services to query/retrieve gene annotation data. It is designed with an emphasis on simplicity and performance. mygene is an easy-to-use R wrapper to access MyGene.Info services.
EBSeqHMM Bayesian analysis for identifying gene or isoform expression changes in ordered RNA-seq experiments
The EBSeqHMM package implements an auto-regressive hidden Markov model for statistical analysis in ordered RNA-seq experiments (e.g. time course or spatial course data). The EBSeqHMM package provides functions to identify genes and isoforms that have non-constant expression profiles over the time points/positions, and to cluster them into expression paths.
specL specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics
specL provides a function for generating spectral libraries which can be used in MRM/SRM MS workflows in proteomics. The package provides a BiblioSpec reader, a function which can add protein information using a FASTA-formatted amino acid file, and an export method for using the created library in the Spectronaut software.
COSNet COSNet: Cost-Sensitive network for node label prediction
Package that implements the COSNet classification algorithm. The algorithm predicts node labels in partially labeled graphs.
CFAssay Statistical analysis for the Colony Formation Assay
The package provides functions for the calculation of linear-quadratic cell survival curves and for ANOVA of two-way experimental designs in the colony formation assay.
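Under the linear-quadratic model the surviving fraction after dose D is S(D) = exp(-(alpha*D + beta*D^2)), so -ln S is linear in D and D^2. The sketch below estimates alpha and beta by ordinary least squares on -ln S without an intercept, assuming survival values are already normalized to the plating efficiency of untreated cells. This is a simplified stand-in that may differ from how CFAssay fits the model; function and variable names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_linear_quadratic(doses: Sequence[float], survival: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of -ln S = alpha*D + beta*D^2 (no intercept)."""
    y = [-math.log(s) for s in survival]
    s2 = sum(d ** 2 for d in doses)
    s3 = sum(d ** 3 for d in doses)
    s4 = sum(d ** 4 for d in doses)
    b1 = sum(d * yi for d, yi in zip(doses, y))
    b2 = sum(d ** 2 * yi for d, yi in zip(doses, y))
    det = s2 * s4 - s3 * s3  # determinant of the 2x2 normal equations
    alpha = (b1 * s4 - s3 * b2) / det
    beta = (s2 * b2 - s3 * b1) / det
    return alpha, beta


doses = [1.0, 2.0, 4.0, 6.0]  # Gy
surv = [math.exp(-(0.2 * d + 0.05 * d ** 2)) for d in doses]  # noise-free example
print(fit_linear_quadratic(doses, surv))  # approximately (0.2, 0.05)
```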
Rqc Quality Control Tool for High-Throughput Sequencing Data
Rqc is an optimised tool designed for quality control and assessment of high-throughput sequencing data. It performs parallel processing of entire files and produces a report which contains a set of high-resolution graphics.
MBAmethyl Model-based analysis of DNA methylation data
This package provides a function for reconstructing DNA methylation values from raw measurements. It iteratively applies the group fused LARS to smooth methylation values at neighboring locations and constrained least squares to remove probe affinity effects across multiple sequences.
BridgeDbR Code for using BridgeDb identifier mapping framework from within R
Use BridgeDb functions and load identifier mapping databases in R
tracktables Build IGV tracks and HTML reports
Methods to create complex IGV genome browser sessions and dynamic IGV reports in HTML pages.
ToPASeq Package for Topology-based Pathway Analysis of RNASeq data
Implementation of seven methods for topology-based pathway analysis of both RNASeq and microarray data: SPIA, DEGraph, TopologyGSA, TAPPA, TBS, PWEA and a visualization tool for a single pathway.
kebabs Kernel-Based Analysis Of Biological Sequences
The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs implements the following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of standard position-independent functionality, the kernels are extended in a novel way to take the position of patterns into account for the similarity measure. Because of the flexibility of the kernel formulation, other kernels like the weighted degree kernel or the shifted weighted degree kernel with constant weighting of positions are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence. The package allows for the generation of a kernel matrix or an explicit feature representation in dense or sparse format for all available kernels, which can be used with methods implemented in other R packages. With a focus on SVM-based methods, kebabs provides a framework which simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multi-class classification as well as regression tasks can be used in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. To support the choice of hyperparameters, the package provides cross validation (including grouped cross validation), grid search and model selection functions. For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles, which show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.
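The spectrum kernel, the first kernel listed above, compares two sequences by the inner product of their k-mer count vectors. The following is a minimal, position-independent, unnormalized sketch with hypothetical function names; it is not how kebabs implements the kernel.

```python
from collections import Counter


def spectrum_features(seq: str, k: int) -> Counter:
    """Explicit feature vector: counts of every k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Unnormalized spectrum kernel: dot product of the k-mer count vectors."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())


seqs = ["ACGTAC", "ACGAAC", "TTTTTT"]
gram = [[spectrum_kernel(a, b, 2) for b in seqs] for a in seqs]
print(gram)  # [[7, 5, 0], [5, 7, 0], [0, 0, 25]]
```

A Gram matrix like this one is what a kernel SVM consumes; the explicit feature vectors from spectrum_features correspond to the "explicit feature representation" mentioned above.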
IMPCdata Retrieves data from IMPC database
The package contains methods for retrieving data from the IMPC database.
facopy Feature-based association and gene-set enrichment for copy number alteration analysis in cancer
facopy is an R package for fine-tuned cancer CNA association modeling. Association is measured directly at the genomic features of interest and, in the case of genes, downstream gene-set enrichment analysis can be performed thanks to novel internal processing of the data. The software opens a way to systematically scrutinize the differences in CNA distribution across tumoral phenotypes, such as those that relate to tumor type, location and progression. Currently, the output formats of 11 different methods that analyze data from whole-genome/exome sequencing and SNP microarrays are supported. Multiple genomes, alteration types and variable types are also supported.
groHMM GRO-seq Analysis Pipeline.
A pipeline for the analysis of GRO-seq data.
mvGST Multivariate and directional gene set testing
mvGST provides platform-independent tools to identify GO terms (gene sets) that are differentially active (up or down) in multiple contrasts of interest. Given a matrix of one-sided p-values (rows for genes, columns for contrasts), mvGST uses meta-analytic methods to combine p-values for all genes annotated to each gene set, and then classify each gene set as being significantly more active (1), less active (-1), or not significantly differentially active (0) in each contrast of interest. With multiple contrasts of interest, each gene set is assigned to a profile (across contrasts) of differential activity. Tools are also provided for visualizing (in a GO graph) the gene sets classified to a given profile.
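A sketch of the per-contrast classification for one gene set, assuming Stouffer's method as the meta-analytic combination and a two-sided threshold split across the two directions; mvGST's actual combination rule and multiplicity adjustment may differ, and the function name is hypothetical. Small one-sided p-values are taken to indicate higher activity.

```python
import math
from statistics import NormalDist
from typing import Sequence

STD_NORMAL = NormalDist()


def classify_gene_set(one_sided_p: Sequence[float], alpha: float = 0.05) -> int:
    """Return 1 (more active), -1 (less active) or 0 for one contrast.

    one_sided_p: p-values in (0, 1) for the genes in the set, where small
    values indicate up-regulation.
    """
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in one_sided_p]
    z_comb = sum(z) / math.sqrt(len(z))  # Stouffer's combined z-score
    if 1.0 - STD_NORMAL.cdf(z_comb) < alpha / 2:
        return 1
    if STD_NORMAL.cdf(z_comb) < alpha / 2:
        return -1
    return 0


print(classify_gene_set([0.001, 0.002, 0.01]))  # 1
print(classify_gene_set([0.999, 0.998]))        # -1
print(classify_gene_set([0.5, 0.5]))            # 0
```

Applying the function to each column of the gene-by-contrast p-value matrix yields the gene set's profile of differential activity, e.g. (1, 0, -1) across three contrasts.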
CoRegNet CoRegNet: reconstruction and integrated analysis of co-regulatory networks
This package provides methods to identify active transcriptional programs. Methods and classes are provided to import or infer large-scale co-regulatory networks from transcriptomic data. A distinctive feature of the encoded networks is that they model Transcription Factor cooperation. External regulatory evidence (TFBS, ChIP, ...) can be integrated to assess the inferred network and refine it if necessary. Transcriptional activity of the regulators in the network can be estimated using a measure of their influence in a given sample. Finally, an interactive UI can be used to navigate through the network of cooperative regulators and to visualize their activity in a specific sample or subgroup of samples. The proposed visualization tool can be used to integrate gene expression, transcriptional activity, copy number status, sample classification and a transcriptional network including co-regulation information.
SGSeq Prediction, quantification and visualization of alternative transcript events from RNA-seq data
RNA-seq data are informative for the analysis of known and novel transcript isoforms. While the short length of RNA-seq reads limits the ability to predict and quantify full-length transcripts, short read data are well suited for the analysis of individual alternative transcript events (e.g. inclusion or skipping of a cassette exon). The SGSeq package enables the prediction, quantification and visualization of alternative transcript events from BAM files.
STATegRa Classes and methods for multi-omics data integration
Classes and tools for multi-omics data integration.
csaw ChIP-seq analysis with windows
Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
SigCheck Check a gene signature's classification performance against random signatures, permuted data, and known signatures.
While gene signatures are frequently used to classify data (e.g. to predict the prognosis of cancer patients), it is not always clear how optimal or meaningful they are (cf. David Venet, Jacques E. Dumont, and Vincent Detours' paper "Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome"). Based partly on suggestions in that paper, SigCheck accepts a data set (as an ExpressionSet) and a gene signature, and compares its classification performance (using the MLInterfaces package) against a) random gene signatures of the same length; b) known (related and unrelated) gene signatures; and c) permuted data.
flowCHIC Analyze flow cytometric data using histogram information
A package to analyze flow cytometric data of complex microbial communities based on histogram images
MAIT Statistical Analysis of Metabolomic Data
The MAIT package contains functions to perform end-to-end statistical analysis of LC/MS metabolomic data. Special emphasis is put on peak annotation and on a modular design of the functions.
GenomicTuples Representation and manipulation of genomic tuples
GenomicTuples defines general purpose containers for storing genomic tuples. It aims to provide functionality for tuples of genomic co-ordinates that are analogous to those available for genomic ranges in the GenomicRanges Bioconductor package.
focalCall Detection of focal aberrations in DNA copy number data
Detection of genomic focal aberrations in high-resolution DNA copy number data
flowcatchR Tools to analyze in vivo microscopy imaging data focused on tracking flowing blood cells.
flowcatchR is a set of tools to analyze in vivo microscopy imaging data, focused on tracking flowing blood cells. It guides the steps from segmentation to calculation of features, filtering out particles not of interest, and also provides a set of utilities to help check the quality of the performed operations (e.g. how good the segmentation was). The main novel contribution addresses the tracking of flowing cells, such as in blood vessels, categorizing the particles as flowing, rolling or adherent. This classification is applied in the study of phenomena such as hemostasis and thrombosis development.
paxtoolsr PaxtoolsR: Access Pathways from Multiple Databases through BioPAX and Pathway Commons
The package provides a basic set of R functions for interacting with BioPAX OWL files and querying the Pathway Commons (PC) molecular interaction data server, hosted by the Computational Biology Center at Memorial-Sloan-Kettering Cancer Center (MSKCC).
PAA PAA (Protein Array Analyzer)
PAA imports single color (protein) microarray data that has been saved in gpr file format, esp. ProtoArray data. After pre-processing (background correction, batch filtering, normalization), univariate feature pre-selection is performed (e.g., using the "minimum M statistic" approach, hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools, including several different plots for examining and evaluating results.
ASGSCA Association Studies for multiple SNPs and multiple traits using Generalized Structured Equation Models
The package provides tools to model and test the association between multiple genotypes and multiple traits, taking into account prior biological knowledge. Genes and clinical pathways are incorporated in the model as latent variables. The method is based on Generalized Structured Component Analysis (GSCA).
proBAMr Generating SAM file for PSMs in shotgun proteomics data.
Maps PSMs back to the genome. The package builds SAM files from shotgun proteomics data. It also provides a function to prepare annotation from a GTF file.
IdeoViz Plots data (continuous/discrete) along chromosomal ideogram
Plots data associated with arbitrary genomic intervals along chromosomal ideogram.
MSGFgui A shiny GUI for MSGFplus
This package makes it possible to perform analyses using the MSGFplus package in a GUI environment. Furthermore, it enables the user to investigate the results using interactive plots, summary statistics and filtering. Lastly, it exposes the current results to another R session so the user can seamlessly integrate the GUI into other workflows.
FEM Identification of Functional Epigenetic Modules
FEM can identify interactome hotspots of differential promoter methylation and differential expression, where an inverse association between promoter methylation and gene expression is assumed.
regionReport Generate HTML reports for exploring a set of regions
Generate HTML reports to explore a set of regions such as the results from annotation-agnostic expression analysis of RNA-seq data at base-pair resolution performed by derfinder.
derfinderHelper derfinder helper package
Helper package for speeding up the derfinder package when using multiple cores.
derfinderPlot Plotting functions for derfinder
Plotting functions for derfinder
derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution
Annotation-agnostic differential expression analysis of RNA-seq data by calculating F-statistics at base-pair resolution
MSGFplus An interface between R and MS-GF+
This package contains functions to perform peptide identification using MS-GF+.
polyester Simulate RNA-seq reads
This package can be used to simulate RNA-seq reads from differential expression experiments with replicates. The reads can then be aligned and used to perform comparisons of methods for differential expression.
ALDEx2 Analysis of differential abundance taking sample variation into account
A differential abundance analysis for the comparison of two or more conditions, for example in single-organism and meta-RNA-seq high-throughput sequencing assays, or of selected and unselected values from in-vitro sequence selections. It uses a Dirichlet-multinomial model, optimized for three or more experimental replicates, to infer abundance from counts. It infers sampling variation and calculates the expected false discovery rate given the biological and sampling variation using the Wilcoxon test or Welch's t-test (aldex.ttest), or the glm and Kruskal-Wallis tests (aldex.glm). It reports both P and FDR values calculated by the Benjamini-Hochberg correction.
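The Dirichlet-multinomial step can be sketched as follows: Monte Carlo instances of the underlying proportions are drawn from a Dirichlet distribution parameterized by the counts plus a small prior, and each instance is centred log-ratio (CLR) transformed before the tests are applied. The prior of 0.5, the 128 instances and the function names are assumptions for illustration; this is not ALDEx2's code.

```python
import math
import random
from typing import List, Sequence


def dirichlet_sample(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """One Dirichlet draw via normalized independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def clr(props: Sequence[float]) -> List[float]:
    """Centred log-ratio: log of each part minus the mean log."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def mc_clr_instances(counts: Sequence[int], n_instances: int = 128,
                     prior: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Monte Carlo CLR instances for one sample's feature counts."""
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]  # prior keeps zero counts usable
    return [clr(dirichlet_sample(alphas, rng)) for _ in range(n_instances)]


instances = mc_clr_instances([10, 0, 25, 3])
# per-feature mean CLR value across the Monte Carlo instances
print([sum(v[f] for v in instances) / len(instances) for f in range(4)])
```

Running the downstream test on every instance and averaging the resulting p-values is what lets the sampling variation of low counts propagate into the reported significance.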
PSEA Population-Specific Expression Analysis.
Deconvolution of gene expression data by Population-Specific Expression Analysis (PSEA).
GenomicInteractions R package for handling genomic interaction data
R package for handling genomic interaction data, such as ChIA-PET/Hi-C, annotating genomic features with interaction information and producing various plots/statistics.
rRDP Interface to the RDP Classifier
Seamlessly interfaces with the RDP classifier (version 2.9).
MGFM Marker Gene Finder in Microarray gene expression data
The package is designed to detect marker genes from Microarray gene expression data sets
simulatorZ Simulator for Collections of Independent Genomic Data Sets
simulatorZ is a package intended primarily to simulate collections of independent genomic data sets, as well as to perform training and validation with prediction algorithms. It supports ExpressionSets and SummarizedExperiment objects.
MSnID Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications.
Extracts MS/MS ID data from mzIdentML (leveraging the mzID package) or text files. After collating the search results from multiple datasets, it assesses their identification quality and optimizes filtering criteria to achieve the maximum number of identifications while not exceeding a specified false discovery rate. It also contains a number of utilities to explore the MS/MS results and assess missed and irregular enzymatic cleavages, mass measurement accuracy, etc.
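Under the target-decoy approach, the FDR at a score threshold is commonly estimated as the number of decoy matches divided by the number of target matches above it. The sketch below picks the most permissive single-score threshold whose estimated FDR stays within a limit. MSnID optimizes several filter criteria jointly, so this is a simplified illustration with hypothetical names, and ties in score are not handled specially.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); higher score = better match


def filter_at_fdr(psms: List[PSM], max_fdr: float) -> List[PSM]:
    """Keep target PSMs above the lowest threshold with estimated FDR <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    best_cut = 0
    decoys = targets = 0
    for idx, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            best_cut = idx  # threshold just below the idx-th best score
    return [p for p in ranked[:best_cut] if not p[1]]


psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(filter_at_fdr(psms, 0.0))   # [(10, False), (9, False)]
print(filter_at_fdr(psms, 0.34))  # four target PSMs pass
```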
GOexpress Visualise microarray and RNAseq data using gene ontology annotations
The package contains methods to visualise the expression profile of genes from a microarray or RNA-seq experiment and offers a clustering analysis to identify GO terms enriched in genes with expression levels best clustering two or more predefined groups of samples. Annotations for the genes present in the expression dataset are obtained from Ensembl through the biomaRt package. The random forest framework is used to evaluate the ability of each gene to cluster samples according to the factor of interest. Finally, GO terms are scored by averaging the rank (alternatively, score) of their respective gene sets to cluster the samples. An ANOVA approach is also available as an alternative statistical framework.
EnrichmentBrowser Seamless navigation through combined results of set-based and network-based enrichment analysis
The EnrichmentBrowser package implements essential functionality for the enrichment analysis of gene expression data. The analysis combines the advantages of set-based and network-based enrichment analysis in order to derive high-confidence gene sets and biological pathways that are differentially regulated in the expression data under investigation. In addition, the package facilitates the visualization and exploration of such sets and pathways.
systemPipeR systemPipeR: NGS workflow and report generation environment
R package for building end-to-end analysis pipelines with automated report generation for next generation sequencing (NGS) applications such as RNA-Seq, ChIP-Seq, VAR-Seq and many others. An important feature is support for running command-line software, such as NGS aligners, on both single machines and compute clusters. This includes both interactive job submissions and batch submissions to queuing systems of clusters.
seqplots SeqPlots - An interactive tool for visualizing NGS signals and sequence motif densities along genomic features using average plots and heatmaps.
SeqPlots is a tool for plotting signal tracks from next generation sequencing (NGS) experiments, e.g. read coverage from ChIP-seq, RNA-seq and DNA accessibility assays like DNase-seq and MNase-seq, over user-specified genomic features, e.g. promoters, gene bodies, etc. It can also calculate sequence motif density profiles from a reference genome. The data are visualized as average signal profile plots, with error estimates (standard error and 95% confidence interval) shown as fields, or as a series of heatmaps that can be sorted and clustered using hierarchical clustering, the k-means algorithm and self-organising maps. Plots can be prepared using the R programming language or a web browser-based graphical user interface (GUI) implemented using the Shiny framework. The dual-purpose implementation allows running the software locally on a desktop or deploying it on a server. SeqPlots is useful both for exploratory data analysis and for preparing replicable, publication-quality plots. Other features of the software include collaboration and data sharing capabilities, as well as the ability to store pre-calculated result matrices that combine many sequencing experiments and in-silico generated tracks with multiple different features. These binaries can then be used to generate new combination plots on the fly, run automated batch operations, or be shared with colleagues, who can adjust their plotting parameters without loading the actual tracks and recalculating numeric values. SeqPlots relies on Bioconductor packages, mainly rtracklayer for data input and BSgenome packages for reference genome sequences and annotations.
geecc Gene set Enrichment analysis Extended to Contingency Cubes
Uses log-linear models to perform hypergeometric and chi-squared tests for gene set enrichment for two (based on contingency tables) or three categories (contingency cubes). Categories can be differentially expressed genes, GO terms, sequence length, GC content, chromosomal position, phylostrata, etc.
GOsummaries Word cloud summaries of GO enrichment analysis
A package to visualise Gene Ontology (GO) enrichment analysis results on gene lists arising from different analyses such as clustering or PCA. The significant GO categories are visualised as word clouds that can be combined with different plots summarising the underlying data.
TSCAN TSCAN: Tools for Single-Cell ANalysis
TSCAN enables users to easily construct and tune pseudotemporal cell orderings as well as to analyze differentially expressed genes. TSCAN comes with a user-friendly GUI written in shiny. More features will come in the future.
rain Rhythmicity Analysis Incorporating Non-parametric Methods
This package uses non-parametric methods to detect rhythms in time series. It handles outliers and missing values and is optimized for time series comprising 10-100 measurements. As it does not assume any particular waveform, it is well suited for detecting oscillating behavior (e.g. circadian or cell cycle) in genome- or proteome-wide biological measurements such as microarrays, proteome mass spectrometry, or metabolome measurements.
NGScopy NGScopy: Detection of copy number variations in next generation sequencing
NGScopy provides a quantitative caller for detecting copy number variations in next generation sequencing (NGS), including whole genome sequencing (WGS), whole exome sequencing (WES) and targeted panel sequencing (TPS). The caller can be parallelized by chromosomes to use multiple processors/cores on one computer.
SNPRelate Parallel computing toolset for relatedness and principal component analysis of SNP data
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed the R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for two-bit integers, since a SNP genotype occupies only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP format in this package is also used by the GWASTools package with the support of S4 classes and generic functions.
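Packing four genotypes into one byte shows why two bits per SNP suffice: the allele dosages 0, 1 and 2 plus a missing-value code 3 all fit in two bits. The bit order and the use of 3 for missing values are assumptions for this sketch and are not a description of the GDS on-disk layout; the function names are hypothetical.

```python
from typing import List, Sequence


def pack_genotypes(genotypes: Sequence[int]) -> bytes:
    """Pack dosages 0/1/2 (3 = missing) four per byte, lowest bits first."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"invalid genotype code {g}")
        out[i // 4] |= g << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(packed: bytes, n: int) -> List[int]:
    """Recover the first n two-bit genotype codes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


calls = [0, 1, 2, 3, 2]
packed = pack_genotypes(calls)
print(packed.hex())                   # e402
print(unpack_genotypes(packed, 5))    # [0, 1, 2, 3, 2]
```

Five genotypes need only two bytes here, against five bytes (or more) for one integer per genotype, which is the storage saving the GDS format exploits.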
GenoView Condensed, overlapped plotting of genomic data tracks
Superimposing input data over existing genomic references allows for fast, accurate visual comparisons. The GenoView package is a novel bioinformatics package which condenses genomic data tracks to offer a comprehensive view of genetic variants. Its main function displays mutation data over exons and protein domains, making it easy to identify potential genomic locations of interest.
FourCSeq Package to analyse 4C sequencing data
FourCSeq is an R package dedicated to the analysis of (multiplexed) 4C sequencing data. The package provides a pipeline to detect specific interactions between DNA elements and identify differential interactions between conditions. The statistical analysis in R starts with individual bam files for each sample as inputs. To obtain these files, the package contains a python script (extdata/python/demultiplex.py) to demultiplex libraries and trim off primer sequences. The required bam files can then be generated with standard alignment software.
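The demultiplexing and primer-trimming step can be pictured with the minimal sketch below. It is not the bundled demultiplex.py script: it assumes exact primer matches at the start of each read (no mismatches, no quality handling), and the sample names, primers and `demultiplex` function are hypothetical.

```python
def demultiplex(reads, primers):
    """Assign each read to the sample whose primer it starts with, then trim the primer.

    reads: iterable of read sequences (str).
    primers: dict mapping (hypothetical) sample names to primer sequences.
    Returns a dict sample -> trimmed reads; unmatched reads are kept under None.
    """
    out = {name: [] for name in primers}
    out[None] = []
    for read in reads:
        for name, primer in primers.items():
            if read.startswith(primer):
                out[name].append(read[len(primer):])  # drop the primer sequence
                break
        else:
            out[None].append(read)  # no primer matched
    return out


primers = {"sampleA": "ACGT", "sampleB": "TTGA"}
reads = ["ACGTCCCC", "TTGAGGGG", "NNNNAAAA"]
print(demultiplex(reads, primers))
# {'sampleA': ['CCCC'], 'sampleB': ['GGGG'], None: ['NNNNAAAA']}
```

Each per-sample list would then be written to its own FASTQ file and aligned, producing the per-sample bam files the R analysis expects.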
switchBox Utilities to train and validate classifiers based on pair switching using the K-Top-Scoring-Pair (KTSP) algorithm.
The package offers different classifiers based on comparisons of pairs of features (TSP), using various decision rules (e.g., the majority-wins principle).
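The pair-switching idea behind these classifiers can be shown with a short sketch. It is not switchBox's R API: it scores every feature pair by the standard TSP score |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|, keeps the top k pairs and lets them vote by majority. As a simplification, it does not enforce the disjoint pairs that k-TSP usually requires, and all function names and data are hypothetical.

```python
from itertools import combinations


def class_frequencies(samples, labels, i, j):
    """Fraction of samples in each class (0 and 1) with feature i below feature j."""
    freq = {}
    for cls in (0, 1):
        group = [s for s, y in zip(samples, labels) if y == cls]
        freq[cls] = sum(s[i] < s[j] for s in group) / len(group)
    return freq


def train_ktsp(samples, labels, k):
    """Rank feature pairs by TSP score and keep the top k."""
    scored = []
    for i, j in combinations(range(len(samples[0])), 2):
        freq = class_frequencies(samples, labels, i, j)
        score = abs(freq[0] - freq[1])
        # Class that "x_i < x_j" points to for this pair.
        vote_if_less = 0 if freq[0] > freq[1] else 1
        scored.append((score, i, j, vote_if_less))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


def predict_ktsp(pairs, sample):
    """Each selected pair votes for a class; the majority wins (use an odd k to avoid ties)."""
    votes = [0, 0]
    for _, i, j, vote_if_less in pairs:
        cls = vote_if_less if sample[i] < sample[j] else 1 - vote_if_less
        votes[cls] += 1
    return 0 if votes[0] > votes[1] else 1


# Hypothetical expression values for three genes in four samples.
samples = [[1, 5, 3], [2, 6, 1], [7, 2, 4], [8, 1, 5]]
labels = [0, 0, 1, 1]
pairs = train_ktsp(samples, labels, k=3)
print(predict_ktsp(pairs, [1, 9, 2]))  # 0
print(predict_ktsp(pairs, [9, 1, 6]))  # 1
```

Because the rule only compares features within one sample, it is invariant to monotone per-sample transformations, which is a commonly cited appeal of TSP-style classifiers.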
Rnits R Normalization and Inference of Time Series data
R/Bioconductor package for normalization, curve registration and inference in time course gene expression data
MethylMix MethylMix: Identifying methylation driven cancer genes.
MethylMix is an algorithm implemented to identify hyper- and hypomethylated genes for a disease. MethylMix is based on a beta mixture model to identify methylation states and compares them with the normal DNA methylation state. MethylMix uses a novel statistic, the Differential Methylation value or DM-value, defined as the difference between a methylation state and the normal methylation state. Finally, matched gene expression data is used to identify functional methylation states, in addition to differential ones, by focusing on methylation changes that affect gene expression.
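The DM-value arithmetic is simple once the mixture components are known. The sketch below assumes the beta mixture has already been fitted (the fitting itself is not shown) and takes the normal methylation state to be the mean of normal-sample beta values; the ±0.1 cutoff for calling a state hyper- or hypomethylated is a hypothetical choice, and none of these functions belong to the MethylMix package.

```python
from statistics import mean


def dm_values(component_means, normal_betas):
    """DM-value of each methylation state: component mean minus mean normal methylation."""
    normal_mean = mean(normal_betas)
    return [m - normal_mean for m in component_means]


def label_states(dm, threshold=0.1):
    """Call each state hyper-/hypomethylated; the 0.1 cutoff is a hypothetical choice."""
    labels = []
    for d in dm:
        if d > threshold:
            labels.append("hyper")
        elif d < -threshold:
            labels.append("hypo")
        else:
            labels.append("unchanged")
    return labels


# Hypothetical example: three mixture components fitted to tumour beta values.
dm = dm_values([0.05, 0.25, 0.75], normal_betas=[0.2, 0.3, 0.25])
print(label_states(dm))  # ['hypo', 'unchanged', 'hyper']
```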
methylPipe Base resolution DNA methylation data analysis
Memory efficient analysis of base resolution DNA methylation data in both the CpG and non-CpG sequence context. Integration of DNA methylation data derived from any methodology providing base- or low-resolution data.
MBASED Package containing functions for ASE analysis using Meta-analysis Based Allele-Specific Expression Detection
The package implements the MBASED algorithm for detecting allele-specific gene expression from RNA count data, where allele counts at individual loci (SNVs) are integrated into a gene-specific measure of ASE, and uses simulations to appropriately assess the statistical significance of observed ASE.
hiReadsProcessor Functions to process LM-PCR reads from 454/Illumina data.
hiReadsProcessor contains a set of functions which allow users to process LM-PCR products sequenced using any platform. Given an excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.
erccdashboard Assess Differential Gene Expression Experiments with ERCC Controls
Technical performance metrics for differential gene expression experiments using External RNA Controls Consortium (ERCC) spike-in ratio mixtures.
compEpiTools Tools for computational epigenomics
Tools for computational epigenomics developed for the analysis, integration and simultaneous visualization of various (epi)genomics data types across multiple genomic regions in multiple samples.
quantro A test for when to use quantile normalization
A data-driven test for the assumptions of quantile normalization using raw data, such as objects that inherit from eSet (e.g. ExpressionSet, MethylSet). Group-level information about each sample (such as Tumor / Normal status) must also be provided, because the test assesses whether there are global differences in the distributions between the user-defined groups.
SemDist Information Accretion-based Function Predictor Evaluation
This package implements methods to calculate information accretion for a given version of the Gene Ontology and uses this data to calculate remaining uncertainty, misinformation, and semantic similarity for given sets of predicted annotations and true annotations from a protein function predictor.
RUVnormalize RUV for normalization of expression array data
RUVnormalize is meant to remove unwanted variation from gene expression data when the factor of interest is not defined, e.g., to clean up a dataset for general use or to do any kind of unsupervised analysis.
MPFE Estimation of the amplicon methylation pattern distribution from bisulphite sequencing data.
Estimates the distribution of methylation patterns from a table of counts from a bisulphite sequencing experiment, given a non-conversion rate and a read error rate.
miRNAtap miRNAtap: microRNA Targets - Aggregated Predictions
The package facilitates the implementation of workflows requiring miRNA target predictions. It integrates ranked miRNA target predictions from multiple sources available online and aggregates them with various methods, which improves the quality of predictions above that of any single source. Currently predictions are available for Homo sapiens, Mus musculus and Rattus norvegicus (the last one through homology translation).
M3D Identifies differentially methylated regions across testing groups.
This package identifies statistically significant differentially methylated regions of CpGs. It uses kernel methods (the Maximum Mean Discrepancy) to measure differences in methylation profiles, and relates these to inter-replicate changes, whilst accounting for variation in coverage profiles.
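As a rough illustration of the kernel statistic, the sketch below computes the standard biased estimate of the squared Maximum Mean Discrepancy, MMD² = mean k(x, x') + mean k(y, y') - 2 mean k(x, y), with a Gaussian kernel. It is not M3D's implementation: the kernel choice, the bandwidth `sigma`, and the representation of a profile as a vector of CpG methylation levels are assumptions, and the comparison with inter-replicate changes and the coverage adjustment are omitted.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between samples xs and ys."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Hypothetical methylation profiles: each row holds levels at two CpGs.
group_a = [[0.10, 0.20], [0.15, 0.25], [0.12, 0.18]]
group_b = [[0.90, 0.80], [0.85, 0.95], [0.88, 0.82]]
print(mmd_squared(group_a, group_a))  # 0.0 for identical samples
print(mmd_squared(group_a, group_b) > mmd_squared(group_a, group_a))  # True
```

A larger MMD² between test groups than between replicates is the kind of contrast the description refers to.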
ClassifyR A framework for two-class classification problems, with applications to differential variability and differential distribution testing.
The software formalises a framework for classification in R. There are four stages: data transformation, feature selection, classifier training, and prediction. The requirements of variable types and names are fixed, but specialised variables for functions can also be provided. The classification framework is wrapped in a driver loop that reproducibly carries out one of several cross-validation schemes. Functions for differential expression, differential variability, and differential distribution are included. Users may also develop additional functions if they have better-performing methods.
metagene A package to produce metagene plots
This package produces metagene plots to compare the behavior of DNA-interacting proteins at selected groups of genes/features. Pre-calculated features (such as transcription start sites of protein-coding genes or enhancers) are available. Bam files are used to increase the resolution. Multiple combinations of groups of features and/or groups of bam files can be compared in a single analysis. Bootstrapping analysis is used to compare the groups and locate regions with statistically different enrichment profiles.
metabomxtr A package to run mixture models for truncated metabolomics data with normal or lognormal distributions.
The functions in this package return optimized parameter estimates and log likelihoods for mixture models of truncated data with normal or lognormal distributions.
interactiveDisplayBase Base package for enabling powerful shiny web displays of Bioconductor objects
The interactiveDisplayBase package contains the basic methods needed to generate interactive Shiny-based display methods for Bioconductor objects.
STAN STrand-specific ANnotation of genomic data
STAN (STrand-specific ANnotation of genomic data) implements bidirectional Hidden Markov Models (bdHMM), which are designed for studying directed genomic processes, such as gene transcription, DNA replication, recombination or DNA repair, by integrating genomic data. bdHMMs model a sequence of successive observations (e.g. ChIP or RNA measurements along the genome) by a discrete number of 'directed genomic states', which may, for example, reflect distinct genome-associated complexes. Unlike standard HMM approaches, bdHMMs allow the integration of strand-specific (e.g. RNA) and non-strand-specific data (e.g. ChIP).
ballgown Flexible, isoform-level differential expression analysis
Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.
missMethyl Analysis of methylation array data
Normalisation and testing for differential variability and differential methylation for data from Illumina's Infinium HumanMethylation450 array. The normalisation procedure is subset-quantile within-array normalisation (SWAN), which allows Infinium I and II type probes on a single array to be normalised together. The test for differential variability is based on an empirical Bayes version of Levene's test. Differential methylation testing is performed using RUV, which can adjust for systematic errors of unknown origin in high-dimensional data by using negative control probes.
OncoSimulR Simulation of cancer progression with order restrictions
Functions for simulating and plotting cancer progression data, including drivers and passengers, and allowing for order restrictions. Simulations use continuous-time models (based on Bozic et al., 2010 and McFarland et al., 2013) and fitness functions account for possible restrictions in the order of accumulation of mutations.