A mentored Bioconductor software development project is one in which experienced programmers work with volunteers to develop new capabilities needed by the community.
Developers new to Bioconductor may find mentored projects a useful way to apply, refine and extend their skills. Projects are identified by experienced Bioconductor developers. The projects involve important but manageable programming tasks. Experienced developers act as mentors, providing guidance and oversight. Successful mentored projects will be incorporated into the appropriate packages, and contributors will receive full credit for their work. Users, contributors and mentors will all benefit.
We anticipate that mentored projects will usually be run by one or two experienced Bioconductor-savvy programmers who provide guidance, usually remotely, to one or more less-experienced programmers. All the tools of 'social coding', from email and svn to GitHub and Skype, can be used at the discretion of the participants. Except in unusual circumstances, we expect that participants will have their own independent funding, most likely as the result of a good fit between the mentored project and their current employment or academic studies.
Below you will find a list of proposed projects. We invite your participation. We welcome your suggestions.
The mzR R/Bioconductor package provides a unified API to the common open and community-driven file formats and parsers available for mass spectrometry data, namely mzXML, mzML and mzData (see vignette for details). It uses C and C++ code from other third-party open-source projects and relies heavily on the Rcpp package, notably to provide a direct mapping from R to the C++ infrastructure.
Currently, mzR provides two backends for reading raw mass spectrometry data:
netCDF, which reads, as the name implies, netCDF data.
RAMP, which reads mzData and mzXML via the ISB RAMP parser. This backend can also read mzML through the proteowizard RAMPadapter around the proteowizard infrastructure, but this interface is limited to the lowest common denominator between the mzXML/mzData/mzML formats.
This project is intended to add several related backends to mzR by providing a direct wrapper around, and full access to, the proteowizard msdata object. The candidate will interact closely with Laurent Gatto and Steffen Neumann, and with the proteowizard and Rcpp communities.
Project attributes and estimates:
C++ fluency.
C and especially C++ essential. The candidate will have to familiarise herself with the mass-spectrometry data, the respective data formats and the proteowizard code base.
mzR package.
From the Wikipedia entry for Galaxy:
Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists that do not have computer programming experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.
The new Bioconductor package RGalaxy simplifies the process of exposing an R function in Galaxy so that a user can run the function using nothing more than a web browser.
This project would involve taking an existing workflow (or conceiving a new workflow) and exposing it in Galaxy.
Project attributes and estimates:
We would like to see PANTHER annotation contained in a Bioconductor AnnotationDbi package.
PANTHER is found here, and summarized:
The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence.
Project attributes and estimates:
The combination of DNase I digestion and high-throughput sequencing (DNase-seq) has been used recently to map chromatin accessibility in a given tissue or cell type on a genome-wide scale. In addition to these DNase I hypersensitivity regions (DHSs), short regions of protected nucleotides known as footprints can be detected, indicating transcription factor binding occupancy events.
The aim of this project is to build an algorithm to efficiently detect protein binding footprints in DNase-seq data from reads in the standard BAM/SAM alignment format.
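To make the idea of a footprint concrete, here is a deliberately naive baseline, sketched in Python. It is not the algorithm this project would deliver. It assumes per-base DNase I cut counts have already been derived from the aligned reads, and it scores each window by how depleted its core is relative to its flanks. The function names, window sizes and scoring rule are all hypothetical.

```python
def footprint_scores(cuts, core=3, flank=3):
    """Score candidate footprints in a list of per-base cut counts.

    Returns a dict mapping each core start position to
    (mean flank count - mean core count); larger means more protected.
    The window sizes are illustrative assumptions, not recommended values.
    """
    scores = {}
    for start in range(flank, len(cuts) - core - flank + 1):
        left = cuts[start - flank:start]
        centre = cuts[start:start + core]
        right = cuts[start + core:start + core + flank]
        flank_mean = (sum(left) + sum(right)) / (2 * flank)
        core_mean = sum(centre) / core
        scores[start] = flank_mean - core_mean
    return scores


def best_footprint(cuts, core=3, flank=3):
    """Return (start, end, score) of the highest-scoring core window (end exclusive)."""
    scores = footprint_scores(cuts, core, flank)
    if not scores:
        raise ValueError("cut profile is shorter than one full window")
    start = max(scores, key=scores.get)
    return start, start + core, scores[start]


# A toy profile: three protected bases flanked by frequently cut positions.
profile = [5, 6, 5, 0, 0, 0, 6, 5, 6, 1]
print(best_footprint(profile))
```

A real detector would also need to handle sequencing depth, DNase I sequence bias and statistical significance, none of which this sketch attempts.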
Project attributes and estimates:
The easyRNASeq package facilitates and expedites the processing and filtering of large RNA-seq datasets for subsequent analysis by the Bioconductor packages edgeR and DESeq, both of which test for differential gene expression. We propose to add an output format compatible with DEXSeq, a package for exon-level differential expression analysis.
Project attributes and estimates:
Package authors sometimes have excellent statistical and bioinformatic ideas, but are not fully confident in their ability to produce a robust software package suitable for inclusion in Bioconductor. This mentored project pairs the package developer with an experienced programmer to produce quality software. Participants are expected to have a working version of their package, with the major ideas and preliminary implementation complete.
Project attributes and estimates:
Sometimes the maintainer of an older Bioconductor package is no longer able to perform that job. These older packages remain useful but occasionally need a bug fix or a small change. We are looking for volunteers to maintain such packages, which would otherwise be abandoned. Relatively little work is required: the original author will be available to answer questions, the Bioconductor core team can help, and the Bioconductor community will benefit.
Current orphans are listed below.
(No orphans at this time)
Please send mail to pshannon AT fhcrc DOT org if you would like to help out on any of these projects, or have an idea of your own which you wish to propose.
The graph package was developed when users created objects with
calls like new("graphNEL"), but there are advantages to hiding this
level of implementation from the user and instead creating a new
instance with graphNEL(). The project modernizes this aspect of the
graph package.
graphNEL() and graphAM(); constructors for
additional classes may also be provided, e.g., attrData(),
clusterGraph(), distGraph(), edgeSet(), edgeSetAM(),
edgeSetNEL(), renderInfo(), simpleEdge().
The VariantAnnotation package needed a function to compute genotype counts, allele frequencies and Hardy-Weinberg estimates from the genotype data in a VCF class.
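The arithmetic behind these three summaries is short enough to sketch. The Python function below is a hypothetical illustration, not the VariantAnnotation implementation. It assumes a biallelic variant with diploid calls given as VCF-style GT strings, treats any non-reference allele as the alternate allele, and uses a plain Pearson chi-square statistic as the Hardy-Weinberg check.

```python
from collections import Counter


def genotype_summary(genotypes):
    """Summarise diploid genotype calls for one biallelic variant.

    `genotypes` holds VCF-style GT strings such as "0/0", "0|1" or "./.".
    Missing or non-diploid calls are ignored.
    """
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        n_alt = sum(a != "0" for a in alleles)  # 0, 1 or 2 alternate alleles
        counts[n_alt] += 1

    observed = (counts[0], counts[1], counts[2])
    n = sum(observed)
    if n == 0:
        raise ValueError("no called genotypes")

    # Each homozygote carries two copies of its allele, each heterozygote one of each.
    ref_freq = (2 * observed[0] + observed[1]) / (2 * n)
    alt_freq = 1 - ref_freq

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (n * ref_freq ** 2, 2 * n * ref_freq * alt_freq, n * alt_freq ** 2)

    # Pearson chi-square statistic; classes with zero expectation are skipped.
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {
        "counts": observed,
        "ref_freq": ref_freq,
        "alt_freq": alt_freq,
        "expected": expected,
        "chisq": chisq,
    }


print(genotype_summary(["0/0", "0|1", "1/0", "1/1", "./."]))
```

For small samples or rare alleles an exact test is usually preferred over the chi-square approximation; the sketch uses the latter only because it is the simplest to read.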
Project attributes and estimates:
Genotypes
MatrixToSnpMatrix() in the VariantAnnotation package converts the genotype data in a VCF object into a SnpMatrix object. Currently this is done without taking uncertain genotype calls into consideration. This project involves modifying MatrixToSnpMatrix() to use, when available, genotype uncertainty and likelihood information to convert genotypes to probability-based SnpMatrix encodings.
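As a rough illustration of the first step such a conversion needs, the Python sketch below turns the log10-scaled genotype likelihoods of a VCF GL field into genotype probabilities. It assumes a biallelic diploid site with values ordered hom-ref, het, hom-alt, and a flat prior over the three genotypes. The function names are hypothetical, and the sketch does not reproduce how SnpMatrix stores uncertain calls.

```python
def gl_to_probabilities(log10_likelihoods):
    """Normalise log10 genotype likelihoods into probabilities.

    Assumes three values ordered (hom-ref, het, hom-alt) and a flat prior,
    so the probabilities are simply the rescaled likelihoods.
    """
    if len(log10_likelihoods) != 3:
        raise ValueError("expected three likelihoods for a biallelic diploid site")
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log10_likelihoods)
    scaled = [10 ** (gl - top) for gl in log10_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def expected_alt_dosage(probabilities):
    """Expected number of alternate alleles: P(het) + 2 * P(hom-alt)."""
    return probabilities[1] + 2 * probabilities[2]


probs = gl_to_probabilities([-1.0, 0.0, -1.0])
print(probs, expected_alt_dosage(probs))
```

A confident call concentrates nearly all the probability on one genotype, while an uncertain call spreads it out; that spread is the information a hard-call conversion discards.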
Project attributes and estimates:
The aim of this project is to build a simple GUI to navigate raw mass spectrometry data files. Data input functionality and relevant data structures are available in the mzR and MSnbase packages. The final deliverable would be a new R package, to be submitted to Bioconductor, that implements the GUI and allows users to browse raw data files as well as MSnExp raw data instances directly. The overall goal is to complement programmatic data access with interactive visualisation.
Project attributes and estimates: