Google Summer of Code Ideas List for 2014

This page is the main information point for Bioconductor's participation in the Google Summer of Code program this year (2014).

Overview

Each selected student (mentee) will be paid USD $5500 to work on a Bioconductor project for 3 months during the summer.

Students should look at the list of projects to see whether any of them are of interest, then email the project mentors to express that interest and describe any prior experience.

Students with ideas for Bioconductor projects not listed below are encouraged to email any of the mentors listed below with project ideas.

Students will submit project applications directly to Google.

Google will award a certain number of student slots to the Bioconductor project.

The Bioconductor administrators and mentors will rank the proposed projects by their importance to Bioconductor, and the top-ranked projects will be funded.

Any selected students will be expected to register with the bioconductor and bioc-devel mailing lists.

Google has posted a timeline explaining how the program works. Students are encouraged to review it and make sure that they can commit to the schedule. There is also a FAQ for questions that are not addressed here.

Here are our suggested ideas:

ExperimentHub project

Background/Motivation: As very large genomic data sets become more and more common, computational biologists are spending an inordinate amount of time transforming data from the format of the original resource to a format amenable to computation in their programming language of choice. The R / Bioconductor community needs programmatic access to cloud-based experimental data resources that can be readily incorporated into their own workflows.

Goal

AnnotationHub and its supporting packages are primed to support such a project. AnnotationHub provides infrastructure to make well-curated resources available to R software clients, but it needs a GUI that allows users to add their own resources, including transforming the data into formats amenable to direct use by R clients.

The task

Work with us to create a GUI that does the following:

1) Allows the user to add large genomic resources that have been transformed into a GenomicRanges::GRanges object, along with their associated metadata, to a NoSQL back-end database.
2) Provides an intuitive front end, using a shiny method, that allows the user to upload the object that was passed to the method to the database.
3) Checks and validates that all the metadata has been filled in appropriately when the shiny GUI is being run, and then uploads it to the database.
4) On the back end, enables a new instance of the AnnotationHubServer that knows how to listen for requests from the GUI and can add the data when appropriate.
5) Once the method and back end are both working for GenomicRanges::GRanges objects, you should also write methods for other popular Bioconductor objects such as Biobase::ExpressionSet, GenomicRanges::SummarizedExperiment, and GenomicRanges::GRangesList.
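
The metadata check described in task 3 could be prototyped along the following lines. This is only a minimal sketch, written in Python for brevity; the actual implementation would live in R and shiny. The required field names and the functions `missing_metadata` and `ready_for_upload` are hypothetical and are not part of AnnotationHub.

```python
# Hypothetical required metadata fields; the real set would be
# defined by the project, not by this sketch.
REQUIRED_FIELDS = ("title", "species", "genome", "source_url", "maintainer")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank in `metadata`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # Treat None and whitespace-only values as "not filled in".
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def ready_for_upload(metadata):
    """True only when every required field has a non-blank value."""
    return not missing_metadata(metadata)
```

Returning the list of missing fields, rather than a bare true/false, would let the GUI tell the user exactly which entries still need to be completed before the upload is attempted.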

Skills required

Familiarity with R S4 methods and with shiny.

Test (to help you get started and also to indicate your competence)

Mentor

Marc Carlson mcarlson@fhcrc.org

Backup Mentors

Dan Tenenbaum dtenenba@fhcrc.org
Martin Morgan mtmorgan@fhcrc.org

Enabling Interactivity in Reproducible Reports

Background / Motivation

Every scientific analysis should result in a reproducible report. The ReportingTools Bioconductor package provides multiple means of report generation, including an imperative API driven by an R script, as well as a declarative interface through knitr. ReportingTools converts common R/Bioconductor data structures like data.frames and ExpressionSets into report elements, such as tables and plots, according to user-definable mappings. It supports multiple backends, with the HTML backend being the most developed. The HTML report elements have some limited interactivity (such as sortable tables). Additional interactivity is enabled through integration with the shiny package.

Our goal is to add some simple interactive HTML report elements, where the interactivity is implemented in the front end, independent of shiny or any other running R instance. These would include basic plots of summaries, as well as simple genomic plots that display alignments, per-position summaries, and annotations along the genome. Summarized data could be embedded directly in the report, and the client would access large datasets by querying standard biological data sources like DAS, AnnotationHub, ExperimentHub and web-accessible BAM/VCF/BigWig files. This work would occur in a new, or at least separate, package that could be used directly or through its integration with ReportingTools. We may be able to leverage existing work like clickme and rCharts for the low-level plotting. The genomic plotting might rely on, and even drive the development of, the pViz JavaScript library.

The goal is for simple, lightweight plots in redistributable reports. There is no intent for this to replace the more sophisticated shiny-based solutions, nor applications like epiviz(R).

Tasks

Skills required

Mentors

Michael Lawrence lawrence.michael@gene.com

Backup Mentors

Martin Morgan mtmorgan@fhcrc.org
Marc Carlson mcarlson@fhcrc.org

Extending mzR

Introduction

The mzR R/Bioconductor package provides a unified application programming interface to the common open and community-driven file formats and parsers available for mass spectrometry data, namely mzXML, mzML and mzData (see the current vignette for details and references). It relies on C and C++ code from other third-party open-source projects and, notably, on the Rcpp package to provide a direct mapping from R to the C++ infrastructure.

Currently, mzR provides two back-ends to read mass spectrometry raw data:

More details about the project can be found on the official package page.

Goal

The goal is to extend the current useful, yet limited, capabilities of mzR by adding support for the state-of-the-art proteowizard project.

The tasks

We will provide example data in all formats, as well as support on the application domain. This project will use the mzR GitHub page as the main collaboration and communication hub.

Skills needed:

The candidate will have to familiarise herself with the proteowizard code base.

Mentors

Laurent Gatto lg390@cam.ac.uk

Backup mentors:

Steffen Neumann sneumann@ipb-halle.de
Dirk Eddelbuettel edd@debian.org
