# RGalaxy

### Dan Tenenbaum

Galaxy is an open, web-based platform for data-intensive biomedical research. It provides an easy-to-use web interface and can expose bioinformatics workflows written in any programming language.

Normally, in order to expose new functionality (a tool in Galaxy parlance) in a Galaxy instance, you have to manually create an XML file with information about the function, and modify an additional XML file.

The RGalaxy package automates this process, pulling most of the necessary information from the function itself and its manual page (you provide the remaining information as arguments to the galaxy function).

## A Simple Example

Let's say you want to create a Galaxy tool that adds two numbers.

First, load RGalaxy:

library(RGalaxy)


Then write a function like this:


addTwoNumbers <-
function(
number1=GalaxyNumericParam(required=TRUE),
number2=GalaxyNumericParam(required=TRUE),
sum=GalaxyOutput("sum", "txt"))
{
cat(number1 + number2, file=sum)
}



There are a few things to notice about this function:

• The data type of each parameter is specified. Instead of R's plain numeric type, we use a special class called GalaxyNumericParam, because Galaxy (unlike R) needs to know the type of each parameter, as well as other information.
• The function's name is descriptive.
• The return value of the function is not important.
• All the function's inputs and outputs are specified as arguments in its signature. This is required as Galaxy communicates with tools by sending them files and reading files they generate.
• By default, parameters are marked as not required by Galaxy. Adding required=TRUE tells Galaxy not to allow empty values.
• This function can be run from within R, passing it ordinary numeric values:
t <- tempfile()
addTwoNumbers(2, 2, t)
readLines(t, warn=FALSE)

## [1] "4"


## Documenting the Example

We're almost ready to tell Galaxy about our function, but first we need to document it with a manual page. RGalaxy will use information in this page to create the Galaxy tool, and the man page will also be useful to anyone who wants to run your function in R.

The man page might look like this:

\name{addTwoNumbers}
\alias{addTwoNumbers}
\title{Add two numbers}

\description{
An example function that can be made into a Galaxy tool.
Takes two numbers, adds them, and returns a file containing
the result.
}

\usage{
addTwoNumbers(number1=GalaxyNumericParam(required=TRUE),
number2=GalaxyNumericParam(required=TRUE),
sum=GalaxyOutput("sum", "txt"))
}

\arguments{
\item{number1}{
The first number to add.
}
\item{number2}{
The second number to add.
}
\item{sum}{
Where the result of the addition should be written.
}
}

\value{
invisible(NULL)
}

\seealso{

}

\examples{
t <- tempfile()
addTwoNumbers(2, 2, t)
readLines(t, warn=FALSE)
}


• The name, alias, description, title, usage, and arguments sections are required. The details section is not required but its use is encouraged (RGalaxy will notify you if this section is missing).
• While it's good to have an examples section, this section is only useful to people running your function via R. This section is not used by RGalaxy.
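As a rough illustration of the section requirements above, the following sketch parses a small Rd file with `tools::parse_Rd()` and reports which of the required sections are absent. This is not how RGalaxy itself performs the check; the Rd content and the variable names are made up for the example.

```r
library(tools)

# A deliberately small Rd file; its contents are illustrative only.
rd_lines <- c(
  "\\name{addTwoNumbers}",
  "\\alias{addTwoNumbers}",
  "\\title{Add two numbers}",
  "\\description{Adds two numbers and writes the result to a file.}",
  "\\usage{addTwoNumbers(number1, number2, sum)}",
  "\\arguments{",
  "  \\item{number1}{The first number.}",
  "  \\item{number2}{The second number.}",
  "  \\item{sum}{Where the result should be written.}",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# parse_Rd() returns a list whose elements carry their section name
# in the "Rd_tag" attribute, e.g. "\\name" or "\\usage".
rd <- parse_Rd(rd_file)
tags <- unlist(lapply(rd, attr, "Rd_tag"))

# Sections the vignette lists as required.
required <- c("\\name", "\\alias", "\\title",
              "\\description", "\\usage", "\\arguments")
missing_required <- setdiff(required, tags)

# The details section is optional but encouraged.
has_details <- "\\details" %in% tags
```

Here `missing_required` is empty because all six required sections are present, while `has_details` is `FALSE`, which corresponds to the situation where RGalaxy prints a note about the missing details section.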

## Installing Galaxy

Before we can tell Galaxy about our function, we have to install Galaxy.

The Galaxy Installation page gives full instructions, but in a nutshell, you can install Galaxy as follows (you may need to install Mercurial, which provides the hg command):

hg clone https://bitbucket.org/galaxy/galaxy-dist/


The directory where you just installed Galaxy (a full path ending in galaxy-dist) is your “Galaxy Home” directory, represented by galaxyHome in the following code snippet.
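A minimal sketch of defining galaxyHome in R follows. The path is hypothetical (it assumes Galaxy was cloned into your home directory); adjust it to your own installation.

```r
# Hypothetical location: change this to wherever you cloned galaxy-dist.
galaxyHome <- file.path(path.expand("~"), "galaxy-dist")

# Warn early if the directory is not there, before any tool files are written.
if (!dir.exists(galaxyHome)) {
  message("No Galaxy installation found at ", galaxyHome)
}
```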

## Telling Galaxy about the function

Now we point Galaxy to the function we just wrote:

galaxy("addTwoNumbers",
galaxyConfig=
GalaxyConfig(galaxyHome, "mytool", "Test Section",
"testSectionId")
)

## Warning: Not enough information to create a functional test.

## Note: Did not find section 'Details' in man page.

## [1] "/tmp/RtmpqhylOl/Rbuild77e03f7cdb38/RGalaxy/vignettes/tools/mytool/addTwoNumbers.xml"


Notice the warning about functional tests. We'll cover that later in the vignette.

The galaxy function notifies you that the details section of the man page is missing. It also returns the path to the XML tool wrapper it created.

## Running the example function in Galaxy

To start Galaxy, open a command window and change to your Galaxy home directory (defined earlier). Then issue this command:

./run.sh --reload


If Galaxy is already running, you should stop it (with control-C) and restart it with the command above. Galaxy should always be restarted after running the galaxy function.

You can now access Galaxy at http://localhost:8080.

If you click on “Test Section” and then “Add Two Numbers”, you should see something like Figure 1.

• RGalaxy has generated a tool in which each parameter has some explanatory text that comes from our man page.
• The tool name (“Add Two Numbers”) comes from the function name (this can be overridden by passing name to galaxy()).
• If you try to enter a non-number, Galaxy will complain. This is because we specified GalaxyNumericParam in our function.
• If you try to leave either of the numbers blank, Galaxy will complain. This is because we specified required=TRUE.

If we enter 10 and 5, then click “Execute”, Galaxy will run and when finished will show 'sum.txt' in the History Pane at the right. Clicking on it should show something like Figure 2. You can download the result or send it to another Galaxy tool.

## Functional Testing

We just ran Galaxy and made sure our tool worked. It would be nice to automate this procedure so we can know that for inputs x and y, our tool will always produce output z.

With a couple of small additions, we can accomplish this. Our function will have a self-contained test.

Also, when submitting tools to the public Galaxy instance, functional tests like this are required.

Here is our addTwoNumbers function again, this time with a functional test:


addTwoNumbersWithTest <-
function(
number1=GalaxyNumericParam(required=TRUE, testValues=5L),
number2=GalaxyNumericParam(required=TRUE, testValues=5L),
sum=GalaxyOutput("sum", "txt"))
{
cat(number1 + number2, file=sum)
}



The only visible difference is that we've added a testValues argument to each input parameter. Another, subtler difference is that we have added a file in our package called inst/functionalTests/addTwoNumbersWithTest/sum, which contains the expected output of the function. By using this convention, we ensure RGalaxy can find the file.

Does the function pass its functional test?

runFunctionalTest(addTwoNumbersWithTest)

## [1] TRUE


Note that this just runs the function in R; it does not test it inside a running Galaxy. But because the functional test infrastructure is present in the XML file generated by RGalaxy, you can do that from your Galaxy home directory as follows:

./run_functional_tests.sh -id addTwoNumbersWithTest


The output of the test will be written to run_functional_tests.html.

Note that R doesn't always produce the same output each time, even though the files may look identical. The pdf function in particular may produce different files. You can use the png function as a workaround.
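The idea behind a functional test (run the function on its test values, then compare the output file with a stored expected file) can be sketched in base R. This is an illustration only, not RGalaxy's implementation: `add_two` is a plain-R stand-in for the tool, and the expected file is created on the fly instead of being read from inst/functionalTests.

```r
# Plain-R stand-in for the Galaxy tool: no RGalaxy classes involved.
add_two <- function(number1, number2, sum) {
  cat(number1 + number2, file = sum)
}

# Hypothetical fixture holding the expected output for inputs 5 and 5.
expected_file <- tempfile()
cat(10, file = expected_file)

# Run the function on the test values and capture its output file.
actual_file <- tempfile()
add_two(5L, 5L, actual_file)

# The test passes when both files have identical contents.
test_passed <- identical(
  readLines(actual_file, warn = FALSE),
  readLines(expected_file, warn = FALSE)
)
print(test_passed)
```

Because the comparison is on file contents, any output that varies between runs will make such a test fail even when the result looks the same to the eye.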

### Should my function be in a package?

We've glossed over it so far, but the addTwoNumbers() function and its man page live in a package (the RGalaxy package in this case). It is possible to expose in Galaxy a function that does not live in a package, but you have to provide a lot of extra information. We recommend that the functions you expose live in a package (and be exported in your NAMESPACE file).

## Best practices

• If your function depends on other packages, load those packages with library() within your function.
• Your code should handle improper input and other error conditions with the function gstop(). Error messages will be seen by the Galaxy user. Also use gwarning() and gmessage() for warnings and informational messages.
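As a small sketch of the error-handling advice above, the hypothetical function below rejects a value that Galaxy's own type checking would let through. It uses base `stop()` so that it runs without RGalaxy loaded; inside a Galaxy tool you would call `gstop()` at that point so the message reaches the Galaxy user.

```r
# Hypothetical tool body: divides number1 by number2.
# Galaxy can verify that both inputs are numbers, but not that the
# divisor is non-zero, so the function checks that itself.
divide_numbers <- function(number1, number2) {
  if (number2 == 0) {
    # In a Galaxy tool, replace stop() with gstop().
    stop("number2 must not be zero.")
  }
  number1 / number2
}

# Valid input returns the quotient.
quotient <- divide_numbers(10, 5)

# Invalid input raises an error whose message can be inspected.
error_message <- tryCatch(
  divide_numbers(1, 0),
  error = function(e) conditionMessage(e)
)
```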

## Using Rserve for better performance

Galaxy runs tools by invoking scripts in various languages at the command line. These scripts are generally self-contained. Sometimes a script takes a long time to load its dependencies, occasionally longer than the actual work it is supposed to do. We can avoid this wait if the script does its work on a remote instance of R where the dependencies have already been loaded. We accomplish this using the Rserve package.

To use Rserve, create an Rserv.conf file that contains statements like this:

eval library(LongLoadingPackage1)
eval library(LongLoadingPackage2)


Replace the package names with the packages your function uses that take a long time to load.
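If you prefer to generate the configuration file from R, a sketch along these lines should work. The package names are placeholders, and the file is written to a temporary directory here so that nothing in your working directory is overwritten.

```r
# Placeholder package names; substitute the slow-loading packages
# your own function depends on.
slow_packages <- c("LongLoadingPackage1", "LongLoadingPackage2")

# One "eval library(<pkg>)" statement per package.
conf_lines <- sprintf("eval library(%s)", slow_packages)

# Written to tempdir() for illustration; use your own path in practice.
conf_file <- file.path(tempdir(), "Rserv.conf")
writeLines(conf_lines, conf_file)

# Show the resulting file.
cat(readLines(conf_file), sep = "\n")
```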

Start Rserve as follows:

R CMD Rserve --vanilla --RS-conf Rserv.conf


Re-run the galaxy function on your function, specifying that Rserve should be used:

galaxy("addTwoNumbersWithTest",
galaxyConfig=
GalaxyConfig(galaxyHome, "mytool", "Test Section",
"testSectionId"),
RserveConnection=RserveConnection()
)


Install the RSclient package:

source("http://bioconductor.org/biocLite.R")
biocLite("RSclient", siteRepos="http://www.rforge.net")


Restart Galaxy if it is already running. Your function should be much faster.

You can run Rserve on a different machine (and on a different port) by passing this information to the RserveConnection() function:

RserveConnection(host="mymachine", port=2012L)

## An object of class "RserveConnection"
## Slot "host":
## [1] "mymachine"
##
## Slot "port":
## [1] 2012


Note that the other machine should have shared disk space with the machine where you are running Galaxy.

## A practical example

Suppose you have some Affymetrix probe IDs and you want to look up the PFAM and SYMBOL names for them. It's quite easy to write a function to expose this in Galaxy:


probeLookup <-
function(
probe_ids=GalaxyCharacterParam(
required=TRUE,
testValues="1002_f_at 1003_s_at"),
outputfile=GalaxyOutput("probeLookup", "csv"))
{
suppressPackageStartupMessages(library(hgu95av2.db))
ids <- strsplit(probe_ids, " ")[[1]]
results <- select(hgu95av2.db, keys=ids, columns=c("SYMBOL","PFAM"),
keytype="PROBEID")
write.csv(results, file=outputfile)
}



Behind the scenes, we've also written a man page for the function, and put a test fixture in our package (which can be found at inst/functionalTests/probeLookup/outputfile).

Let's run it and make sure it works:

runFunctionalTest(probeLookup)

## [1] TRUE
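The input handling in probeLookup relies on splitting a single space-separated string into individual probe IDs. The sketch below shows that step in isolation, along with a more forgiving variant. The helper `split_ids` is hypothetical and not part of RGalaxy; it is one way to tolerate extra spaces or tabs that a user might paste into the Galaxy form.

```r
# The split used in probeLookup: one ID per single space.
probe_ids <- "1002_f_at 1003_s_at"
ids <- strsplit(probe_ids, " ")[[1]]
print(ids)

# Hypothetical helper: trims the input and splits on any run of
# whitespace, so repeated spaces or tabs do not yield empty IDs.
split_ids <- function(x) {
  parts <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  parts[nzchar(parts)]
}

messy_ids <- split_ids("  1002_f_at \t 1003_s_at ")
print(messy_ids)
```

Both calls yield the same two IDs, whereas splitting the messy string on a single space would also produce empty strings that are not valid keys.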