1 Introduction

The psygenet2r package contains functions to query PsyGeNET [1], a resource on psychiatric diseases and their genes. This can also be done using the web tool. However, the psygenet2r package goes a step further by including analysis and visualization functions to study psychiatric diseases, their genes and disease comorbidities, as well as to analyze the tissues/anatomical structures in which the genes are expressed. Special emphasis is placed on visualization of the results (not available in the web tool), providing a variety of representation formats such as networks, heatmaps and barplots (Table 3).

1.1 Background

In recent years there has been growing interest in the genetics of psychiatric disorders, with a concomitant increase in the number of publications reporting these studies [2]. However, understanding of the cellular and molecular mechanisms leading to psychiatric diseases is still limited, which has restricted the application of this wealth of data in clinical practice. This situation also applies to psychiatric comorbidities. Factors that explain the current situation include the heterogeneity of the information about psychiatric disorders, its fragmentation into knowledge silos, and the lack of resources that collect this wealth of data, integrate it, and supply the information to the community in an intuitive, open-access manner. PsyGeNET has been developed to fill this gap. psygenet2r has been developed to facilitate statistical analysis of PsyGeNET data, allowing its integration with other packages available in R to develop data analysis workflows.

PsyGeNET is a resource for the exploratory analysis of psychiatric diseases and their associated genes. The second release of PsyGeNET (version 2.0) contains updated information on depression, bipolar disorder, alcohol use disorders and cocaine use disorders, and has been expanded to cover other psychiatric diseases of interest: schizophrenia, substance-induced depressive disorder, substance-induced psychosis and cannabis use disorder (Table 1). PsyGeNET allows the exploration of the molecular basis of psychiatric disorders by providing a comprehensive set of genes associated with each disease. Moreover, it allows the analysis of the molecular mechanisms underlying psychiatric disease comorbidities.

Table 1: Psychiatric diseases included in PsyGeNET
Long Name | Short Name | Acronym
Alcohol use disorders | Alcohol UD | AUD
Bipolar disorders and related disorders | Bipolar disorder | BD
Depressive disorders | Depression | DEP
Schizophrenia spectrum and other psychotic disorders | Schizophrenia | SCHZ
Cocaine use disorders | Cocaine UD | CUD
Substance induced depressive disorder | SI-Depression | SI-DEP
Cannabis use disorders | Cannabis UD | CanUD
Substance induced psychosis | SI-Psychosis | SI-PSY

The PsyGeNET database is built from data extracted from the literature by text mining with BeFree [3], followed by manual curation by domain experts. A team of 22 experts participates as curators of the database. The current version of PsyGeNET (version 2.0) contains 3,771 associations between 1,549 genes and 117 psychiatric disease concepts.

With the psygenet2r package the user can submit queries to PsyGeNET from R, perform a variety of analyses on the data, and visualize the results through different types of graphical representations.

The tasks that can be performed with the psygenet2r package are the following:

  1. Retrieve Gene-Disease Associations (GDAs) from PsyGeNET using a gene or a disease of interest as the query (either a single one or a set of genes/diseases)
  2. Visualize the results according to the GDAs’ attributes: PsyGeNET Evidence Index, number of publications, sentences that report the GDA, source database
  3. Visualize the results according to the disease (disease class) or gene (Panther class) attributes
  4. Analyze the association between two diseases based on shared genes (using the Jaccard index)
  5. Characterize the disease genes by molecular function using Panther classes, or by expression site using the TopAnat / Bgee database

In the following sections the specific functions that can be used to address each of these tasks are presented.
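
As a rough illustration of task 4, the Jaccard index of two gene sets is the number of shared genes divided by the number of distinct genes across both sets. The base R sketch below is not the psygenet2r implementation; the function name jaccard_index and the placeholder gene labels are hypothetical and are not taken from PsyGeNET.

```r
# Jaccard index between two gene sets: |A intersect B| / |A union B|.
# Illustrative helper, not part of psygenet2r.
jaccard_index <- function( genes_a, genes_b ) {
    genes_a <- unique( genes_a )
    genes_b <- unique( genes_b )
    n_union <- length( union( genes_a, genes_b ) )
    if ( n_union == 0 ) return( NA_real_ )   # undefined for two empty sets
    length( intersect( genes_a, genes_b ) ) / n_union
}

# Hypothetical placeholder gene sets (not PsyGeNET data)
disease_1 <- c( "geneA", "geneB", "geneC", "geneD" )
disease_2 <- c( "geneC", "geneD", "geneE" )

# 2 shared genes out of 5 distinct genes: 2 / 5 = 0.4
jaccard_index( disease_1, disease_2 )
```

A value of 0 means the two diseases share no genes, and a value of 1 means their gene sets are identical.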

1.2 Installation

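The package psygenet2r is provided through Bioconductor. To install and load psygenet2r, type the following commands in an R session:

# biocLite() is no longer supported in current Bioconductor releases;
# BiocManager is its replacement
if ( !requireNamespace( "BiocManager", quietly = TRUE ) )
    install.packages( "BiocManager" )
BiocManager::install( "psygenet2r" )
library( psygenet2r )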

1.3 DataGeNET.Psy

A DataGeNET.Psy object is obtained when the psygenetGene or psygenetDisease function is applied. This object is used as input for the rest of the psygenet2r functions, such as the plot function.

A DataGeNET.Psy object contains all the information retrieved from PsyGeNET about the diseases/genes associated with the gene/disease of interest. It also contains a summary of the search, such as the search input (gene or disease), the selected database, the gene or disease identifier, the number of associations found (N. Results) and the number of unique results obtained (U. Results).

t1
## Object of class 'DataGeNET.Psy'
##  . Type:         gene 
##  . Database:     ALL 
##  . Term:         4852 
##  . Number of Results:   13 
##  . Number of unique Diseases:  13 
##  . Number of unique Genes:     1
class( t1 )
## [1] "DataGeNET.Psy"
## attr(,"package")
## [1] "psygenet2r"

This object comes with a series of functions that let users interact with the information retrieved from PsyGeNET. These functions are ngene, ndisease, extract and plot. The function ngene returns the number of retrieved genes for a given query, and ndisease is the equivalent function for diseases. The function extract returns a formatted data.frame with the complete set of information downloaded from PsyGeNET. Finally, the plot function allows the visualization of the results in a variety of ways, such as gene-disease association networks or heatmaps.
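
As a minimal sketch of how these accessors fit together, the snippet below repeats the NPY query used in this vignette and applies ngene, ndisease and extract to the result. It assumes that psygenet2r is installed and that PsyGeNET can be reached over the network. The guard, the tryCatch wrapper and the variable name acc are additions for illustration only, so that the code degrades gracefully when either assumption fails.

```r
# Sketch only: runs the query when psygenet2r is installed and PsyGeNET
# is reachable; otherwise acc stays NULL.
acc <- NULL
if ( requireNamespace( "psygenet2r", quietly = TRUE ) ) {
    library( psygenet2r )
    acc <- tryCatch( {
        t1 <- psygenetGene( gene = 4852, database = "ALL" )
        list(
            n_genes    = ngene( t1 ),     # number of retrieved genes
            n_diseases = ndisease( t1 ),  # number of retrieved diseases
            table      = extract( t1 )    # full result as a data.frame
        )
    }, error = function( e ) {
        message( "PsyGeNET query failed: ", conditionMessage( e ) )
        NULL
    } )
}

# Print the results only when the query succeeded
if ( !is.null( acc ) ) {
    print( acc$n_genes )
    print( acc$n_diseases )
    print( head( acc$table ) )
}
```

For the t1 object summarized above, the printed summary reports 1 unique gene and 13 unique diseases, so those are the counts one would expect the two counters to report.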

3 PsyGeNET and psygenet2r

The PsyGeNET web interface can be explored by searching for a specific gene or a specific disease, and the psygenet2r package offers the same options. Therefore, the starting points for psygenet2r are the psygenetGene and psygenetDisease functions.

PsyGeNET data are classified according to the database used as the source of information (“source database”). Therefore, any query run on PsyGeNET requires specifying the source database through the database argument. Table 2 shows the source databases in PsyGeNET and their description. By default, the database "ALL" is used in psygenet2r. For illustration purposes, database "ALL" is used in most of the code snippets in this vignette.


Table 2: Source databases included in PsyGeNET
Name | Description
psycur15 | Genes associated with DEP, BD, AUD and CUD between 1980 and 2013 (PsyGeNET release v1.0)
psycur16 | Genes associated with DEP, BD, AUD, CUD, SCHZ, SI-DEP, CanUD and SI-PSY between 1980 and 2015
ALL | All previous databases

2 Retrieve gene-disease associations (GDAs) using psygenet2r

2.1 Using genes as a query

The psygenet2r package allows exploring PsyGeNET information using a specific gene or a list of genes. It retrieves the information that is available in PsyGeNET (associated diseases, source database, PsyGeNET Evidence Index, number of publications, attributes of genes, etc.) and allows the results to be visualized in different ways.

2.1.1 Using a single gene as a query

To look up a single gene in PsyGeNET, we can use the psygenetGene function. This function retrieves PsyGeNET’s information using either the NCBI gene identifier or the official HUGO gene symbol. It also accepts other arguments, such as the database to query (database argument) and the PsyGeNET evidence index (evidenceIndex argument).

As an example, the gene NPY, whose Entrez ID is 4852, is queried with the psygenetGene function, first by its identifier and then by its official HUGO gene symbol. In this example, the database "ALL" is used.

t1 <- psygenetGene( gene = 4852, 
                    database = "ALL")
t1
## Object of class 'DataGeNET.Psy'
##  . Type:         gene 
##  . Database:     ALL 
##  . Term:         4852 
##  . Number of Results:   13 
##  . Number of unique Diseases:  13 
##  . Number of unique Genes:     1
t2 <- psygenetGene( gene = "NPY", 
                    database = "ALL" )
t2
## Object of class 'DataGeNET.Psy'
##  . Type:         gene 
##  . Database:     ALL 
##  . Term:         NPY 
##  . Number of Results:   13 
##  . Number of unique Diseases:  13 
##  . Number of unique Genes:     1

Both cases result in a DataGeNET.Psy object:

class( t1 )
## [1] "DataGeNET.Psy"
## attr(,"package")
## [1] "psygenet2r"
class( t2 )
## [1] "DataGeNET.Psy"
## attr(,"package")
## [1] "psygenet2r"

In this particular example, inspecting the DataGeNET.Psy object shows that the gene NPY is associated with 13 different diseases in PsyGeNET (with no restriction on the PsyGeNET evidence index).

2.1.2 Plotting the results of a Single Gene Query

psygenet2r offers several options to visualize the results from PsyGeNET as networks, selected through the type argument of the plot function. The result is a network showing either the diseases (type = "GDA network") or the psychiatric disorders (type = "GDCA network") related to the gene of interest.

By default, psygenet2r draws a network when plotting a DataGeNET.Psy object obtained from a gene query. In this network, blue nodes are diseases and the yellow node is the gene of interest. Node colors can be changed with the geneColor and diseaseColor arguments.

plot( t1, type = "GDA network" )
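
The color arguments can be combined with the type argument in the same call. The sketch below assumes that psygenet2r is installed and that PsyGeNET is reachable. The two color values are arbitrary choices for illustration, and the plotted flag is a hypothetical helper variable that only records whether the plot could be drawn.

```r
# Sketch only: draws the network with custom node colors when the
# package and the PsyGeNET service are available.
plotted <- FALSE
if ( requireNamespace( "psygenet2r", quietly = TRUE ) ) {
    library( psygenet2r )
    plotted <- tryCatch( {
        t1 <- psygenetGene( gene = 4852, database = "ALL" )
        plot( t1, type = "GDA network",
              geneColor = "orange",        # color of the gene node
              diseaseColor = "steelblue" ) # color of the disease nodes
        TRUE
    }, error = function( e ) FALSE )
}
```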