1 Introduction

Psychiatric disorders have a great impact on morbidity and mortality [1][2]. According to the World Health Organization (WHO), one in four people will suffer from a mental or neurological disorder at some point in their lives [3]. It has been suggested that most psychiatric disorders have a strong genetic component [4][5][6]. In recent years, research on the genetics of psychiatric disorders has grown [7], and the number of publications that focus on psychiatric disorders has increased steadily (Figure 1).

Figure 1: Psychiatric disorders in PubMed
Obtained by querying PubMed for psychiatric disorder [Title/Abstract] from 1955 to 2016.

However, understanding of the cellular and molecular mechanisms leading to psychiatric diseases is still limited, which has restricted the application of this wealth of data in clinical practice. This situation also applies to psychiatric comorbidities. Factors that explain the current situation include the heterogeneity of the information about psychiatric disorders, its fragmentation into knowledge silos, and the lack of resources that collect this wealth of data, integrate it, and supply the information to the community in an intuitive, open-access manner. PsyGeNET [8] has been developed to fill this gap, and psygenet2r has been developed to facilitate statistical analysis of PsyGeNET data, allowing its integration with other packages available in R to develop data analysis workflows.

The psygenet2r package allows users to retrieve the genes associated with psychiatric diseases, or to explore the association between a disease of interest and PsyGeNET diseases based on shared genes. In addition, psygenet2r allows the annotation of genes with psychiatric diseases based on expert-curated information. This functionality can help to interpret the results of GWAS or Whole Exome Sequencing studies, in which a list of gene variants is obtained and needs to be prioritized based on functional and clinical relevance. In this context, it is useful to know whether there is information on the implication of these genes in psychiatric diseases. In this case study we describe how to analyze the genes identified in a GWAS in the context of psychiatric diseases using psygenet2r. For this purpose, we use as an example the data from a study on bipolar disorder published by McCarthy et al. [9]. In this study, the authors analyzed the brain expression of 58 genes, previously identified in a GWAS of bipolar disorder [10], and correlated this information with structural MRI studies to identify brain regions that are abnormal in bipolar disorder. We use this list of 58 genes to show the functionality of the psygenet2r package.

2 Objective

The goal of the study is to analyze a set of genes discovered by GWAS in the context of PsyGeNET. More specifically, we want to answer the following questions:

  1. Are the genes associated with psychiatric disorders according to PsyGeNET?
  2. What is the level of evidence of these associations?
  3. What is the function of the proteins encoded by these genes related to bipolar disorder?
  4. Is bipolar disorder similar to other psychiatric disorders based on shared genes?
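For the last question, similarity "based on shared genes" needs a concrete measure. The sketch below uses the Jaccard index (shared genes divided by all distinct genes in either set) as one common choice. It is an illustration in base R only: the helper name `jaccard` and the two toy gene sets are hypothetical, and the use of this particular measure is an assumption rather than something stated in this section.

```r
# Hypothetical helper (not a psygenet2r function): Jaccard index of two gene sets
jaccard <- function(a, b) {
    a <- unique(a)
    b <- unique(b)
    n_union <- length(union(a, b))
    if (n_union == 0) return(NA_real_)   # undefined for two empty sets
    length(intersect(a, b)) / n_union
}

# Toy gene sets, for illustration only (not PsyGeNET results)
set_a <- c("ANK3", "CACNA1C", "NGF", "SYNE1")
set_b <- c("CACNA1C", "NGF", "MAPK10")

jaccard(set_a, set_b)   # 2 shared genes / 5 distinct genes = 0.4
```

A value of 0 means no shared genes and a value of 1 means identical gene sets.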

3 Implementation

3.1 psygenet2r package

PsyGeNET, a knowledge resource for the exploratory analysis of psychiatric diseases and their genes, contains information on eight psychiatric disorders: depression, bipolar disorder, schizophrenia, alcohol, cocaine and cannabis use disorders, substance-induced depressive disorder and psychoses. The PsyGeNET database was developed by automatic extraction of information from the literature using the text mining tool BeFree [11] (http://ibi.imim.es/befree/), followed by curation by domain experts. The current version of PsyGeNET (version 2.0) contains 3,771 associations between 1,549 genes and 117 psychiatric disease concepts. The psygenet2r package contains functions to query and analyze PsyGeNET data, and to integrate them with other information, as exemplified in this case study.

3.2 Installation

The psygenet2r package is provided through Bioconductor [12]. To install and load psygenet2r, type the following commands in an R session:

source( "http://bioconductor.org/biocLite.R" )
biocLite( "psygenet2r" )
library( psygenet2r )
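
On recent Bioconductor releases the `biocLite.R` script is no longer used and installation goes through the BiocManager package from CRAN. The following is a minimal sketch of the equivalent steps, assuming that psygenet2r is available for the Bioconductor release matching the installed R version:

```r
# Install BiocManager from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install psygenet2r from Bioconductor (assumes it is available for this release)
BiocManager::install("psygenet2r")

# Load the package
library(psygenet2r)
```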

4 Questions that can be answered using psygenet2r

The first step before any analysis is to save the genes in an R vector. For this case study, the 58 genes obtained from McCarthy et al. [9] are saved into a vector called genesOfInterest.

Genes can be identified using the NCBI gene identifier or the Official Gene Symbol from HUGO.

genesOfInterest <- c("ADCY2", "AKAP13", "ANK3", "ANKS1A", 
"ATP6V1G3", "ATXN1", "C11orf80", "C15orf53", "CACNA1C", 
"CACNA1D", "CACNB3", "CROT", "DLG2", "DNAJB4", "DUSP22", 
"FAM155A", "FLJ16124", "FSTL5", "GATA5", "GNA14", "GPR81", 
"HHAT", "IFI44", "ITIH3", "KDM5B", "KIF1A", "LOC150197", 
"MAD1L1", "MAPK10", "MCM9", "MSI2", "NFIX", "NGF", "NPAS3", 
"ODZ4", "PAPOLG", "PAX1", "PBRM1", "PTPRE", "PTPRT", 
"RASIP1", "RIMBP2", "RXRG", "SGCG", "SH3PXD2A", "SIPA1L2",
"SNX8", "SPERT", "STK39", "SYNE1", "THSD7A", "TNR", 
"TRANK1", "TRIM9", "UBE2E3", "UBR1", "ZMIZ1", "ZNF274")

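Before querying PsyGeNET, it can be useful to run a few basic checks on the input vector (number of entries, duplicated symbols, blank values). The sketch below does this in base R. The helper `checkGeneVector` is hypothetical and not part of psygenet2r, and the short example vector is only for illustration.

```r
# Hypothetical helper (not part of psygenet2r): basic checks on a gene vector
checkGeneVector <- function(genes) {
    stopifnot(is.character(genes))
    list(
        n          = length(genes),                        # number of entries
        duplicated = unique(genes[duplicated(genes)]),     # symbols listed more than once
        blank      = sum(is.na(genes) | trimws(genes) == "")  # missing or empty entries
    )
}

# Small illustrative vector with one deliberate duplicate
exampleGenes <- c("ANK3", "CACNA1C", "NGF", "ANK3")
str(checkGeneVector(exampleGenes))
```

Calling `checkGeneVector(genesOfInterest)` on the vector defined above is expected to report 58 entries, matching the number of genes in the study.
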
4.1 How many of these genes are in PsyGeNET?

To find out how many of the genes of interest are present in PsyGeNET, the psygenetGene function is used. This function requires as input the vector of genes and the selected database. For this analysis, the "ALL" database option is selected.

m1 <- psygenetGene(
    gene     = genesOfInterest, 
    database = "ALL",
    verbose  = FALSE,
    warnings = FALSE
)
m1
## Object of class 'DataGeNET.Psy'
##  . Type:         gene 
##  . Database:     ALL 
##  . Term:         ADCY2 ... SYNE1 
##  . Number of Results:   48 
##  . Number of unique Diseases:  15 
##  . Number of unique Genes:     16

The output is a DataGeNET.Psy object. It contains all the information retrieved from PsyGeNET about the diseases associated with the genes of interest. Inspecting the DataGeNET.Psy object shows that, when querying the "ALL" database, 16 of the initial genes are found in PsyGeNET. These genes are associated with 15 different disorders, for a total of 48 gene-disease associations (GDAs).
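
To put these counts in proportion, the short calculation below derives the share of query genes found in PsyGeNET and the average number of GDAs per gene. It uses only the numbers reported in the object summary above, entered by hand; the variable names are illustrative.

```r
# Counts copied from the DataGeNET.Psy summary and the size of the gene list
n_query   <- 58   # genes in genesOfInterest
n_found   <- 16   # unique genes found in PsyGeNET
n_gda     <- 48   # gene-disease associations

pct_found    <- round(100 * n_found / n_query, 1)   # percentage of genes found
n_missing    <- n_query - n_found                   # genes with no PsyGeNET entry
gda_per_gene <- n_gda / n_found                     # mean GDAs per gene found

cat(pct_found, "% of the genes are in PsyGeNET;",
    n_missing, "are not;",
    gda_per_gene, "GDAs per gene on average\n")
```

About 27.6% of the 58 genes (16 genes) are present in PsyGeNET, 42 genes are absent, and each gene found has on average 3 associations.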

4.2 Which diseases are associated with these genes according to the PsyGeNET database?

To visualize the 48 GDAs between the 16 genes found in PsyGeNET and the 15 different disorders, psygenet2r provides several options. One of them is the GDA network, which is obtained by applying the plot function to the DataGeNET.Psy object (m1) returned by the psygenetGene function (section 4.1). In the GDA network, blue nodes represent diseases and yellow nodes represent genes.

plot( m1 )