Contents

Abstract

In order to make light of cancer development, it is crucial to understand which genes play a role in the mechanisms linked to this disease and moreover which role that is. Commonly biological processes such as proliferation and apoptosis have been linked to cancer progression. We have developed the Moonlight framework that allows for prediction of cancer driver genes through multi-omics data integration. Based on expression data we perform functional enrichment analysis, infer gene regulatory networks and upstream regulator analysis to score the importance of well-known biological processes with respect to the studied cancer. We then use these scores to predict oncogenic mediators with two specific roles: genes that potentially act as tumor suppressor genes (TSGs) and genes that potentially act as oncogenes (OCGs). This constitutes Moonlight’s primary layer. As gene expression data alone does not explain the cancer phenotypes, a second layer of evidence is needed. We have integrated two secondary layers, one based on mutations, and one based on abnormal DNA methylation patterns, for prediction of the driver genes. The mutational layer predicts driver mutations in the oncogenic mediators and thereby allows for the prediction of cancer driver genes using the driver mutation prediction tool CScape-somatic. The methylation layer investigates abnormal DNA methylation patterns and differentially methylated CpG sites in the oncogenic mediators using the tool EpiMix and uses patterns of hypo- and hypermethylation for prediction of the driver genes. Overall, this methodology not only allows us to identify genes with dual role (TSG in one cancer type and OCG in another) but also to elucidate the underlying biological processes.

Introduction

Cancer development is influenced by (epi)genetic alterations in two distinctly different categories of genes, known as tumor suppressor genes (TSG) and oncogenes (OCG). The occurrence of these alterations in genes of the first category leads to faster cell proliferation while alterations in genes of second category increases or changes their function. In 2020, we developed the Moonlight framework that allows for prediction of cancer driver genes (Colaprico, Antonio and Olsen, Catharina and Bailey, Matthew H. and Odom, Gabriel J. and Terkelsen, Thilde and Silva, Tiago C. and Olsen, André V. and Cantini, Laura and Zinovyev, Andrei and Barillot, Emmanuel and Noushmehr, Houtan and Bertoli, Gloria and Castiglioni, Isabella and Cava, Claudia and Bontempi, Gianluca and Chen, Xi Steven and Papaleo, Elena 2020). Here, gene expression data are integrated together with biological processes and gene regulatory networks to score the importance of well-known biological processes with respect to the studied cancer. These scores are used to predict oncogenic mediators: putative TSGs and putative OCGs. As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. For this reason, we integrated a secondary mutational layer which predicts driver mutations and passenger mutations in the oncogenic mediators. The prediction of the driver mutations are carried out using the CScape-somatic (Rogers, Mark F and Gaunt, Tom R and Campbell, Colin 2020) driver mutation prediction tool. Those oncogenic mediators with at least one driver mutation are retained as the final set of driver genes (Nourbakhsh, Mona and Saksager, Astrid and Tom, Nikola and Chen, Xi Steven and Colaprico, Antonio and Olsen, Catharina and Tiberti, Matteo and Papaleo, Elena 2023). Moreover, we have integrated a secondary methylation layer that investigates abnormal DNA methylation patterns in the oncogenic mediators using the tool EpiMix (Zheng, Yuanning and Jun, John and Brennan, Kevin and Gevaert, Olivier 2023). Those oncogenic mediators demonstrating hypo- and hypermethylation patterns are predicted as OCGs and TSGs, respectively. These new functionalities are released in the updated version of Moonlight to produce Moonlight2R.

Moonlight’s pipeline

Moonlight’s pipeline is shown below:

Moonlight’s proposed workflow

The proposed pipeline consists of following eight steps:

  1. The input to Moonlight is a set of differentially expressed genes between two biological conditions such as cancer and healthy samples or two cancer subtyoes. Besides differentially expressed genes, gene expression data is also needed. Mutation data and/or methylation data are also needed.
  2. FEA Functional Enrichment Analysis, using Fisher’s test, to identify gene sets (with biological functions linked to cancer) significantly enriched by regulated genes (RG).
  3. GRN Gene Regulatory Network inferred between each single DEG (sDEG) and all genes by means of mutual information, obtaining for each DEG a list of RG.
  4. URA Upstream Regulator Analysis for DEGs in each enriched gene set, we applied z-score being the ratio between the sum of all predicted effects for all the gene involved in the specific function and the square-root of the number of all genes.
  5. PRA Pattern Recognition Analysis identifies putative TSGs (down) and OCGs (up). We either use user defined biological processes or random forests.
  6. DMA Driver Mutation Analysis identifies driver mutations in the putative TSGs and OCGs through the driver mutation prediction tool, CScape-somatic.
  7. GMA Gene Methylation Analysis identifies abnormal DNA methylation patterns in the putative TSGs and OCGs through the tool, EpiMix.
  8. We apply the above procedure to multiple cancer types to obtain cancer-specific lists of TSGs and OCGs. We compare the lists for each cancer: if a sDEG is TSG in a cancer and OCG in another we define it as dual-role TSG-OCG. Otherwise if we find a sDEG defined as OCG or TSG only in one tissue we define it as tissue specific biomarker.
  9. We use the COSMIC database to define a list of gold standard TSG and OCGs to assess the accuracy of the proposed method.

Installation

To install Moonlight2R use the code below.

Installation from BioConductor

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Moonlight2R")

Installation from GitHub

First, install devtools or if you already have it installed, load it.

install.packages("devtools")
library(devtools)

Install Moonlight2R from GitHub:

devtools::install_github(repo = "ELELAB/Moonlight2R")

Installation from GitHub with accompanying vignette

First, install the BiocStyle Bioconductor package.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BiocStyle")

Then install Moonlight2R with its accompanying vignette.

devtools::install_github(repo = "ELELAB/Moonlight2R", build_vignettes = TRUE)

You can view the vignette in the following way.

vignette( "Moonlight2R", package="Moonlight2R")

Load libraries

library(Moonlight2R)
## Loading required package: doParallel
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
## 
## 
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
library(magrittr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Obtain Input

The input to Moonlight is a set of differentially expressed genes and gene expression data. Mutation and/or methylation data are also needed. These data should be available for the same sample sets. Gene expression data, mutation data, methylation data and differentially expressed genes can for example be obtained from TCGA using the R package TCGAbiolinks. Help documents on how to use TCGAbiolinks are available here. To find other examples of usage of TCGAbiolinks on TCGA cancer types see our GitHub repository. Example data of the input (differentially expressed genes, gene expression data, mutation data, and methylation data) are stored in the Moonlight2R package:

data(DEGsmatrix)
data(dataFilt)
data(dataMAF)
data(GEO_TCGAtab)
data(LOC_transcription)
data(LOC_translation)
data(LOC_protein)
data(Oncogenic_mediators_mutation_summary)
data(DEG_Mutations_Annotations)
data(dataMethyl)
data(DEG_Methylation_Annotations)
data(Oncogenic_mediators_methylation_summary)
data(MetEvidenceDriver)
data(LUAD_sample_anno)

Download: Get GEO data

You can search GEO data using the getDataGEO function.

GEO_TCGAtab: a 18x12 matrix that provides the GEO data set we matched to one of the 18 given TCGA cancer types

knitr::kable(GEO_TCGAtab, digits = 2,
       caption = "Table with GEO data set matched to one
       of the 18 given TCGA cancer types ",
       row.names = TRUE)
Table 1: Table with GEO data set matched to one
of the 18 given TCGA cancer types
Cancer TP NT DEG. Dataset TP.1 NT.1 Platform DEG.. Common GEO_Normal GEO_Tumor
1 BLCA 408 19 2937 GSE13507 165 10 GPL65000 2099 896 control cancer
2 BRCA 1097 114 3390 GSE39004 61 47 GPL6244 2449 1248 normal Tumor
3 CHOL 36 9 5015 GSE26566 104 59 GPL6104 3983 2587 Surrounding Tumor
4 COAD 286 41 3788 GSE41657 25 12 GPL6480 3523 1367 N A
5 ESCA 184 11 2525 GSE20347 17 17 GPL571 1316 406 normal carcinoma
6 GBM 156 5 4828 GSE50161 34 13 GPL570 4504 2660 normal GBM
7 HNSC 520 44 2973 GSE6631 22 22 GPL8300 142 129 normal cancer
8 KICH 66 25 4355 GSE15641 6 23 GPL96 1789 680 normal chromophobe
9 KIRC 533 72 3618 GSE15641 32 23 GPL96 2911 939 normal clear cell RCC
10 KIRP 290 32 3748 GSE15641 11 23 GPL96 2020 756 normal papillary RCC
11 LIHC 371 50 3043 GSE45267 46 41 GPL570 1583 860 normal liver HCC sample
12 LUAD 515 59 3498 GSE10072 58 49 GPL96 666 555 normal tumor
13 LUSC 503 51 4984 GSE33479 14 27 GPL6480 3729 1706 normal squamous cell carcinoma
14 PRAD 497 52 1860 GSE6919 81 90 GPL8300 246 149 normal prostate tumor samples
15 READ 94 10 3628 GSE20842 65 65 GPL4133 2172 1261 M T
16 STAD 415 35 2622 GSE2685 10 10 GPL80 487 164 N T
17 THCA 505 59 1994 GSE33630 60 45 GPL570 1451 781 N T
18 UCEC 176 24 4183 GSE17025 GPL570 tp lcm

getDataGEO: Search by cancer type and data type [Gene Expression]

The user can query and download the cancer types supported by GEO, using the function getDataGEO:

dataFilt_GEO <- getDataGEO(GEOobject = "GSE20347", platform = "GPL571")
dataFilt_GEO <- getDataGEO(TCGAtumor = "ESCA")

FEA: Functional Enrichment Analysis

The user can perform a functional enrichment analysis using the function FEA. For each DEG in the gene set a z-score is calculated. This score indicates how the genes act in the gene set.

data(DEGsmatrix)
data(DiseaseList)
data(EAGenes)
dataFEA <- FEA(DEGsmatrix = DEGsmatrix)

The output can be visualized with a FEA plot.

FEAplot: Functional Enrichment Analysis Plot

The user can plot the result of a functional enrichment analysis using the function plotFEA. A negative z-score indicates that the process’ activity is decreased. A positive z-score indicates that the process’ activity is increased.

plotFEA(dataFEA = dataFEA, additionalFilename = "_exampleVignette", height = 10, width = 20)

The figure generated by the above code is shown below:

GRN: Gene Regulatory Network

The user can perform a gene regulatory network analysis using the function GRN which infers the network using the parmigene package. For illustrative purposes and to decrease runtime, we have set nGenesPerm = 5 and nBoot = 5 in the example below, however, we recommend setting these parameters to nGenesPerm = 2000 and nBoot = 400 to achieve optimal results, as they are set by default in the function arguments.

data(DEGsmatrix)
data(dataFilt)
dataGRN <- GRN(DEGsmatrix = DEGsmatrix, TFs = sample(rownames(DEGsmatrix), 100), normCounts = dataFilt,
                   nGenesPerm = 5, kNearest = 3, nBoot = 5, DiffGenes = TRUE)

URA: Upstream Regulator Analysis

The user can perform upstream regulator analysis using the function URA. This function is applied to each DEG in the enriched gene set and its neighbors in the GRN.

data(dataGRN)
data(DEGsmatrix)
data(DiseaseList)
data(EAGenes)

dataFEA <- FEA(DEGsmatrix = DEGsmatrix)

BPselected <- dataFEA$Diseases.or.Functions.Annotation[1:5]
dataURA <- URA(dataGRN = dataGRN,
               DEGsmatrix = DEGsmatrix,
               BPname = BPselected, 
               nCores=1)
## Warning: executing %dopar% sequentially: no parallel backend registered

PRA: Pattern Regognition Analysis

The user can retrieve putative TSG/OCG using either selected biological processes in the expert-based approach or a random forest classifier trained on known COSMIC OCGs/TSGs in the machine learning approach. The user must enter the chosen biological processes in the BPname argument to use the expert-based approach or set BPname = NULL to use the machine learning approach.

data(dataURA)
data(tabGrowBlock)
data(knownDriverGenes)
dataPRA <- PRA(dataURA = dataURA,
               BPname = c("apoptosis","proliferation of cells"),
               thres.role = 0)

DMA: Driver Mutation Analysis

The user can identify driver mutations with DMA in the oncogenic mediators established by PRA. The passenger or driver status is estimated with CScape-somatic (Rogers, Mark F and Gaunt, Tom R and Campbell, Colin 2020).
This function will further generate three files: DEG_Mutations_Annotations.rda, Oncogenic_mediators_mutation_summary.rda and cscape_somatic_output.rda. These will be placed in the specified results-folder. The user needs to download two CScape-somatic files in order to run DMA named css_coding.vcf.gz and css_noncoding.vcf.gz, respectively. These two files can be downloaded at http://cscape-somatic.biocompute.org.uk/#download. The corresponding .tbi files (css_coding.vcf.gz.tbi and css_noncoding.vcf.gz.tbi) must also be downloaded and be placed in the same folder.

data(dataPRA)
data(dataMAF)
data(DEGsmatrix)
data(LOC_transcription)
data(LOC_translation)
data(LOC_protein)
data(NCG)
data(EncodePromoters)
dataDMA <- DMA(dataMAF = dataMAF,
               dataDEGs = DEGsmatrix, 
               dataPRA = dataPRA,
               results_folder = "DMAresults",
               coding_file = "css_coding.vcf.gz",
               noncoding_file = "css_noncoding.vcf.gz")

GMA: Gene Methylation Analysis

The user can predict driver genes with GMA following prediction of the oncogenic mediators established by PRA. GMA predicts driver genes by investigating abnormal DNA methylation patterns in the oncogenic mediators using the tool EpiMix (Zheng, Yuanning and Jun, John and Brennan, Kevin and Gevaert, Olivier 2023). Oncogenic mediators with hypo- and hypermethylation patterns are predicted as OCGs and TSGs, respectively. This function will generate these files: DEG_Methylation_Annotations.rda, Oncogenic_mediators_methylation_summary.rda, EpiMix_Results_Enhancer.rds, EpiMix_Results_Regular.rds, FunctionalPairs_Enhancer.csv, FunctionalPairs_Regular.csv, and FunctionalProbes_Regular.rds. These will be placed in the specified results-folder.

data("dataMethyl")
data("dataFilt")
data("dataPRA")
data("DEGsmatrix")
data("LUAD_sample_anno")
data("NCG")
data("EncodePromoters")
data("MetEvidenceDriver")

# Subset column names (sample names) in expression data to patient level
pattern <- "^(.{4}-.{2}-.{4}-.{2}).*"
colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt))

dataGMA <- GMA(dataMET = dataMethyl, dataEXP = dataFilt, 
               dataPRA = dataPRA, dataDEGs = DEGsmatrix, 
               sample_info = LUAD_sample_anno, met_platform = "HM450",
               prevalence_filter = NULL,
               output_dir = "./GMAresults", cores = 1, roadmap.epigenome.ids = "E096", 
               roadmap.epigenome.groups = NULL)
## Running Regular mode...
## Fetching probe annotation...
## see ?sesameData and browseVignettes('sesameData') for documentation
## loading from cache
## require("GenomicRanges")
## Found 13 samples with both methylation and gene expression data.
## Modeling gene expression...
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |============                                                          |  16%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
## Found 73 transcriptionally predictive probes.
## Found 17 samples in group.1 and 6 samples in group.2
## 
## Starting Beta mixture modeling.
## Running Beta mixture model on 73 probes and on 17 samples.
## 
  |                                                                            
  |                                                                      |   0%
## Found 73 differentially methylated CpGs
## Identifying functional CpG-gene pairs...
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Found 84 functional probe-gene pairs.
## Saving the EpiMix results to the output directory...
## Running Enhancer mode...
## Fetching probe annotation...
## see ?sesameData and browseVignettes('sesameData') for documentation
## loading from cache
## Found 17 samples in group.1 and 6 samples in group.2
## Fetching enhancer CpGs from Roadmap Epigenomics...
## Downloading chromatin states from the Roadmap Epigenomics...
##  Identifed 65057 enhancer CpGs from the epigenome E096
## see ?sesameData and browseVignettes('sesameData') for documentation
## loading from cache
## Returning distal probes: 160862
## Found 7 CpGs associated with distal enhancers in the methylation dataset
## 
## Starting Beta mixture modeling.
## Running Beta mixture model on 7 probes and on 17 samples.
## 
  |                                                                            
  |                                                                      |   0%
## Found 7 differentially methylated CpGs
## Modeling the gene expression for enhancers...
## Searching for the 20 near genes
## Identifying gene position for each probe
## Looking for differentially methylated enhancers associated with gene expression
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |======================================================================| 100%
## Found 2 functional probe-gene pairs.
## Saving the EpiMix results to the output directory...
## see ?sesameData and browseVignettes('sesameData') for documentation
## loading from cache
## [1] "3 oncogenic mediator(s) out of 3 were found in the None evidence category"

Level of consequence: Effect of mutations on three different levels

The user can investigate the predicted effect of different mutation types on the transcriptional level through the table LOC_transcription:

knitr::kable(LOC_transcription)
Variant_Classification SNP INS DEL DNP TNP ONP
5’Flank 1 1 1 1 1 1
5’UTR 1 1 1 1 1 1
Translation_Start_Site 0 0 0 0 0 0
Missense_Mutation 0 NA NA 0 0 0
Nonsense_Mutation 1 1 1 1 1 1
Nonstop_Mutation 0 0 0 0 0 0
Splice_Site 1 1 1 1 1 1
Splice_Region 1 1 1 1 1 1
Silent 0 NA NA 0 0 0
In_Frame_Del NA 0 0 NA NA NA
In_Frame_Ins NA 0 0 NA NA NA
Frame_Shift_Del NA 0 0 NA NA NA
Frame_Shift_Ins NA 0 0 NA NA NA
Intron 1 1 1 1 1 1
3’UTR 1 1 1 1 1 1
3’Flank 1 1 1 1 1 1
RNA 1 1 1 1 1 1
IGR 1 1 1 1 1 1

The user can investigate the predicted effect of different mutation types on the translational level through the table LOC_translation:

knitr::kable(LOC_translation)
Variant_Classification SNP INS DEL DNP TNP ONP
5’Flank 1 1 1 1 1 1
5’UTR 1 1 1 1 1 1
Translation_Start_Site 1 1 1 1 1 1
Missense_Mutation 0 NA NA 0 0 0
Nonsense_Mutation 1 1 1 1 1 1
Nonstop_Mutation 1 1 1 1 1 1
Splice_Site 1 1 1 1 1 1
Splice_Region 1 1 1 1 1 1
Silent 1 NA NA 1 1 1
In_Frame_Del NA 0 0 NA NA NA
In_Frame_Ins NA 0 0 NA NA NA
Frame_Shift_Del NA 0 0 NA NA NA
Frame_shift_Ins NA 0 0 NA NA NA
Intron 1 1 1 1 1 1
3’UTR 1 1 1 1 1 1
3’Flank 1 1 1 1 1 1
IGR 1 1 1 1 1 1
RNA 1 1 1 1 1 1

The user can investigate the predicted effect of different mutation types on the protein level through the table LOC_protein:

knitr::kable(LOC_protein)
Variant_Classification SNP INS DEL DNP TNP ONP
5’Flank 0 0 0 0 0 0
5’UTR 0 0 0 0 0 0
Translation_Start_Site 1 1 1 1 1 1
Missense_Mutation 1 NA NA 1 1 1
Nonsense_Mutation 1 1 1 1 1 1
Nonstop_Mutation 1 1 1 1 1 1
Splice_Site 1 1 1 1 1 1
Splice_Region 1 1 1 1 1 1
Silent 0 NA NA 0 0 0
In_Frame_Del NA 1 1 NA NA NA
In_Frame_Ins NA 1 1 NA NA NA
Frame_Shift_Del NA 1 1 NA NA NA
Frame_Shift_Ins NA 1 1 NA NA NA
Intron 1 1 1 1 1 1
3’UTR 0 0 0 0 0 0
3’Flank 0 0 0 0 0 0
RNA 0 0 0 0 0 0
IGR 0 0 0 0 0 0

plotNetworkHive: GRN hive visualization taking into account COSMIC cancer genes

In the following plot the nodes are separated into three groups: known tumor suppressor genes (yellow), known oncogenes (green) and the rest (gray).

data(knownDriverGenes)
data(dataGRN)
plotNetworkHive(dataGRN, knownDriverGenes, 0.55)

plotDMA: Heatmap of the driver/passenger status of mutations in TSGs/OCGs

In the following plot the driver genes with driver mutations are shown.

data(dataDMA)
data(DEG_Mutations_Annotations)
data(Oncogenic_mediators_mutation_summary)
plotDMA(DEG_Mutations_Annotations, 
        Oncogenic_mediators_mutation_summary, 
        type = 'complete', additionalFilename = "")

plotMoonlight: Heatmap of Moonlight Gene Z-scores for mutation-driven TSGs/OCGs

In the following plot the top 50 genes with the most driver mutations are visualised. The values are the moonlight gene z-score for the two biological processes

data(DEG_Mutations_Annotations)
data(Oncogenic_mediators_mutation_summary)
data(dataURA_plot)
plotMoonlight(DEG_Mutations_Annotations,
        Oncogenic_mediators_mutation_summary, 
        dataURA_plot, gene_type = "drivers", n = 50)

plotGMA: Heatmap of hypo/hyper/dual methylated CpG sites in TSGs/OCGs

This function plots results of the GMA. It visualizes the number of hypo/hyper/dual methylated CpG sites in oncogenic mediators or in a user-supplied gene list. The results are visualized either in a single heatmap or split into different ones which is specified in the function’s three modes: split, complete and genelist.

data("DEG_Methylation_Annotations")
data("Oncogenic_mediators_methylation_summary")
genes <- c("ACAN", "ACE2", "ADAM19", "AFAP1L1")
plotGMA(DEG_Methylation_Annotations = DEG_Methylation_Annotations,
        Oncogenic_mediators_methylation_summary = Oncogenic_mediators_methylation_summary,
        type = "genelist", genelist = genes,
        additionalFilename = "./GMAresults/")

plotMoonlightMet: Heatmap of Moonlight Gene Z-scores for methylation-driven TSGs/OCGs

This function visualizes the effect of genes on biological processes and total number of hypo/hyper/dual methylated CpG sites in genes.

data("DEG_Methylation_Annotations")
data("Oncogenic_mediators_methylation_summary")
data("dataURA_plot")
genes <- c("ACAN", "ACE2", "ADAM19", "AFAP1L1")
plotMoonlightMet(DEG_Methylation_Annotations = DEG_Methylation_Annotations,
                 Oncogenic_mediators_methylation_summary = Oncogenic_mediators_methylation_summary,
                 dataURA = dataURA_plot,
                 genes = genes,
                 additionalFilename = "./GMAresults/")

plotMetExp: Visualize results from EpiMix of expression and methylation in genes

This function visualizes results of EpiMix (Zheng, Yuanning and Jun, John and Brennan, Kevin and Gevaert, Olivier 2023) in three plots: one that visualizes the distribution of DNA methylation data (MixtureModelPlot), one that visualizes gene expression levels (ViolinPlot) and one that visualizes relationship between DNA methylation and gene expression (CorrelationPlot).

data("EpiMix_Results_Regular")
data("dataMethyl")
data("dataFilt")

# Subset column names (sample names) in expression data to patient level
pattern <- "^(.{4}-.{2}-.{4}-.{2}).*"
colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt))

plotMetExp(EpiMix_results = EpiMix_Results_Regular,
           probe_name = "cg03035704",
           dataMET = dataMethyl,
           dataEXP = dataFilt,
           gene_of_interest = "ACVRL1",
           additionalFilename = "./GMAresults/")
## p value = 0.009484347 R-squared =  0.641

Moonlight Analysis: Case Studies

Introduction

This vignette shows a complete workflow of the ‘Moonlight2R’ package. The code is divided into the following case studies:

    1. Predicting oncogenic mediators using Moonlight’s primary layer
    1. Moonlight pipeline in one function
    1. Moonlight with driver mutation analysis
    1. Moonlight with gene methylation analysis

Case study n. 1: Predicting oncogenic mediators using Moonlight’s primary layer

For illustrative purposes and to decrease runtime, we have set nGenesPerm = 5 and nBoot = 5 in the call of GRN in the following code block, however, we recommend setting these parameters to nGenesPerm = 2000 and nBoot = 400 to achieve optimal results, as they are set by default in the function arguments.

data(DEGsmatrix)
data(dataFilt)
data(DiseaseList)
data(EAGenes)
data(tabGrowBlock)
data(knownDriverGenes)

dataFEA <- FEA(DEGsmatrix = DEGsmatrix)

dataGRN <- GRN(TFs = sample(rownames(DEGsmatrix), 100), 
               DEGsmatrix = DEGsmatrix,
               DiffGenes = TRUE,
               normCounts = dataFilt,
               nGenesPerm = 5,
               nBoot = 5,
           kNearest = 3)

dataURA <- URA(dataGRN = dataGRN, 
              DEGsmatrix = DEGsmatrix, 
              BPname = c("apoptosis",
                         "proliferation of cells"))

dataDual <- PRA(dataURA = dataURA, 
               BPname = c("apoptosis",
                          "proliferation of cells"),
               thres.role = 0)

oncogenic_mediators <- list("TSG"=names(dataDual$TSG), "OCG"=names(dataDual$OCG))

plotURA: Upstream regulatory analysis plot

The user can plot the result of the upstream regulatory analysis using the function plotURA.

data(dataURA) 
plotURA(dataURA = dataURA, additionalFilename = "_exampleVignette")

The figure resulted from the code above is shown below:

Case study n. 2: Moonlight pipeline in one function

For illustrative purposes and to decrease runtime, we have set nGenesPerm = 5 and nBoot = 5 in the following code block, however, we recommend setting these parameters to nGenesPerm = 2000 and nBoot = 400 to achieve optimal results, as they are set by default in the function arguments.

data(dataFilt)
data(DEGsmatrix)
data(dataMAF)
data(DiseaseList)
data(EAGenes)
data(tabGrowBlock)
data(knownDriverGenes)
data(LOC_transcription)
data(LOC_translation)
data(LOC_protein)
data(NCG)
data(EncodePromoters)

listMoonlight <- moonlight(dataDEGs = DEGsmatrix,
                           dataFilt = dataFilt,
                           nTF = 100,
                           DiffGenes = TRUE,
               nGenesPerm = 5,
                           nBoot = 5,
                           BPname = c("apoptosis","proliferation of cells"),
                           dataMAF = dataMAF,
                           path_cscape_coding = "css_coding.vcf.gz", 
                           path_cscape_noncoding = "css_noncoding.vcf.gz")
save(listMoonlight, file = paste0("listMoonlight_ncancer4.Rdata"))

plotCircos: Moonlight Circos Plot

An example of running Moonlight on five cancer types is visualized below in a circos plot. Outer ring: color by cancer type, Inner ring: OCGs and TSGs, Inner connections: green: common OCGs yellow: common TSGs red: possible dual role

data(listMoonlight)
plotCircos(listMoonlight = listMoonlight, additionalFilename = "_ncancer5")

The figure generated by the code above is shown below:

Case study n. 3: Moonlight with driver mutation analysis

For illustrative purposes and to decrease runtime, we have set nGenesPerm = 5 and nBoot = 5 in the call of GRN in the following code block, however, we recommend setting these parameters to nGenesPerm = 2000 and nBoot = 400 to achieve optimal results, as they are set by default in the function arguments.

data(DEGsmatrix)
data(dataFilt)
data(dataMAF)
data(DiseaseList)
data(EAGenes)
data(tabGrowBlock)
data(knownDriverGenes)
data(LOC_transcription)
data(LOC_translation)
data(LOC_protein)
data(NCG)
data(EncodePromoters)

# Perform gene regulatory network analysis
dataGRN <- GRN(TFs = rownames(DEGsmatrix), 
           DEGsmatrix = DEGsmatrix,
           DiffGenes = TRUE,
               normCounts = dataFilt,
               nGenesPerm = 5, 
               kNearest = 3, 
               nBoot = 5)

# Perform upstream regulatory analysis
# As example, we use apoptosis and proliferation of cells as the biological processes
dataURA <- URA(dataGRN = dataGRN,
               DEGsmatrix = DEGsmatrix,
               BPname = c("apoptosis", 
                          "proliferation of cells"), 
               nCores = 1)

# Perform pattern recognition analysis
dataPRA <- PRA(dataURA = dataURA,
               BPname = c("apoptosis", 
                          "proliferation of cells"),
               thres.role = 0)

# Perform driver mutation analysis
dataDMA <- DMA(dataMAF = dataMAF,
               dataDEGs = DEGsmatrix, 
               dataPRA = dataPRA,
               results_folder = "DMAresults",
               coding_file = "css_coding.vcf.gz",
               noncoding_file = "css_noncoding.vcf.gz")

Next, we analyze the predicted driver genes and their mutations.

data(Oncogenic_mediators_mutation_summary)
data(DEG_Mutations_Annotations)

# Extract oncogenic mediators that contain at least one driver mutation
# These are the driver genes
knitr::kable(Oncogenic_mediators_mutation_summary %>%
  filter(!is.na(CScape_Driver)))
Hugo_Symbol Moonlight_Oncogenic_Mediator CScape_Passenger CScape_Driver CScape_Unclassified Transcription_mut_sum Translation_mut_sum Protein_mut_sum Total_Mutations NCG_driver NCG_cgc_annotation NCG_vogelstein_annotation NCG_saito_annotation NCG_pubmed_id NCG_cancer_type
ABCG2 OCG NA 1 NA 0 0 1 1 Candidate NA NA NA 30718927 esophageal_adenocarcinoma
# Extract mutation annotations of the predicted driver genes
driver_mut <- DEG_Mutations_Annotations %>% 
  filter(!is.na(Moonlight_Oncogenic_Mediator), 
         CScape_Mut_Class == "Driver")

# Extract driver genes with a predicted effect on the transcriptional level
transcription_mut <- Oncogenic_mediators_mutation_summary %>%
  filter(!is.na(CScape_Driver)) %>%
  filter(Transcription_mut_sum > 0)

# Extract mutation annotations of predicted driver genes that have a driver mutation
# in its promoter region with a potential effect on the transcriptional level 
promoters <- DEG_Mutations_Annotations %>% 
  filter(!is.na(Moonlight_Oncogenic_Mediator), 
         CScape_Mut_Class == "Driver",
         Potential_Effect_on_Transcription == 1,
         Annotation == 'Promoter')

Case study n. 4: Moonlight with gene methylation analysis

For illustrative purposes and to decrease runtime, we have set nGenesPerm = 5 and nBoot = 5 in the call of GRN in the following code block, however, we recommend setting these parameters to nGenesPerm = 2000 and nBoot = 400 to achieve optimal results, as they are set by default in the function arguments.

data(DEGsmatrix)
data(dataFilt)
data(dataMAF)
data(DiseaseList)
data(EAGenes)
data(tabGrowBlock)
data(knownDriverGenes)
data(LOC_transcription)
data(LOC_translation)
data(LOC_protein)
data(NCG)
data(EncodePromoters)
data(dataMethyl)
data(LUAD_sample_anno)
data(MetEvidenceDriver)

# Perform gene regulatory network analysis
dataGRN <- GRN(TFs = rownames(DEGsmatrix), 
               DEGsmatrix = DEGsmatrix,
               DiffGenes = TRUE,
               normCounts = dataFilt,
               nGenesPerm = 5, 
               kNearest = 3, 
               nBoot = 5)

# Perform upstream regulatory analysis
# As example, we use apoptosis and proliferation of cells as the biological processes
dataURA <- URA(dataGRN = dataGRN,
               DEGsmatrix = DEGsmatrix,
               BPname = c("apoptosis", 
                          "proliferation of cells"), 
               nCores = 1)

# Perform pattern recognition analysis
dataPRA <- PRA(dataURA = dataURA,
               BPname = c("apoptosis", 
                          "proliferation of cells"),
               thres.role = 0)

# Subset column names (sample names) in expression data to patient level
pattern <- "^(.{4}-.{2}-.{4}-.{2}).*"
colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt))

# Perform gene methylation analysis
dataGMA <- GMA(dataMET = dataMethyl, dataEXP = dataFilt, 
               dataPRA = dataPRA, dataDEGs = DEGsmatrix, 
               sample_info = LUAD_sample_anno, met_platform = "HM450",
               prevalence_filter = NULL,
               output_dir = "./GMAresults", cores = 1, 
               roadmap.epigenome.ids = "E096", 
               roadmap.epigenome.groups = NULL)

Citation

Please cite the MoonlightR and Moonlight2R packages:

  • “Interpreting pathways to discover cancer driver genes with Moonlight.” Nature Communications (2020): 10.1038/s41467-019-13803-0. (Colaprico, Antonio and Olsen, Catharina and Bailey, Matthew H. and Odom, Gabriel J. and Terkelsen, Thilde and Silva, Tiago C. and Olsen, André V. and Cantini, Laura and Zinovyev, Andrei and Barillot, Emmanuel and Noushmehr, Houtan and Bertoli, Gloria and Castiglioni, Isabella and Cava, Claudia and Bontempi, Gianluca and Chen, Xi Steven and Papaleo, Elena 2020)

  • “A workflow to study mechanistic indicators for driver gene prediction with Moonlight.” Briefings in Bioinformatics (2023): 10.1093/bib/bbad274. (Nourbakhsh, Mona and Saksager, Astrid and Tom, Nikola and Chen, Xi Steven and Colaprico, Antonio and Olsen, Catharina and Tiberti, Matteo and Papaleo, Elena 2023)

Related publications to this vignette:

  • “TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.” Nucleic acids research (2015): gkv1507. (Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan 2016)

  • “TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages”. F1000Research 10.12688/f1000research.8923.1 (Silva, TC and Colaprico, A and Olsen, C and D’Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H 2016)


Session Information ******

sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] GenomicRanges_1.57.0 GenomeInfoDb_1.41.0  IRanges_2.39.0      
##  [4] S4Vectors_0.43.0     sesameData_1.21.10   ExperimentHub_2.13.0
##  [7] AnnotationHub_3.13.0 BiocFileCache_2.13.0 dbplyr_2.5.0        
## [10] BiocGenerics_0.51.0  dplyr_1.1.4          magrittr_2.0.3      
## [13] Moonlight2R_1.3.0    doParallel_1.0.17    iterators_1.0.14    
## [16] foreach_1.5.2        BiocStyle_2.33.0    
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.4                    matrixStats_1.3.0          
##   [3] bitops_1.0-7                HiveR_0.3.63               
##   [5] enrichplot_1.25.0           HDO.db_0.99.1              
##   [7] httr_1.4.7                  RColorBrewer_1.1-3         
##   [9] tools_4.4.0                 utf8_1.2.4                 
##  [11] R6_2.5.1                    mgcv_1.9-1                 
##  [13] lazyeval_0.2.2              GetoptLong_1.0.5           
##  [15] withr_3.0.0                 prettyunits_1.2.0          
##  [17] gridExtra_2.3               RISmed_2.3.0               
##  [19] textshaping_0.3.7           cli_3.6.2                  
##  [21] Biobase_2.65.0              Cairo_1.6-2                
##  [23] scatterpie_0.2.2            labeling_0.4.3             
##  [25] sass_0.4.9                  readr_2.1.5                
##  [27] randomForest_4.7-1.1        askpass_1.2.0              
##  [29] EpiMix_1.7.1                systemfonts_1.0.6          
##  [31] Rsamtools_2.21.0            yulab.utils_0.1.4          
##  [33] gson_0.1.0                  R.utils_2.12.3             
##  [35] DOSE_3.31.0                 limma_3.61.0               
##  [37] RSQLite_2.3.6               generics_0.1.3             
##  [39] gridGraphics_0.5-1          shape_1.4.6.1              
##  [41] BiocIO_1.15.0               gtools_3.9.5               
##  [43] dendextend_1.17.1           GO.db_3.19.1               
##  [45] Matrix_1.7-0                fansi_1.0.6                
##  [47] abind_1.4-5                 R.methodsS3_1.8.2          
##  [49] lifecycle_1.0.4             yaml_2.3.8                 
##  [51] SummarizedExperiment_1.35.0 gplots_3.1.3.1             
##  [53] qvalue_2.37.0               SparseArray_1.5.0          
##  [55] grid_4.4.0                  blob_1.2.4                 
##  [57] crayon_1.5.2                lattice_0.22-6             
##  [59] cowplot_1.1.3               GenomicFeatures_1.57.0     
##  [61] KEGGREST_1.45.0             magick_2.8.3               
##  [63] pillar_1.9.0                knitr_1.46                 
##  [65] ComplexHeatmap_2.21.0       fgsea_1.31.0               
##  [67] tcltk_4.4.0                 rjson_0.2.21               
##  [69] codetools_0.2-20            fastmatch_1.1-4            
##  [71] glue_1.7.0                  downloader_0.4             
##  [73] ggfun_0.1.4                 qpdf_1.3.3                 
##  [75] data.table_1.15.4           parmigene_1.1.0            
##  [77] vctrs_0.6.5                 png_0.1-8                  
##  [79] treeio_1.29.0               gtable_0.3.5               
##  [81] cachem_1.0.8                EpiMix.data_1.5.0          
##  [83] xfun_0.43                   mime_0.12                  
##  [85] S4Arrays_1.5.0              tidygraph_1.3.1            
##  [87] tinytex_0.50                statmod_1.5.0              
##  [89] rgl_1.3.1                   nlme_3.1-164               
##  [91] ggtree_3.13.0               bit64_4.0.5                
##  [93] progress_1.2.3              filelock_1.0.3             
##  [95] bslib_0.7.0                 KernSmooth_2.23-22         
##  [97] colorspace_2.1-0            DBI_1.2.2                  
##  [99] tidyselect_1.2.1            bit_4.0.5                  
## [101] compiler_4.4.0              extrafontdb_1.0            
## [103] curl_5.2.1                  tidyHeatmap_1.8.1          
## [105] httr2_1.0.1                 xml2_1.3.6                 
## [107] RPMM_1.25                   DelayedArray_0.31.0        
## [109] bookdown_0.39               shadowtext_0.1.3           
## [111] rtracklayer_1.65.0          scales_1.3.0               
## [113] caTools_1.18.2              fuzzyjoin_0.1.6            
## [115] rappdirs_0.3.3              stringr_1.5.1              
## [117] digest_0.6.35               rmarkdown_2.26             
## [119] GEOquery_2.73.0             XVector_0.45.0             
## [121] htmltools_0.5.8.1           pkgconfig_2.0.3            
## [123] jpeg_0.1-10                 base64enc_0.1-3            
## [125] extrafont_0.19              MatrixGenerics_1.17.0      
## [127] highr_0.10                  fastmap_1.1.1              
## [129] rlang_1.1.3                 GlobalOptions_0.1.2        
## [131] htmlwidgets_1.6.4           UCSC.utils_1.1.0           
## [133] farver_2.1.1                jquerylib_0.1.4            
## [135] jsonlite_1.8.8              BiocParallel_1.39.0        
## [137] R.oo_1.26.0                 GOSemSim_2.31.0            
## [139] RCurl_1.98-1.14             GenomeInfoDbData_1.2.12    
## [141] ggplotify_0.1.2             patchwork_1.2.0            
## [143] munsell_0.5.1               Rcpp_1.0.12                
## [145] ape_5.8                     viridis_0.6.5              
## [147] stringi_1.8.3               ggraph_2.2.1               
## [149] zlibbioc_1.51.0             MASS_7.3-60.2              
## [151] plyr_1.8.9                  org.Hs.eg.db_3.19.1        
## [153] ggrepel_0.9.5               doSNOW_1.0.20              
## [155] Biostrings_2.73.0           graphlayouts_1.1.1         
## [157] splines_4.4.0               hms_1.1.3                  
## [159] circlize_0.4.16             seqminer_9.4               
## [161] igraph_2.0.3                reshape2_1.4.4             
## [163] biomaRt_2.61.0              BiocVersion_3.20.0         
## [165] XML_3.99-0.16.1             evaluate_0.23              
## [167] BiocManager_1.30.22         ELMER.data_2.27.0          
## [169] tzdb_0.4.0                  tweenr_2.0.3               
## [171] Rttf2pt1_1.3.12             tidyr_1.3.1                
## [173] purrr_1.0.2                 polyclip_1.10-6            
## [175] clue_0.3-65                 ggplot2_3.5.1              
## [177] ggforce_0.4.2               restfulr_0.0.15            
## [179] easyPubMed_2.13             tidytree_0.4.6             
## [181] ragg_1.3.0                  viridisLite_0.4.2          
## [183] snow_0.4-4                  tibble_3.2.1               
## [185] clusterProfiler_4.13.0      aplot_0.2.2                
## [187] memoise_2.0.1               AnnotationDbi_1.67.0       
## [189] GenomicAlignments_1.41.0    cluster_2.1.6

References

Colaprico, Antonio and Olsen, Catharina and Bailey, Matthew H. and Odom, Gabriel J. and Terkelsen, Thilde and Silva, Tiago C. and Olsen, André V. and Cantini, Laura and Zinovyev, Andrei and Barillot, Emmanuel and Noushmehr, Houtan and Bertoli, Gloria and Castiglioni, Isabella and Cava, Claudia and Bontempi, Gianluca and Chen, Xi Steven and Papaleo, Elena. 2020. “Interpreting Pathways to Discover Cancer Driver Genes with Moonlight.” https://doi.org/10.1038/s41467-019-13803-0.

Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan. 2016. “TCGAbiolinks: An R/Bioconductor Package for Integrative Analysis of TCGA Data.” https://doi.org/10.1093/nar/gkv1507.

Nourbakhsh, Mona and Saksager, Astrid and Tom, Nikola and Chen, Xi Steven and Colaprico, Antonio and Olsen, Catharina and Tiberti, Matteo and Papaleo, Elena. 2023. “A Workflow to Study Mechanistic Indicators for Driver Gene Prediction with Moonlight.” https://doi.org/10.1093/bib/bbad274.

Rogers, Mark F and Gaunt, Tom R and Campbell, Colin. 2020. “CScape-Somatic: Distinguishing Driver and Passenger Point Mutations in the Cancer Genome.” https://doi.org/10.1093/bioinformatics/btaa242.

Silva, TC and Colaprico, A and Olsen, C and D’Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H. 2016. “TCGA Workflow: Analyze Cancer Genomics and Epigenomics Data Using Bioconductor Packages.” https://doi.org/10.12688/f1000research.8923.1.

Zheng, Yuanning and Jun, John and Brennan, Kevin and Gevaert, Olivier. 2023. “EpiMix Is an Integrative Tool for Epigenomic Subtyping Using DNA Methylation.” https://doi.org/10.1016/j.crmeth.2023.100515.