Another Bioconductor package we will also be introduce is called ELMER (L. Yao et al. 2015,Chedraoui Silva et al. (2017)) which allows one to identify DNA methylation changes in distal regulatory regions and correlate these signatures with expression of nearby genes to identify transcriptional targets associated with cancer. For these distal probes correlated with a gene, a transcription factor motif analysis is performed followed by expression analysis of transcription factors to infer upstream regulators.
We expect that participants of this workshop will understand the integrative analysis performed by using TCGAbiolinks + ELMER, as well as be able to execute it from the data acquisition process to the final interpretation of the results. The workshop assumes users with an intermediate level of familiarity with R, and basic understanding of tumor biology.
The figure below highlights the workflow part which will be covered in this section.
library(ELMER)
library(DT)
library(dplyr)
A Multi Assay Experiment object from the MultiAssayExperiment package is the input for multiple main functions of ELMER. To create it you can use the createMAE
function.
But instead of using the data previously downloaded, we will use one with more samples provided in this package.
library(MultiAssayExperiment)
lusc.exp
## class: RangedSummarizedExperiment
## dim: 3742 234
## metadata(0):
## assays(1): ''
## rownames(3742): ENSG00000188984 ENSG00000204518 ...
## ENSG00000162378 ENSG00000036549
## rowData names(2): external_gene_name ensembl_gene_id
## colnames(234): TCGA-22-5472-01A-01R-1635-07
## TCGA-22-5489-01A-01R-1635-07 ... TCGA-56-7730-01A-11R-2125-07
## TCGA-56-7731-01A-11R-2125-07
## colData names(0):
lusc.met
## class: RangedSummarizedExperiment
## dim: 1702 268
## metadata(0):
## assays(1): ''
## rownames(1702): cg00045114 cg00050294 ... cg13985485 cg08542090
## rowData names(36): addressA addressB ... MASK.extBase MASK.general
## colnames(268): TCGA-43-3394-11A-01D-1551-05
## TCGA-43-3920-11B-01D-1551-05 ... TCGA-98-8022-01A-11D-2245-05
## TCGA-98-8023-01A-11D-2245-05
## colData names(1): samples
Also, we will need to filter the probes in the MAE to select only distal probes.
distal.probes <- get.feature.probe(genome = "hg38", met.platform = "450K")
The createMAE
function will receive both DNA methylation and gene expression objects. Also filter.probes
argument will select only probes in the regions of the distal.probes object.
library(MultiAssayExperiment)
mae <- createMAE(exp = lusc.exp,
met = lusc.met,
save = TRUE,
linearize.exp = TRUE,
filter.probes = distal.probes,
save.filename = "mae_bioc2017.rda",
met.platform = "450K",
genome = "hg38",
TCGA = TRUE)
mae
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] DNA methylation: RangedSummarizedExperiment with 1584 rows and 234 columns
## [2] Gene expression: RangedSummarizedExperiment with 3742 rows and 234 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
colData(mae)[1:5,] %>% as.data.frame %>% datatable(options = list(scrollX = TRUE))
sampleMap(mae)[1:5,] %>% as.data.frame %>% datatable(options = list(scrollX = TRUE))
This step is to identify DNA methylation changes at distal enhancer probes which is carried out by function get.diff.meth
.
For each distal probe, the function first rank samples from group 1 and group 2 samples by their DNA methylation beta values. To identify hypomethylated probes, the function compared the lower control quintile (20% of control samples with the lowest methylation) to the lower experiment quintile (20% of experiment samples with the lowest methylation), using an unpaired one-tailed t-test.
Source: Yao, Lijing, et al. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome biology 16.1 (2015): 105.
Argument | Description |
---|---|
data | A multiAssayExperiment with DNA methylation and Gene Expression data. See createMAE function. |
diff.dir | A character can be “hypo” or “hyper”, showing differential methylation dirction. It can be “hypo” which is only selecting hypomethylated probes; “hyper” which is only selecting hypermethylated probes |
minSubgroupFrac | A number ranging from 0 to 1,specifying the fraction of extreme samples from group 1 and group 2 that are used to identify the differential DNA methylation. The default is 0.2 because we typically want to be able to detect a specific (possibly unknown) molecular subtype among tumor; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicated, use a value of 1.0 (Supervised mode) |
pvalue | A number specifies the significant P value (adjusted P value by BH) cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.01 |
group.col | A column defining the groups of the sample. You can view the available columns using: colnames(MultiAssayExperiment::colData(data)) . |
group1 | A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2. |
group2 | A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2. |
sig.dif | A number specifies the smallest DNA methylation difference as a cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.3. |
sig.diff <- get.diff.meth(data = mae,
group.col = "definition",
group1 = "Primary solid Tumor",
group2 = "Solid Tissue Normal",
minSubgroupFrac = 0.2,
sig.dif = 0.3,
diff.dir = "hypo", # Search for hypomethylated probes in group 1
cores = 1,
dir.out ="result",
pvalue = 0.01)
as.data.frame(sig.diff) %>% datatable(options = list(scrollX = TRUE))
# get.diff.meth automatically save output files.
# getMethdiff.hypo.probes.csv contains statistics for all the probes.
# getMethdiff.hypo.probes.significant.csv contains only the significant probes which
# is the same with sig.diff
dir(path = "result", pattern = "getMethdiff")
## [1] "getMethdiff.hypo.probes.csv"
## [2] "getMethdiff.hypo.probes.significant.csv"
This step is to link enhancer probes with methylation changes to target genes with expression changes and report the putative target gene for selected probes. This is carried out by function get.pair
.
For each distal probe with differential methylation the closest 10 upstream genes and the closest 10 downstream genes were tested for correlation between methylation of the probe and expression of the gene. Thus, exactly 20 statistical tests were performed for each probe, as follows. For each probe-gene pair, the samples (all experiment samples and control samples) were divided into two groups: the M group, which consisted of the upper methylation quintile (the 20% of samples with the highest methylation at the enhancer probe), and the U group, which consisted of the lowest methylation quintile (the 20% of samples with the lowest methylation.) For each probe-gene pair tested, the raw p-value Pr
was corrected for multiple hypothesis using a permutation approach.