1 Introduction

Illumina Infinium HumanMethylation 450K BeadChip assay has become a standard tool to analyse methylation in human samples. Developed in 2011, it has already been used in projects such as The Cancer Genome Atlas (TCGA). Their 450.000 probes provide a good overall image of the methylation state of the genome, being one of the reasons of its success.

Given its complex design1 More information can be found at this minfi tutorial, many Bioconductor packages have been developed to assess normalization and pre-processing issues (e.g. minfi (Aryee et al. 2014) or lumi (Du, Kibbe, and Lin 2008)). In addition, these packages can detect differentially methylated probes (DMPs) and differentially methylated regions (DMRs). However, the interfaces are not very intuitive and several scripting steps are usually required.

MEAL aims to facilitate the analysis of Illumina Methylation 450K chips. We have included two methods to analyze DMPs (Differentially Methylated Probes), that test differences in means (limma) or differences in variance (DiffVar). We have included three DMRs (Differentially Methylated Regions) detection algorithms (bumphunter, blockFinder and DMRcate) and a new method to test differences in methylation in a target region (RDA). Finally, we have prepared plots for all these analyses as well as a wrapper to run all the analyses in the same dataset.

2 Input data

MEAL is meant to analyze methylation data already preprocessed. All our functions accept a GenomicRatioSet as input, which is a class from minfi package designed to manage preprocessed methylation data. Users willing to preprocess their own data are encouraged to take a look to minfi’s vignette

In this vignette, we will use methylation data from minfiData package.

library(MEAL)
library(MultiDataSet)
library(minfiData)
library(minfi)
library(ggplot2)

data("MsetEx")

MsetEx is a MethylationRatioSet that contains measurements for 485512 CpGs and 6 samples, as well as some phenotypic variables such as age or sex. The first step will be to convert it to a GenomicRatioSet. Then, we will add some extra features annotation. Finally, we will remove any CpG with NAs:

meth <- mapToGenome(ratioConvert(MsetEx))
rowData(meth) <- getAnnotation(meth)[, -c(1:3)]
meth <- meth[!apply(getBeta(meth), 1, function(x) any(is.na(x))), ]

3 Analyzing Methylation data

3.1 Pipeline

The function runPipeline run all methods included in MEAL to the same dataset. We only need to pass to this function a GenomicRatioSet and the name of our variable of interest. In our case, we will analyze the effect of cancer on methylation:

res <- runPipeline(set = meth, variable_names = "status")
res
## Object of class 'ResultSet'
##  . created with: runPipeline 
##  . sva:  no 
##  . #results: 5 ( error: 0 )
##  . featureData: 485500 probes x 35 variables

runPipeline includes several parameters to customize the analyses. The most important parameters are covariable_names, betas and sva. covariable_names is used to include covariates in our models. betas allows the user choosing between running the analyis with beta (TRUE) or M-values (FALSE). If sva is TRUE, Surrogate Variable Analysis is run and surrogate variables are included in the models. Finally, some parameters modify the behaviour of the methods included in the wrapper and they will be covered later on. More information about the parameters can be found in the documentation (by typing ?runPipeline).

We will run a new analysis including age as covariate:

resAdj <- runPipeline(set = meth, variable_names = "status", 
                      covariable_names = "age")
resAdj
## Object of class 'ResultSet'
##  . created with: runPipeline 
##  . sva:  no 
##  . #results: 5 ( error: 0 )
##  . featureData: 485500 probes x 35 variables

3.2 Managing the results

runPipeline generates a ResultSet object. ResultSet is a class designed to encapsulate different results from the same dataset. It contains the results of the different methods, the feature data and other data required to get tables or plots. We can examine the analyses included in a ResultSet with the function names:

names(res)
## [1] "DiffMean"    "DiffVar"     "bumphunter"  "blockFinder" "dmrcate"
names(resAdj)
## [1] "DiffMean"    "DiffVar"     "bumphunter"  "blockFinder" "dmrcate"

Both objects contains five analyses. DiffMean is an analysis of difference of means performed with limma while the others are named with the method name (DiffVar, bumphunter, blockFinder and dmrcate).

We can use the function getAssociation to get a data.frame with the results, independent of the original method. This function has two main arguments: object and rid. object is the ResultSet with our data and rid is the name or the index of the analysis we want to extract.

head(getAssociation(res, "DiffMean"))
##                 logFC   AveExpr         t      P.Value   adj.P.Val        B
## cg26729197 -0.5779628 0.3630791 -40.43027 8.658132e-08 0.005469413 7.353111
## cg26684946 -0.5216140 0.3901095 -39.79660 9.409595e-08 0.005469413 7.321443
## cg16485975 -0.5446556 0.4540058 -39.53810 9.738268e-08 0.005469413 7.308181
## cg00128718  0.5010168 0.5548436  38.60684 1.104112e-07 0.005469413 7.258685
## cg25937714 -0.5906359 0.4350042 -37.25410 1.332292e-07 0.005469413 7.181670
## cg27651090  0.5433331 0.5411453  37.22607 1.337583e-07 0.005469413 7.180006
head(getAssociation(res, "DiffVar"))
##                logFC   AveExpr         t      P.Value adj.P.Val         B
## cg26681210  2.393538 1.2560773  5.264503 0.0009070454 0.1806555 -1.062042
## cg02622316 -2.104377 1.0915224 -5.253659 0.0009185444 0.1806555 -1.068778
## cg15092787 -1.930107 0.9874307 -5.248894 0.0009236480 0.1806555 -1.071745
## cg19977011 -4.938601 2.6645708 -5.241326 0.0009318187 0.1806555 -1.076464
## cg07211212 -2.032060 1.0652710 -5.227776 0.0009466462 0.1806555 -1.084936
## cg23165310 -1.739898 0.8932740 -5.215436 0.0009603769 0.1806555 -1.092678
head(getAssociation(res, "bumphunter"))
##         chr     start       end      value     area cluster indexStart
## 89924  chr6 133561647 133562776 -0.4128635 15.68881  169280     180989
## 76143 chr10 118030848 118034357 -0.4244696 12.73409   28532     267188
## 80548 chr16  51183988  51187807 -0.3357711 11.75199   76311     380944
## 89156  chr6  29520698  29521803 -0.3009453 11.73687  163137     158788
## 78728 chr13  78492568  78493590 -0.4345396 11.73257   55851     333318
## 85384 chr20  61050560  61051915 -0.4558366 11.39591  122286     459589
##       indexEnd  L clusterL
## 89924   181026 38       43
## 76143   267217 30       30
## 80548   380978 35       42
## 89156   158826 39       40
## 78728   333344 27       50
## 85384   459613 25       26
head(getAssociation(res, "blockFinder"))
##        chr     start       end     value     area cluster indexStart
## 431   chr2 217468708 219096237 0.1692934 29.96494      93      36345
## 1255  chr7  40098119  42446212 0.1706023 20.47228     331      91718
## 118   chr1 152056869 153234423 0.1674880 19.93108      32      13744
## 519   chr3  54169983  56448270 0.1621731 19.29860     105      43843
## 1834 chr11  55066025  56956400 0.1703365 17.71500     507     133354
## 1900 chr11 131278641 132757156 0.1475531 17.70638     520     141047
##      indexEnd   L clusterL
## 431     36561 177      527
## 1255    91861 120     1845
## 118     13891 119     1646
## 519     43985 119     1761
## 1834   133468 104     1803
## 1900   141176 120     1754
head(getAssociation(res, "dmrcate"))
##                          coord no.cpgs        minfdr     Stouffer  maxbetafc
## 4146    chr6:33130696-33148812     145  6.894305e-98 1.961070e-51  0.4429870
## 1846   chr16:51183363-51190201      48 1.017447e-189 4.931163e-34 -0.5757717
## 4342  chr6:133561224-133564578      53  0.000000e+00 2.504360e-33 -0.6600767
## 524  chr10:118030292-118034357      31  0.000000e+00 1.329099e-32 -0.6980408
## 4054    chr6:29520527-29521803      40 1.617159e-225 4.924774e-32 -0.5406367
## 1852   chr16:54964332-54974287      46 1.966906e-117 7.660045e-31 -0.6023573
##      meanbetafc
## 4146  0.2063092
## 1846 -0.3008899
## 4342 -0.3541019
## 524  -0.4162689
## 4054 -0.2956325
## 1852 -0.2373021

DiffMean and DiffVar are internally stored as a MArrayLM, the class from limma results. This class allows testing different constrasts or evaluating different variables simultaneously. The function getProbeResults helps the user performing these operations. It also has the arguments object and rid from getAssociation. coef is a numeric with the index of the coefficient from which we want the results. If we did not pass a custom model to runPipeline, the first coefficient (coef = 1) is the intercept and the second coefficient (coef = 2) is the first variable that we included in variable_names. We can evaluate different coefficients simultaneously by passing a vector to coef. contrast is a matrix with the contrasts that we want to evaluate. This option is useful when our variable of interest is a factor with several levels and we want to do all the different comparisons. Finally, the argument fNames is used to select the variables from features annotation that will be added to the tables.

To exemplify the use of this function, we will evaluate our whole adjusted model, including age coefficient. We will also add some annotation of the CpGs:

head(getProbeResults(resAdj, rid = 1, coef = 2:3, 
                     fNames = c("chromosome", "start")))
##            statusnormal           age   AveExpr        F      P.Value
## cg09383816   -0.5938196 -0.0026657333 0.4885486 915.6123 2.002722e-06
## cg27651090    0.5433331 -0.0009235097 0.5411453 812.3259 2.594906e-06
## cg21938148   -0.6659934 -0.0028869823 0.4712352 774.8049 2.874494e-06
## cg25104555   -0.5254995 -0.0031493548 0.3906114 774.4414 2.877414e-06
## cg25937714   -0.5906359  0.0009181122 0.4350042 760.4902 2.992841e-06
## cg15732851   -0.5760397 -0.0050629750 0.3869865 747.4969 3.106531e-06
##             adj.P.Val chromosome     start
## cg09383816 0.04930423       chr8  67344556
## cg27651090 0.04930423      chr13 109270071
## cg21938148 0.0493042