EpiDISH 2.21.1

- 1 Introduction
- 2 How to estimate cell-type fractions in blood
- 3 How to estimate generic cell-type fractions in a solid tissue
- 4 How to estimate immune cell-type fractions in a solid tissue using HEpiDISH
- 5 More info about different methods for cell-type fractions estimation
- 6 How to identify differentially methylated cell-types in EWAS
- 7 Sessioninfo
- References

The **EpiDISH** package provides tools to infer the fractions of a priori known cell subtypes present in a DNA methylation (DNAm) sample that represents a mixture of those cell types. Inference uses one of three methods, chosen by the user: Robust Partial Correlations (RPC) (A. E. Teschendorff et al. 2017), CIBERSORT (CBS) (Newman et al. 2015) or Constrained Projection (CP) (Houseman et al. 2012). In addition, the package provides a function, CellDMC, for identifying differentially methylated cell types in Epigenome-Wide Association Studies (EWAS) (Zheng, Breeze, et al. 2018). Currently, *the package contains 6 DNAm reference matrices*, three of which are designed for *whole blood* (A. E. Teschendorff et al. 2017; Luo et al. 2023):
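To make the RPC idea more concrete, here is a minimal sketch, assuming the robust-regression formulation of Teschendorff et al. (2017): a sample's DNAm profile is regressed on the reference centroids with a robust M-estimator (here `MASS::rlm`), negative coefficients are set to zero, and the rest are rescaled to sum to 1. The function `rpc_sketch` and the toy reference are hypothetical and for illustration only; this is not EpiDISH's internal code.

```r
# Hypothetical sketch of the RPC idea (not EpiDISH's implementation).
library(MASS)  # provides rlm(); MASS ships with standard R installations

rpc_sketch <- function(beta.v, ref.m, maxit = 50) {
  # Robust linear regression of one sample on the reference centroids
  fit <- rlm(beta.v ~ ref.m, maxit = maxit)
  w <- coef(fit)[-1]          # drop the intercept
  w[w < 0] <- 0               # fractions cannot be negative
  names(w) <- colnames(ref.m)
  w / sum(w)                  # rescale so the fractions sum to 1
}

# Toy data: a made-up 3-cell-type reference over 200 CpGs
set.seed(1)
ref.m <- matrix(runif(600), nrow = 200,
                dimnames = list(NULL, c("Epi", "Fib", "IC")))
true.w <- c(Epi = 0.6, Fib = 0.3, IC = 0.1)
beta.v <- as.vector(ref.m %*% true.w) + rnorm(200, sd = 0.01)

est <- rpc_sketch(beta.v, ref.m)
round(est, 2)
```

With little noise the sketch should recover fractions close to the simulated `true.w`; in practice you would call `epidish(..., method = "RPC")` rather than this toy function.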

`centDHSbloodDMC.m`

: This DNAm reference matrix for blood will estimate fractions for 7 immune cell types (B-cells, NK-cells, CD4T- and CD8T-cells, monocytes, neutrophils and eosinophils).

`cent12CT.m`

: This DNAm reference matrix for blood and Illumina EPIC arrays will estimate fractions for 12 immune cell types (naive and mature B-cells, naive and mature CD4T-cells, naive and mature CD8T-cells, T-regulatory cells, NK-cells, neutrophils, monocytes, eosinophils and basophils).

`cent12CT450k.m`

: This DNAm reference matrix for blood and Illumina 450k arrays will estimate fractions for the same 12 immune cell types as `cent12CT.m`.

The other 3 DNAm reference matrices are designed for solid tissue-types (Zheng, Webster, et al. 2018):

`centEpiFibIC.m`

: This DNAm reference matrix is designed for a generic solid tissue that is dominated by an epithelial, stromal and immune-cell component. It will estimate fractions for 3 broad cell types: a generic epithelial, fibroblast and immune-cell type.

`centBloodSub.m`

: This DNAm reference matrix is designed for a solid tissue type and will estimate immune cell infiltration for 7 immune cell subtypes. It is meant to be applied after `centEpiFibIC.m` to yield proportions for 7 immune cell subtypes alongside the total epithelial and total fibroblast fractions.

`centEpiFibFatIC.m`

: This DNAm reference matrix is a more specialised version for breast tissue and will estimate total epithelial, fibroblast, immune-cell and fat fractions.

We now show how to use the package to estimate 7 immune cell-type fractions in whole blood. We use a subset of the beta-value matrix from GSE42861 (see the manual page of *LiuDataSub.m* for a detailed description). First, we read in the required objects:

```
library(EpiDISH)
data(centDHSbloodDMC.m)
data(LiuDataSub.m)
```

`BloodFrac.m <- epidish(beta.m = LiuDataSub.m, ref.m = centDHSbloodDMC.m, method = "RPC")$estF`

We can check the inferred fractions with boxplots, which show that, as expected, neutrophils are the predominant cell type in whole blood.

`boxplot(BloodFrac.m)`

To infer fractions at the higher resolution of 12 immune cell subtypes, we would replace `centDHSbloodDMC.m` in the code above with `cent12CT450k.m`, because GSE42861 is a 450k DNAm dataset. For an EPIC whole blood dataset, you would use `cent12CT.m` instead.
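The choice of blood reference can be written down as a small lookup. The helper `choose_blood_ref` below is hypothetical (it is not part of EpiDISH) and simply encodes the rules stated above: the 7-type reference is `centDHSbloodDMC.m`, while the 12-type reference depends on the array platform.

```r
# Hypothetical helper (not an EpiDISH function): return the name of the
# blood reference matrix to load for a given platform and resolution.
choose_blood_ref <- function(platform = c("450k", "EPIC"), n.celltypes = 7) {
  platform <- match.arg(platform)
  if (n.celltypes == 7) {
    return("centDHSbloodDMC.m")            # 7 immune cell types
  }
  if (n.celltypes == 12) {
    # 12 immune cell types: the reference is platform-specific
    return(if (platform == "450k") "cent12CT450k.m" else "cent12CT.m")
  }
  stop("n.celltypes must be 7 or 12")
}

ref.name <- choose_blood_ref("450k", 12)
ref.name

# With EpiDISH loaded, the chosen reference could then be used as:
# data(list = ref.name)
# BloodFrac12.m <- epidish(beta.m = LiuDataSub.m, ref.m = get(ref.name),
#                          method = "RPC")$estF
```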

To illustrate how to estimate generic cell-type fractions in a solid tissue, we first read in a dummy beta-value matrix, *DummyBeta.m*, which contains 2000 CpGs and 10 samples and represents a solid tissue:

```
data(centEpiFibIC.m)
data(DummyBeta.m)
```

Notice that *centEpiFibIC.m* has 3 columns, named Epi, Fib and IC. We then use the *epidish* function with method *RPC* to infer the cell-type fractions.

`out.l <- epidish(beta.m = DummyBeta.m, ref.m = centEpiFibIC.m, method = "RPC")`

Then, we check the output list. *estF* is the matrix of estimated cell-type fractions. *ref* is the reference centroid matrix used, and *dataREF* is the subset of the input data matrix over the probes defined in the reference matrix.

`out.l$estF`

```
## Epi Fib IC
## S1 0.08836819 0.06109607 0.8505357378
## S2 0.07652115 0.57326994 0.3502089007
## S3 0.15417391 0.75663136 0.0891947251
## S4 0.77082647 0.04171941 0.1874541181
## S5 0.03960599 0.31921224 0.6411817742
## S6 0.12751711 0.79642919 0.0760537000
## S7 0.18144315 0.72889883 0.0896580171
## S8 0.20220823 0.40929344 0.3884983293
## S9 0.19398079 0.80540932 0.0006098973
## S10 0.27976647 0.23671333 0.4835201992
```
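A quick sanity check on *estF*: each row is a set of fractions, so it should sum to (approximately) 1, and the column with the largest value gives the dominant cell type in each sample. The sketch below re-enters the printed estimates so it runs without EpiDISH; with the package loaded you would use `estF <- out.l$estF` instead.

```r
# Printed estimates from out.l$estF above (with EpiDISH: estF <- out.l$estF)
estF <- matrix(c(
  0.08836819, 0.06109607, 0.8505357378,
  0.07652115, 0.57326994, 0.3502089007,
  0.15417391, 0.75663136, 0.0891947251,
  0.77082647, 0.04171941, 0.1874541181,
  0.03960599, 0.31921224, 0.6411817742,
  0.12751711, 0.79642919, 0.0760537000,
  0.18144315, 0.72889883, 0.0896580171,
  0.20220823, 0.40929344, 0.3884983293,
  0.19398079, 0.80540932, 0.0006098973,
  0.27976647, 0.23671333, 0.4835201992
), ncol = 3, byrow = TRUE,
dimnames = list(paste0("S", 1:10), c("Epi", "Fib", "IC")))

# Fractions in each sample should sum to 1 (up to rounding of the printout)
stopifnot(all(abs(rowSums(estF) - 1) < 1e-6))

# Dominant cell type per sample
dominant <- colnames(estF)[apply(estF, 1, which.max)]
names(dominant) <- rownames(estF)
dominant
table(dominant)
```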

`dim(out.l$ref)`

`## [1] 599 3`

`dim(out.l$dataREF)`

`## [1] 599 10`

Note: as part of the quality control step in DNAm data preprocessing, we might have to remove bad probes; consequently, some probes in the reference matrix may be missing from a given dataset. By checking *ref* and *dataREF*, we can extract the probes actually used for estimating cell-type fractions. As we showed in Zheng, Webster, et al. (2018), if more than a third of the reference matrix probes are missing, the estimated fractions may be unreliable.
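The one-third rule of thumb can be checked before running *epidish* by comparing probe identifiers. The helper `ref_probe_coverage` below is hypothetical (not an EpiDISH function); it assumes probes are stored as row names in both the data and the reference matrix, as they are in the objects used above.

```r
# Hypothetical helper: fraction of reference probes absent from the data,
# flagged against the "more than a third missing" rule of thumb.
ref_probe_coverage <- function(beta.m, ref.m, max.missing = 1/3) {
  missing <- setdiff(rownames(ref.m), rownames(beta.m))
  frac.missing <- length(missing) / nrow(ref.m)
  list(n.ref = nrow(ref.m),
       n.missing = length(missing),
       frac.missing = frac.missing,
       reliable = frac.missing <= max.missing)
}

# Toy example: 6 reference probes, 3 of which are absent from the data
ref.toy <- matrix(0, nrow = 6, ncol = 3,
                  dimnames = list(paste0("cg", 1:6), c("Epi", "Fib", "IC")))
beta.toy <- matrix(0.5, nrow = 4, ncol = 2,
                   dimnames = list(paste0("cg", c(1, 2, 3, 99)), c("S1", "S2")))
cov.toy <- ref_probe_coverage(beta.toy, ref.toy)
cov.toy$frac.missing
cov.toy$reliable

# On the example data: ref_probe_coverage(DummyBeta.m, centEpiFibIC.m)
```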

HEpiDISH is an iterative, hierarchical version of EpiDISH designed for solid tissues with significant immune-cell infiltration. HEpiDISH uses two distinct DNAm references: a primary reference for estimating the total epithelial, fibroblast and immune-cell fractions, and a separate, non-overlapping secondary reference for estimating the underlying immune cell subtype fractions.
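To make the shape of a hierarchical result concrete, the toy sketch below combines primary fractions (Epi, Fib, IC) with immune-subtype proportions defined within the immune compartment, so that the subtypes sum to the total IC fraction and each sample still sums to 1. This is a simplified, hypothetical illustration of the two-level bookkeeping only; it is not the procedure HEpiDISH actually uses, and `combine_hierarchy` and the subtype names are assumptions.

```r
# Hypothetical illustration (not HEpiDISH itself): scale within-immune
# subtype proportions by the primary IC fraction and replace the IC column.
combine_hierarchy <- function(primary.m, immune.m, ic.col = "IC") {
  stopifnot(identical(rownames(primary.m), rownames(immune.m)))
  # Normalise subtypes within each sample, then scale by that sample's IC
  sub.m <- immune.m / rowSums(immune.m) * primary.m[, ic.col]
  cbind(primary.m[, colnames(primary.m) != ic.col, drop = FALSE], sub.m)
}

# Toy inputs for two samples
primary <- matrix(c(0.5, 0.2, 0.3,
                    0.1, 0.1, 0.8), nrow = 2, byrow = TRUE,
                  dimnames = list(c("S1", "S2"), c("Epi", "Fib", "IC")))
immune <- matrix(c(0.50, 0.50,
                   0.25, 0.75), nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), c("CD4T", "Mono")))

res <- combine_hierarchy(primary, immune)
res
rowSums(res)
```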