1 Introduction

You are probably familiar with multiple testing procedures that take a set of p-values and calculate adjusted p-values. Given a significance level $$\alpha$$, one can then declare which hypotheses are rejected. In R this is most commonly done with the p.adjust function in the stats package, and a popular choice is to control the false discovery rate (FDR) with the method of Benjamini and Hochberg (1995), selected with method="BH" in p.adjust. A characteristic feature of this and other methods, responsible both for their versatility and for their limitations, is that they use nothing beyond the p-values: no other information that might set the tests apart, such as differences in signal quality, power, or prior probability.
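As a reminder of what this baseline looks like, here is a minimal sketch that uses only base R. The six p-values are made up for illustration and are not taken from any dataset discussed in this vignette.

```r
# Hypothetical p-values for six tests (illustrative values only)
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)
alpha <- 0.1

# Benjamini-Hochberg adjusted p-values
padj <- p.adjust(pvals, method = "BH")

# A hypothesis is rejected if its adjusted p-value is at most alpha
rejected <- padj <= alpha
sum(rejected)
```

Note that the call sees nothing but the p-values themselves; any side information about the individual tests is ignored.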

IHW (Independent Hypothesis Weighting) is also a multiple testing procedure, but in addition to the p-values it allows you to specify a covariate for each test. The covariate should be informative of the power or prior probability of each individual test, but is chosen such that the p-values for those hypotheses that are truly null do not depend on the covariate (Ignatiadis et al. 2016). Therefore the input of IHW is the following:

• a vector of p-values (of length $$m$$),
• a matching vector of covariates,
• the significance level $$\alpha \in (0,1)$$ at which the FDR should be controlled.

IHW then calculates a weight for each p-value: non-negative numbers $$w_i \geq 0$$ that average to 1, i.e., $$\sum_{i=1}^m w_i = m$$. IHW also returns a vector of adjusted p-values by applying the procedure of Benjamini and Hochberg (BH) to the weighted p-values $$P^\text{weighted}_i = \frac{P_i}{w_i}$$.

The weights allow different prioritization of the individual hypotheses, based on their covariate. This means that the ranking of hypotheses with p-value weighting is in general different from the ranking without it. Two hypotheses with the same p-value can have different weighted p-values: the one with the higher weight will have the smaller value of $$P^\text{weighted}_i$$, and consequently it can even happen that one but not the other gets rejected by the subsequent BH procedure.
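The effect of weighting can be illustrated with a small sketch in base R. The p-values and weights below are hypothetical and chosen by hand; IHW learns its weights from the data, which this sketch does not attempt to reproduce. Capping the weighted p-values at 1 (so that a weight of 0 never leads to a rejection) is an assumption of this sketch.

```r
# Hypothetical p-values and hand-picked weights (illustrative only)
pvals   <- c(0.001, 0.03, 0.03, 0.04, 0.30, 0.60)
weights <- c(1.5,   2.0,  0.25, 1.25, 1.0,  0.0)
alpha   <- 0.1

# Weights must be non-negative and average to 1
stopifnot(all(weights >= 0),
          isTRUE(all.equal(sum(weights), length(weights))))

# Weighted p-values P_i / w_i; a weight of 0 gives Inf, capped here at 1
weighted_p <- pmin(pvals / weights, 1)

# BH applied to the unweighted and to the weighted p-values
padj_plain    <- p.adjust(pvals, method = "BH")
padj_weighted <- p.adjust(weighted_p, method = "BH")

# Hypotheses 2 and 3 share the p-value 0.03 but carry different weights
data.frame(pvals, weights, padj_plain, padj_weighted,
           rejected_plain    = padj_plain <= alpha,
           rejected_weighted = padj_weighted <= alpha)
```

In this toy example, plain BH treats hypotheses 2 and 3 identically, whereas after weighting only hypothesis 2 (the one with the larger weight) is rejected at level 0.1.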

As an example, let’s see how to use the IHW package to analyse RNA-Seq differential gene expression, and then also look at some other examples where the method is applicable.

2 An example: RNA-Seq differential expression

We analyze the airway RNA-Seq dataset using DESeq2 (Love, Huber, and Anders 2014).

library("DESeq2")
library("dplyr")
data("airway", package = "airway")
dds <- DESeqDataSet(se = airway, design = ~ cell + dex) %>% DESeq
deRes <- as.data.frame(results(dds))

The output is a dataframe with the following columns, and one row for each tested hypothesis (i.e., for each gene):

colnames(deRes)
## [1] "baseMean"       "log2FoldChange" "lfcSE"          "stat"
## [5] "pvalue"         "padj"

In particular, we have p-values and baseMean (i.e., the mean of normalized counts) for each gene. As argued in the DESeq2 paper, these two statistics are approximately independent under the null hypothesis. Thus we have all the ingredients necessary for an IHW analysis (p-values and covariates), which we will apply at a significance level of 0.1.

2.1 FDR control

library("IHW")
ihwRes <- ihw(pvalue ~ baseMean,  data = deRes, alpha = 0.1)

This returns an object of the class ihwResult. We can get, e.g., the total number of rejections.

rejections(ihwRes)
## [1] 4868

And we can also extract the adjusted p-values:

head(adj_pvalues(ihwRes))
## [1] 0.001280161          NA 0.158043053 0.838805669 1.000000000 1.000000000
sum(adj_pvalues(ihwRes) <= 0.1, na.rm = TRUE) == rejections(ihwRes)
## [1] TRUE

We can compare this to the result of applying the method of Benjamini and Hochberg to the p-values only:

padjBH <- p.adjust(deRes$pvalue, method = "BH")
sum(padjBH <= 0.1, na.rm = TRUE)
## [1] 4099

IHW produced considerably more rejections than that (4868 versus 4099). How did we gain this power? Essentially, by assigning an appropriate weight to each hypothesis. We can retrieve the weights as follows:

head(weights(ihwRes))
## [1] 1.832350       NA 2.447218 2.489885 1.244898 0.000000

Internally, what happened was the following: We split the hypotheses into $$n$$ different strata (here $$n=22$$) based on increasing value of baseMean and we also randomly split them into $$k$$ folds (here $$k=5$$). Then, for each combination of fold and stratum, we learned the weights. The discretization into strata facilitates the estimation of the distribution function conditionally on the covariate and the optimization of the weights. The division into random folds helps us to avoid overfitting the data, something which could otherwise result in loss of control of the FDR (Ignatiadis et al. 2016).
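The two-way partition described above can be sketched in base R. This is only an illustration of the idea and not the code that IHW runs internally: the covariate is simulated, the numbers of strata and folds are chosen arbitrarily, and the weight-learning step itself is omitted.

```r
set.seed(1)                  # makes the random fold assignment reproducible
m <- 1000                    # hypothetical number of hypotheses
covariate <- rexp(m)         # simulated covariate, a stand-in for baseMean

nbins  <- 4                  # number of strata (n in the text)
nfolds <- 5                  # number of folds (k in the text)

# Strata: equally sized bins along the ranked covariate
stratum <- cut(rank(covariate, ties.method = "first"),
               breaks = nbins, labels = FALSE)

# Folds: balanced random assignment that ignores covariate and p-value
fold <- sample(rep_len(seq_len(nfolds), m))

# Number of hypotheses in each combination of stratum and fold
table(stratum, fold)
```

Each cell of the resulting table corresponds to one combination of stratum and fold, which is the unit at which the text says a weight is learned.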

The values of $$n$$ and $$k$$ can be accessed through

c(nbins(ihwRes), nfolds(ihwRes))
## [1] 22  5

In particular, each hypothesis test gets assigned a weight depending on the combination of its assigned fold and stratum.

We can also see this internal representation of the weights as an $$n \times k$$ matrix:

weights(ihwRes, levels_only = TRUE)
##            [,1]      [,2]      [,3]      [,4]      [,5]
##  [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [2,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [3,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [4,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [5,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [7,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [8,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [9,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [11,] 0.2609681 0.2624698 0.2567675 0.2517039 0.2136704
## [12,] 0.2714622 0.7162379 0.7437053 0.8982321 0.9257029
## [13,] 0.6937328 0.8894949 1.2269089 0.9198106 0.9257029
## [14,] 1.2448985 1.2520619 1.2269089 1.2385683 1.0785389
## [15,] 3.0791454 2.4442057 2.3236256 2.1351651 2.4472179
## [16,] 2.7883473 2.4442057 2.3236256 2.4898846 2.4472179
## [17,] 2.5028878 2.3876075 2.3236256 2.4898846 2.4472179
## [18,] 2.8534126 2.3876075 2.3236256 2.4898846 2.4472179
## [19,] 1.8323503 2.3876075 2.2927624 2.4898846 2.2477617
## [20,] 2.3750539 2.4454325 2.3923045 2.4190779 2.3274858
## [21,] 1.5636329 2.3582168 2.0963355 2.2297692 2.3038841
## [22,] 2.2080598 2.2540614 2.0963355 2.2297692 2.3038841
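To make the link between this matrix and the per-hypothesis weights concrete, here is a small sketch in base R. The 3 x 2 matrix and the stratum and fold assignments are hypothetical and much smaller than in the airway analysis; the lookup by (stratum, fold) pair follows the description above but is not taken from the IHW source code.

```r
# Hypothetical 3 x 2 weight matrix: rows are strata, columns are folds
weight_matrix <- matrix(c(0.0, 0.0,
                          0.8, 1.0,
                          2.2, 2.0),
                        nrow = 3, byrow = TRUE)

# Hypothetical stratum and fold assignments for six hypotheses
stratum <- c(1, 3, 2, 3, 1, 2)
fold    <- c(1, 1, 2, 2, 2, 1)

# Indexing with a two-column matrix of (row, column) pairs
# returns one weight per hypothesis
w <- weight_matrix[cbind(stratum, fold)]
w

# The per-hypothesis weights average to 1 in this toy example
mean(w)
```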

2.2 Diagnostic plots

2.2.1 Estimated weights

plot(ihwRes)

We see that the general trend is driven by the covariate (stratum) and is the same across the different folds. As expected, the weight functions calculated on different random subsets of the data behave similarly. For the data at hand, genes with very low baseMean count get assigned a weight of 0, while genes with high baseMean count get prioritized.

2.2.2 Decision boundary

plot(ihwRes, what = "decisionboundary")