Levi Waldron, CUNY School of Public Health

waldronlab.github.io / waldronlab.org

June 15, 2017

- Objectives: prediction or inference?
- Cross-validation
- Bootstrap
- Permutation Test
- Monte Carlo Simulation

ISLR Chapter 5: James, G. *et al.* An Introduction to Statistical Learning: with Applications in R. (Springer, 2013). This book can be downloaded for free at http://www-bcf.usc.edu/~gareth/ISL/getbook.html

- Questions:
    - *Which* predictors are associated with the response?
    - *How* are predictors associated with the response?
- Example: do dietary habits influence the gut microbiome?

- Linear regression and generalized linear models are the workhorses
- We are more interested in interpretability than accuracy
- Produce interpretable models for inference on coefficients

**Bootstrap, permutation tests**

- Questions:
- How can we predict values of \(Y\) based on values of \(X\)?
- Examples: Framingham Risk Score, OncotypeDX Risk Score

- Regression methods are still workhorses, but less-interpretable machine learning methods are also used
- We are more interested in accuracy than interpretability
    - *e.g.* sensitivity/specificity for a binary outcome
    - *e.g.* mean-squared prediction error for a continuous outcome
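These accuracy measures can be sketched in a few lines of base R; the vectors below are made-up toy values, purely for illustration:

```r
# Toy binary outcome and predicted class (hypothetical values)
obs  <- c(1, 1, 0, 0, 1)
pred <- c(1, 0, 0, 0, 1)
sens <- sum(pred == 1 & obs == 1) / sum(obs == 1)  # sensitivity: true positive rate
spec <- sum(pred == 0 & obs == 0) / sum(obs == 0)  # specificity: true negative rate

# Toy continuous outcome and predictions (hypothetical values)
y    <- c(2.1, 3.4, 1.8)
yhat <- c(2.0, 3.0, 2.2)
mspe <- mean((y - yhat)^2)  # mean-squared prediction error
```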

**Cross-validation**

**Under-fitting, over-fitting, and optimal fitting**

- Create \(K\) “folds” from the sample of size \(n\), \(K \le n\)

- Randomly sample \(1/K\) of the observations (without replacement) as the validation set
- Use remaining samples as the training set
- Fit model on the training set, estimate accuracy on the validation set
- Repeat \(K\) times, using each fold exactly once as the validation set
- Average validation accuracy from each of the validation sets
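The steps above can be sketched in base R; here `mtcars` and a simple linear model stand in for any dataset and fitting procedure:

```r
set.seed(1)
K <- 5
n <- nrow(mtcars)
folds <- sample(rep(1:K, length.out = n))  # randomly assign each row to a fold
cv_mse <- sapply(1:K, function(k) {
    train <- mtcars[folds != k, ]            # training set: the other K-1 folds
    test  <- mtcars[folds == k, ]            # validation set: fold k
    fit <- lm(mpg ~ wt + hp, data = train)   # fit only on the training set
    mean((test$mpg - predict(fit, test))^2)  # accuracy on the validation set
})
mean(cv_mse)  # average over the K validation sets
```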

- In prediction modeling, we think of data as *training* or *test*
- Cross-validation estimates test set error from a training set

- Training set error always decreases with more complex (flexible) models
- Test set error as a function of model flexibility tends to be U-shaped
- The low point of the U represents the optimal bias-variance trade-off, or the most appropriate amount of model flexibility

- Computationally, \(K\) models must be fitted
- 5- or 10-fold CV are popular choices
- Can be repeated for smoothing (e.g. see Braga-Neto and Dougherty 2004, “Is cross-validation valid for small-sample microarray classification?” Bioinformatics 20(3):374–380)

- Be very careful of information “leakage” into test sets, *e.g.*:
    - feature selection using all samples
    - “human-in-the-loop” over-fitting
    - changing your mind on the accuracy measure
    - trying a different dataset

- Tuning plus accuracy estimation requires **nested** cross-validation
- Example: high-dimensional training and test sets simulated from an identical true model
    - Penalized regression models tuned by 5-fold CV
    - Tuning by cross-validation does *not* prevent over-fitting
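A minimal sketch of nested CV in base R: here tuning a polynomial degree stands in for tuning a penalty parameter, and the simulated data, fold counts, and function names are all illustrative. The inner loop picks the tuning parameter on each training set; the outer loop estimates the accuracy of the whole tune-then-fit procedure:

```r
set.seed(1)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)
degrees <- 1:8  # candidate tuning-parameter values

# inner CV: estimate MSE of a given degree using only the training set
cv_mse <- function(d, dat, K = 5) {
    folds <- sample(rep(1:K, length.out = nrow(dat)))
    mean(sapply(1:K, function(k) {
        fit <- lm(y ~ poly(x, d), data = dat[folds != k, ])
        mean((dat$y[folds == k] - predict(fit, dat[folds == k, ]))^2)
    }))
}

# outer CV: estimate accuracy of the entire tune-then-fit procedure
outer <- sample(rep(1:5, length.out = n))
outer_mse <- sapply(1:5, function(k) {
    train <- dat[outer != k, ]
    test  <- dat[outer == k, ]
    best_d <- degrees[which.min(sapply(degrees, cv_mse, dat = train))]
    fit <- lm(y ~ poly(x, best_d), data = train)
    mean((test$y - predict(fit, test))^2)
})
mean(outer_mse)  # honest estimate of test-set error
```

Note that the test observations never influence the choice of `best_d`; tuning and fitting both happen inside each outer training set.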

Waldron *et al.*: **Optimized application of penalized regression methods to diverse genomic data.** Bioinformatics 2011, 27:3399–3406.

- Cross-validation estimates assume that the sample is representative of the population

Bernau C *et al.*: **Cross-study validation for the assessment of prediction algorithms.** Bioinformatics 2014, 30:i105–12.

- Classical hypothesis testing: \(H_0\) of test statistic derived from assumptions about the underlying data distribution
    - *e.g.* the \(t\) or \(\chi^2\) distribution

- Permutation testing: \(H_0\) determined empirically using permutations of the data where \(H_0\) is guaranteed to be true

- Calculate the test statistic (call it \(T\)) in the observed sample
- Permutation:
- Sample without replacement the response values (\(Y\)), using the same \(X\)
- re-compute and store the test statistic T
- Repeat R times, store as a vector \(T_R\)

- Calculate the empirical p-value: the proportion of permuted statistics \(T_R\) whose absolute value exceeds the observed \(|T|\)

\[ P = \frac{\sum \left( \left| T_R \right| > \left| T \right| \right) + 1}{R + 1} \]

- Why add 1?
    - Phipson B, Smyth GK: **Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.** Stat. Appl. Genet. Mol. Biol. 2010, 9:Article 39.

- Permutation tests can also estimate the False Discovery Rate:
    - calculate the number of discoveries in the unpermuted data
    - estimate the number of false discoveries by averaging over permutations
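An illustrative sketch of this FDR estimate on simulated data (the sample sizes, effect size, cutoff, and all object names here are hypothetical, not from the lecture):

```r
set.seed(1)
n <- 20; m <- 500
grp <- factor(rep(1:2, each = n / 2))
X <- matrix(rnorm(m * n), nrow = m)
X[1:50, grp == 2] <- X[1:50, grp == 2] + 1  # 50 truly differential features

# per-feature t statistics for a given group labeling
tstats <- function(X, grp) apply(X, 1, function(x) t.test(x ~ grp)$statistic)

cutoff <- 3
D <- sum(abs(tstats(X, grp)) > cutoff)  # discoveries in unpermuted data
permD <- replicate(20, sum(abs(tstats(X, sample(grp))) > cutoff))
FDRhat <- mean(permD) / D               # estimated FDR at this cutoff
```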

- Pros:
- does not require distributional assumptions
- can be applied to any test statistic
- applicable to False Discovery Rate estimation

- Cons:
- less useful for small sample sizes
- in naive implementations, can get p-values of “0”

- Sleep data show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

```
## extra group ID
## Min. :-1.600 1:10 1 :2
## 1st Qu.:-0.025 2:10 2 :2
## Median : 0.950 3 :2
## Mean : 1.540 4 :2
## 3rd Qu.: 3.400 5 :2
## Max. : 5.500 6 :2
## (Other):8
```

```
##
## Welch Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.3654832 0.2054832
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
```

```
set.seed(1)
Tactual = t.test(extra ~ group, data=sleep)$statistic  # observed test statistic
permT = function(){
    index = sample(1:nrow(sleep), replace=FALSE)  # permute the group labels
    t.test(extra ~ group[index], data=sleep)$statistic
}
Tr = replicate(999, permT())
(sum(abs(Tr) > abs(Tactual)) + 1) / (length(Tr) + 1)
```

`## [1] 0.079`

ISLR Figure 5.11: Schematic of the bootstrap

- The Bootstrap is a very general approach to estimating sampling uncertainty, e.g. standard errors
- Can be applied to a very wide range of models and statistics
- Robust to outliers and violations of model assumptions

- The basic approach:
- Using the available sample (size \(n\)), generate a new sample of size \(n\) (with replacement)
- Calculate the statistic of interest
- Repeat
- Use repeated experiments to estimate the variability of your statistic of interest
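The basic approach, sketched for the standard error of a sample median (the exponential sample here is purely illustrative):

```r
set.seed(1)
x <- rexp(50)  # the available sample, size n = 50
# resample n observations with replacement, recompute the statistic, repeat
med_boot <- replicate(1000, median(sample(x, replace = TRUE)))
sd(med_boot)   # bootstrap estimate of the standard error of the median
```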

- We used a permutation test to estimate a p-value
- We will use bootstrap to estimate a confidence interval

`t.test(extra ~ group, data=sleep)`

```
##
## Welch Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.3654832 0.2054832
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
```

```
set.seed(2)
bootDiff = function(){
    boot = sleep[sample(1:nrow(sleep), replace = TRUE), ]  # resample rows with replacement
    mean(boot$extra[boot$group==1]) -
        mean(boot$extra[boot$group==2])
}
bootR = replicate(1000, bootDiff())
bootR[match(c(25, 975), rank(bootR))]  # empirical 95% percentile interval
```

`## [1] -3.32083333 0.02727273`

Note: in practice it is better to use `library(boot)`
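A sketch of the same analysis with the `boot` package (shipped with R); the statistic function receives the data and a vector of resampled row indices, and `diffMeans` is just an illustrative name:

```r
library(boot)
diffMeans <- function(dat, idx) {
    d <- dat[idx, ]  # the bootstrap resample selected by boot()
    mean(d$extra[d$group == 1]) - mean(d$extra[d$group == 2])
}
set.seed(2)
b <- boot(sleep, diffMeans, R = 1000)
boot.ci(b, type = "perc")  # percentile bootstrap confidence interval
```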

- Oral carcinoma patients treated with surgery
- The surgeon takes “margins” of normal-looking tissue around the tumor to be safe
- The number of margins varies for each patient

- Can an oncogenic gene signature in histologically normal margins predict recurrence?

Reis PP, Waldron L, *et al.*: **A gene signature in histologically normal surgical margins is predictive of oral carcinoma recurrence.** BMC Cancer 2011, 11:437.

- Model was trained and validated using the maximum expression of each of 4 genes from any margin