Good scientific practice: Design of High Throughput Experiments and their Analysis

Susan Holmes, Wolfgang Huber

2022-06-24

Design of High Throughput Experiments

R. A. Fisher, pioneer of experimental design
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.
(Fisher 1935) (Presidential Address to the First Indian Statistical Congress, 1938. Sankhya 4, 14-17).

Goals for this Lecture

The Art of “Good Enough”

Types of studies / experiments

Experiment

Retrospective observational studies

Prospective, controlled studies

Meta-analysis

Illustration: experiment

Well-characterized cell line growing in laboratory conditions on defined media, temperature and atmosphere.

We administer a precise amount of a drug, and after 72h we measure the activity of a specific pathway reporter.

Illustration: challenges with studies

We recruit 200 patients who have a disease and fulfill the inclusion criteria (e.g. age, comorbidities, mental capacity), and ask them to take a drug each day at exactly 6 am. After 3 months, we take an MRI scan and measure many other biomarkers to see whether and how the disease has changed, and whether there were any side effects.

What to do about this?

Examples

What is a good normalization method?

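# Packages: DESeq2 (size factors), airway (example data), ggplot2 and gridExtra (plots),
# dplyr (pipe, tibbles). geom_hex() additionally needs the hexbin package to be installed.
library("DESeq2")
library("airway")
library("ggplot2")
library("dplyr")
library("gridExtra")
data("airway")

# Build the DESeq2 object and estimate one size factor per sample.
aw = DESeqDataSet(airway, design = ~ cell + dex) %>% estimateSizeFactors
sizeFactors(aw)

# The two samples that are plotted against each other.
samples = c("SRR1039513", "SRR1039517")

# Hexbin scatterplot of the asinh-transformed counts of the two samples.
myScatterplot = function(x) {
  as_tibble(x) %>%
  mutate(rs = rowSums(x)) %>%
  filter(rs >= 2) %>%   # keep genes with at least 2 counts summed over all samples
  ggplot(aes(x = asinh(.data[[samples[1]]]),
             y = asinh(.data[[samples[2]]]))) + geom_hex(bins = 50) +
    coord_fixed() +
    labs(x = paste0("asinh(", samples[1], ")"),
         y = paste0("asinh(", samples[2], ")")) +
    geom_abline(slope = 1, intercept = 0, col = "orange") +   # line of equal counts
    theme(legend.position = "none")
}

# Left: raw counts. Right: counts divided by the size factors.
grid.arrange(
  myScatterplot(counts(aw)),
  myScatterplot(counts(aw, normalized = TRUE)),
  ncol = 2)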
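
To make the size factors less of a black box, here is a minimal sketch of the median-of-ratios idea behind `estimateSizeFactors`, written out by hand in base R. The toy count matrix and the function name `medianOfRatios` are made up for illustration, and the sketch leaves out the options of the real DESeq2 function.

```r
# Toy count matrix (made-up values): 4 genes (rows) x 3 samples (columns).
# Sample s2 was sequenced twice as deeply as s1, and s3 twice as deeply as s2.
counts_toy = matrix(c( 10,  20,  40,
                      100, 200, 400,
                        5,  10,  20,
                       50, 100, 200),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hypothetical helper: median-of-ratios size factors.
medianOfRatios = function(m) {
  logGeoMeans = rowMeans(log(m))    # log of the per-gene geometric mean across samples
  keep = is.finite(logGeoMeans)     # drop genes that have a zero count in any sample
  # For each sample: median over genes of the log ratio to the geometric mean.
  apply(m, 2, function(col) exp(median((log(col) - logGeoMeans)[keep])))
}

sf = medianOfRatios(counts_toy)

# Divide each column by its size factor to obtain normalized counts.
normalized_toy = sweep(counts_toy, 2, sf, "/")

sf
normalized_toy
```

In this toy case the samples differ only by sequencing depth, so after normalization every gene has the same value in all three columns. Real data are not that clean, which is why the median (rather than the mean) of the ratios is used: it is insensitive to a minority of genes that change strongly.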

What do we want from a good normalization method?

Possible figure of merit?

Occam’s razor

William of Ockham

If one can explain a phenomenon without assuming this or that hypothetical entity, there is no ground for assuming it.

One should always opt for an explanation in terms of the fewest possible causes, factors, or variables.

Error models: Noise is in the eye of the beholder

The efficiency of most biochemical or physical processes involving DNA polymers depends on their sequence content, for instance on the occurrence of long homopolymer stretches or palindromes, or on GC content.

These effects are not universal: they also depend on factors such as concentration, temperature, and which enzyme is used.

When looking at RNA-Seq data, should we treat GC content as noise or as bias?

One person’s noise can be another’s bias

We may think that the outcome of tossing a coin is completely random.

If we meticulously registered the initial conditions of the coin flip and solved the mechanical equations, we could predict which side has a higher probability of coming up: noise becomes bias.
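
The coin example can be made concrete with a small simulation. The sketch below rests on assumptions that are not part of the lecture: a deliberately simplified deterministic rule (the coin starts heads up and shows heads again after an even number of half-turns), made-up ranges for the spin rate and the flight time, and a made-up measurement error. It is a toy model, not real coin physics.

```r
set.seed(1)
n = 10000

# Hypothetical initial conditions (made-up ranges): spin rate in rad/s, flight time in s.
omega   = runif(n, min = 100, max = 300)
tflight = runif(n, min = 0.4, max = 0.6)

# Toy deterministic rule (an assumption): the coin starts heads up and
# shows heads again after an even number of completed half-turns.
isHeads = function(omega, tflight) floor(omega * tflight / pi) %% 2 == 0
heads = isHeads(omega, tflight)

# Observer 1 ignores the initial conditions: the outcome looks like noise.
fracHeads = mean(heads)

# Observer 2 measures the spin rate with a small (made-up) error
# and uses the same rule to predict each toss.
omegaMeasured = omega + rnorm(n, sd = 0.5)
accuracy = mean(isHeads(omegaMeasured, tflight) == heads)

c(fracHeads = fracHeads, accuracy = accuracy)
```

Under these assumptions the fraction of heads should come out close to one half, while the second observer should predict the large majority of tosses correctly. The data are the same in both cases; only the model of the observer differs.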