Common pitfalls

Identity and performance

Remember

But also

Anecdote: Fitting GLM's to SNPs.

identical and all.equal

identical(1, 1L)
## [1] FALSE
all.equal(1, 1L)
## [1] TRUE
v0 <- c(1, 2); v1 <- c(a=1, b=2)
identical(v0, v1)
## [1] FALSE
all.equal(v0, v1)
## [1] "names for current but not for target"
all.equal(v0, v1, check.attributes=FALSE)
## [1] TRUE

Gotcha

x <- new.env(); y <- new.env()
all.equal(x, y)
## [1] TRUE
x[["a"]] <- 1
all.equal(x, y)
## [1] TRUE

system.time and the microbenchmark package

system.time()

system.time({
    Sys.sleep(1)
    runif(1000)
})
##    user  system elapsed 
##   0.000   0.001   1.001
system.time(x <- runif(10000000))
##    user  system elapsed 
##   0.316   0.001   0.316
length(x)
## [1] 10000000
mean(x)
## [1] 0.5

microbenchmark

library(microbenchmark)
f0 <- function(n) runif(n)
f1 <- function(n) replicate(n, runif(1))
times <- microbenchmark(f0(10), f0(100), f0(1000), f0(2000), f1(10), times=100)
times
## Unit: microseconds
##      expr    min     lq median     uq     max neval
##    f0(10)  2.861  3.115  3.381  4.397   6.421   100
##   f0(100)  5.496  5.880  6.409  7.643  16.330   100
##  f0(1000) 31.908 33.576 35.063 37.921  48.106   100
##  f0(2000) 61.980 64.650 66.553 70.550  97.939   100
##    f1(10) 53.191 55.626 58.752 68.242 179.455   100
plot(times)

plot of chunk microbenchmark-times-plot

Rprof

FIXME

Common solutions

Exercises

  1. Do no more than needed From StackOverflow: transform SNP genotypes from A_T to AT. Data:

    a <- c("A", "C", "G", "T")
    n <- 10000000
    geno <- paste(sample(a, n, TRUE), sample(a, n, TRUE), sep="_")
    

    Possible help pages: ?grep, ?substring. Compare possibilities for error, performance.

  2. Hoist common operations from loops, or better…:

    Improve the performance of the imperative clamp algorithm, while retaining an imperative programming style

    n <- 3
    for (i in seq_along(x))
       if (abs(x[i]) > n * sd(x))
           x[i] <- n * sd(x)
    

    and the following data

    set.seed(123)
    x <- runif(10000)
    
  3. …Avoid loops entirely by writing in a declaritve manner:

    Compare the performance of the imperative clamp with a declarative alternative

    x[abs(x) > n * sd(x)] <- n * sd(x)
    

    How can the performance of the declarative clamp be improved?

Unit tests

Unit tests

RUnit

library(RUnit)
test_log0 <- function() {
   checkIdentical(-Inf, log(0))
   checkIdentical(0, log(1))
   checkIdentical(0, exp(log(0)))
   checkIdentical(1, exp(log(1)))
   old_opts <- options(warn=2)
   on.exit(options(old_opts))
   checkException(log(-1), silent=TRUE)
   checkException(log(), silent=TRUE)
}
test_log0()