Remember

- More important that the calculation is correct than fast
- Programmer time is more valuable than computer time
- Others have likely been there before you

But also

- The VisiCalc revolution:
*interactive*computation can be transformative.

Anecdote: Fitting GLM's to SNPs.

- Original throughput using
`glm()`

: 10's of SNPs per second, days for full analysis - Avoid re-specifying model and other standard speed-ups: about 1000 SNPs per second, hours for total calculation.
- Improved data management via ncdf (nowadays: rhdf5) allowing input of data slices. No immediate speed-up, but allows…
- Use on a large cluster (100 nodes x 12 cores / node x 1000 SNPs / sec): complete analysis in seconds.
- Transformed from asynchronous, plodding exploration to
**interactive analysis** - Epilogue: snpStats and especially MatrixEQTL perform almost as well without need for complicated cluster

`identical`

and `all.equal`

- physical vs. numerical identity

```
identical(1, 1L)
```

```
## [1] FALSE
```

```
all.equal(1, 1L)
```

```
## [1] TRUE
```

- Attributes

```
v0 <- c(1, 2); v1 <- c(a=1, b=2)
identical(v0, v1)
```

```
## [1] FALSE
```

```
all.equal(v0, v1)
```

```
## [1] "names for current but not for target"
```

```
all.equal(v0, v1, check.attributes=FALSE)
```

```
## [1] TRUE
```

Gotcha

- Environments (hence, reference classes)

```
x <- new.env(); y <- new.env()
all.equal(x, y)
```

```
## [1] TRUE
```

```
x[["a"]] <- 1
all.equal(x, y)
```

```
## [1] TRUE
```

- External pointers

`system.time`

and the microbenchmark package`system.time()`

- Informal / crude benchmarks:
`system.time()`

- First argument can be an expression

```
system.time({
Sys.sleep(1)
runif(1000)
})
```

```
## user system elapsed
## 0.000 0.001 1.001
```

- Use 'gets' (
`<-`

) to simultaneously time and assign the result

```
system.time(x <- runif(10000000))
```

```
## user system elapsed
## 0.316 0.001 0.316
```

```
length(x)
```

```
## [1] 10000000
```

```
mean(x)
```

```
## [1] 0.5
```

- Systematically compare timing of different functions
- Automatic replication and randomization of evaluation order

```
library(microbenchmark)
f0 <- function(n) runif(n)
f1 <- function(n) replicate(n, runif(1))
times <- microbenchmark(f0(10), f0(100), f0(1000), f0(2000), f1(10), times=100)
times
```

```
## Unit: microseconds
## expr min lq median uq max neval
## f0(10) 2.861 3.115 3.381 4.397 6.421 100
## f0(100) 5.496 5.880 6.409 7.643 16.330 100
## f0(1000) 31.908 33.576 35.063 37.921 48.106 100
## f0(2000) 61.980 64.650 66.553 70.550 97.939 100
## f1(10) 53.191 55.626 58.752 68.242 179.455 100
```

- Pretty pictures

```
plot(times)
```

`Rprof`

FIXME

- See Intermediate Sequence Analysis 2013 section 2.3.2.

**Do no more than needed**From StackOverflow: transform SNP genotypes from`A_T`

to`AT`

. Data:`a <- c("A", "C", "G", "T") n <- 10000000 geno <- paste(sample(a, n, TRUE), sample(a, n, TRUE), sep="_")`

Possible help pages:

`?grep`

,`?substring`

. Compare possibilities for error, performance.**Hoist common operations from loops, or better…**:Improve the performance of the imperative clamp algorithm, while retaining an imperative programming style

`n <- 3 for (i in seq_along(x)) if (abs(x[i]) > n * sd(x)) x[i] <- n * sd(x)`

and the following data

`set.seed(123) x <- runif(10000)`

**…Avoid loops entirely by writing in a declaritve manner**:Compare the performance of the imperative clamp with a declarative alternative

`x[abs(x) > n * sd(x)] <- n * sd(x)`

How can the performance of the declarative clamp be improved?

Unit tests

- Write simple functions ('unit tests') that describe expected behavior
- Write unit tests that replicate bugs before being fixed
- Write unit tests that catch common edge cases, e.g., zero-length arguments
*Very*useful as your code becomes more complicated; helps think about how edge cases should behave.

- A framework for writing unit tests
- Helpers to test outcome of tests, e.g.,
`checkEquals()`

,`checkIdentical()`

,`checkException()`

. - Naming convention for test functions and files containing test functions
- Example: expected behavior of
`log`

```
library(RUnit)
test_log0 <- function() {
checkIdentical(-Inf, log(0))
checkIdentical(0, log(1))
checkIdentical(0, exp(log(0)))
checkIdentical(1, exp(log(1)))
old_opts <- options(warn=2)
on.exit(options(old_opts))
checkException(log(-1), silent=TRUE)
checkException(log(), silent=TRUE)
}
test_log0()
```

- Bioconductor best practices for implementing unit tests to be evaluated when packages are checked.
- Also: testthat package as alternative test framework