<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{C: S4 Objects}
-->
`r knitr::opts_chunk$set(tidy=FALSE)`

# Extended case study: an S4 "Annotated Matrix" class

Why bother?
- Interoperability -- packages in the same work flow re-use the same
  data strutures, e.g., `GRanges` for describing regions of interest
  in sequencing experiments
- Programmer efficiency -- classes enable re-use, avoiding
  implementing functionality that already exists.
- Often, correct way to implement a class is not to -- re-use existing
  classes instead, even if it's not quite perfect

Motivation
- Numerical data typically augmented by important information about
  rows and columns
  - Rows ('regions of interested'): gene symbols, genome coordinates,
    significance values from other tests
  - Columns ('samples'): study identifiers, treatment groups,
    covariates
- Significant risks of mis-aligning row and column data, with
  catastrophic and real-world consequences

## Declaration

'Is a' versus 'has a'
- 'is a' caries a lot of baggage and introduces a lot of constraints
- Particularly challenging when thinking about extending base R
  objects, where the 'API' is not well-defined

Starting our class definition
- A `matrix()` with row- and column `data.frame()` annotations
- Construct with `setClass`; returns a simple generating function.
  ```{r setClass}
  .AnnMat <- setClass("AnnMat",
      representation(matData="matrix", rowData="data.frame",
          colData="data.frame"))
  ```
  - 'Is a' (inheritance) relationship via `contains=` argument;
    multiple in inheritance possible.
  - In use
  ```{r generating-function}
  am0 <- .AnnMat()
  am1 <- .AnnMat(
        matData=matrix(1:10, 2, dimnames=list(letters[1:2], LETTERS[1:5])),
        rowData=data.frame(roi_id=1:2),
        colData=data.frame(
            sample_id=1:5,
            treatment=c("A", "A", "B", "B", "B")))
  ```
  
A simple method: accessors ("getters")
- Definition of generics
  ```{r getters}
  setGeneric("rowData", function(x, ...) standardGeneric("rowData"))
  
  setGeneric("colData", function(x, ...) standardGeneric("colData"))
  
  setGeneric("matData", function(x, ...) standardGeneric("matData"))
  ```
  
- Then methods implemented of class-specific methods on the generics
  ```{r getter-methods}
  setMethod("rowData", "AnnMat", function(x, ...) x@rowData)
  
  setMethod("colData", "AnnMat", function(x, ...) x@colData)
  
  setMethod("matData", "AnnMat", function(x, ...) x@matData)
  ```
  
A simple method: `dim` and `dimnames`

- Discover existing generic, for signature
  ```{r dim-getGeneic}
  getGeneric("dim")
  ```
  - Implement a method on our class
  ```{r dim-setMethod}
  setMethod("dim", "AnnMat", function(x) dim(matData(x)))

  setMethod("dimnames", "AnnMat", function(x) dimnames(matData(x)))
  ```
  - **Question** Hey neat, `nrow()` and `ncol()` (and `rownames()` and
    `colnames()`) for free! Why is that?
  ```{r nrow-for-free}
  dim(am1)
  nrow(am1)
  ```

A simple method: `show`
- Purpose: brief summary when displaying (printing) during interactive
  use
- Existing generic: `getGeneric("show")`
- Many existing methods: `showMethods("show", where=search())`
- Our implementation: brief summary, with an eye toward re-use by
  derived classes. Avoid direct slot access
  ```{r show}
  setMethod("show", "AnnMat", function(object)
  {
        cat("class:", class(object), "\n")
        cat("dim:", dim(object), "\n")
        cat("rowData names():", names(rowData(object)), "\n")
        cat("colData names():", names(colData(object)), "\n")
  })
  am1
  ```
- **Question** Is there a better overall philosophy for `show`?

A more complicated method: updating ('replacement', 'setter') methods

- Provide the illusion and simple syntax for in-place modification

- A familiar example: update the value of a column in a `data.frame`

  ```{r update}
  df <- data.frame(x=1:5, y=5:1)
  df[,"x"] <- log(df$x)
  ```

  - R denotes `df[,"x"]` as
     `[.data.frame`, "the subset method for data.frame", and `df[,"x"]
     <- value` as
     `[<-.data.frame` "the subset-replace method for data.frame".

  - There actually _is_ a function `[<-.data.frame`

    ```{r subset-assign-df}
    head(get("[<-.data.frame"))
    ```
    
    R's parser translates `df[,"x"] <- value` to
    `[<-.data.frame(x, , "x", value) and actually modifies (a copied, 
    if necessary) first argument.

- Replacement methods, e.g., `matData<-`, signature takes the object
  to be updated, additional optional arguments, and the value to
  update the argument with

  ```{r setter-matData}
  setGeneric("matData<-", function(x, ..., value)
        standardGeneric("matData<-"))
  ```
  
  Dispatch on one or both of `x`, `value`

- Impelement as a method that dispatches on both the object and
  value, updates the slot, and returns the updated object.

  ```{r setter-matData-impl}
  setReplaceMethod("matData", c("AnnMat", "matrix"),
        function(x, ..., value)
  {
        x@matData <- value
        x
  })
  ```

  **Exercise**: walk through how that assignment in the body works.
  
- Another replacement method, for `dimnames<-` (the generic already
  exists; what is it?)

  ```{r setter-dimnames-impl}
  setReplaceMethod("dimnames", c("AnnMat", "list"), 
        function(x, value)
  {
        dimnames(matData(x)) <- value
        value
  })
  ````

- **Exercise**: walk through how that assignment in the body of the method works

- Hey neat, we get `rownames<-` and `colnames<-` for free!

A more complicated operation: validity

- Constraints on row, column and matrix dimensions: all must be equal

- `validity` argument to `setClass`, or `setValidity()` function call. 

- Validity function is weird

  - Each class in hierarchy visited, so no need to test for
    super-class properties
  - returns TRUE if the object is valid, a text string defining the
    transgression otherwise.

- Evaluated frequently, so needs to be efficient / light-weight

   ```{r setValidity}
   setValidity("AnnMat", function(object) {
         msg <- NULL
         if (nrow(rowData(object)) != nrow(matData(object)))
             msg <- c(msg, 
                 "number of rowData rows and matData rows differ")
         if (nrow(colData(object)) != ncol(matData(object)))
             msg <- c(msg,
                 "number of colData rows and matData columns differ")
         if (is.null(msg)) TRUE else msg
   })
   ```
   
   In action:
   
   ```{r validity-action}
   .AnnMat(matData=matrix(1:10, 2), 
         rowData=data.frame(roi_id=1:2),
         colData=data.frame(sample_id=1:5))
   cat(try({
         .AnnMat(matData=matrix(1:10, 2), 
             rowData=data.frame(roi_id=1:5),
             colData=data.frame(sample_id=1:2))
   }))
   ```

A more complicated method: subsetting

- Why do we need this? Part of the informal matrix 'API' expected by a
  user
  
- Discovery
  ```{r subsetting}
  getGeneric("[")
  ```

- Possible methods multiply -- `x` times `i` times `j`; e.g., `i` could be 
  integer, logical, character, ...
- One approach -- _facade_ of methods that do minimal work to translate 
  into a common base function
- Special variable classes: `ANY`, `missing`
- Exploit default `initialize` function, which acts as a copy constructor 
  that updates slots in its first argument with values provided by named
  arguments.
- Pass '...' to allow derived classes to use this method

  ```{r subset-facade}
  setMethod("[", c("AnnMat", "ANY", "ANY"),
        function(x, i, j,  ..., drop=TRUE)
  {
        ## FIXME: warn user about ignoring 'drop'?
       initialize(x, matData=matData(x)[i, j, drop=FALSE],
            rowData=rowData(x)[i,,drop=FALSE],
            colData=colData(x)[,j,drop=FALSE], ...)
  })
  
  setMethod("[", c("AnnMat", "ANY", "missing"),
        function(x, i, j, ..., drop=TRUE)
  {
        initialize(x, matData=matData(x)[i,,drop=FALSE],
            rowData=rowData(x)[i,,drop=FALSE])
  })

  setMethod("[", c("AnnMat", "missing", "ANY"),
        function(x, i, j, ..., drop=TRUE)
  {
        initialize(x, matData=matData(x)[,j,drop=FALSE],
            colData=colData(x)[j,,drop=FALSE])
  })

  setMethod("[", c("AnnMat", "missing", "missing"),
        function(x, i, j, ..., drop=TRUE)
  {
        initialize(x, ...)
  })

```
- **Exercise**: create some simple unit tests for these methods.
- Seems 'good enough' for numeric or logical indexes, what about character?

What else have we agreed to in the matrix API?