Abstract

Identifying rare biological events in high-throughput screens requires using the best available normalization and statistical inference procedures. It is not always clear, however, which algorithms are best suited for a particular screen. The Statistics and dIagnostic Graphs for High Throughput Screening (`sights`) package provides numerous normalization methods that correct the three types of bias that affect High-Throughput Screening (HTS) measurements: overall plate bias, within-plate spatial bias, and across-plate bias. Commonly used normalization methods such as Z-scores (or methods such as percent inhibition/activation, which use within-plate controls to normalize) correct only overall plate bias. The methods included in this package attempt to correct all three sources of bias and typically give better results.

Two statistical tests are also provided: the standard one-sample t-test and the recommended one-sample Random Variance Model (RVM) t-test, which has greater statistical power for the typically small number of replicates in HTS. Correction for the multiple statistical testing of the large number of constructs in HTS data is provided by False Discovery Rate (FDR) correction. The FDR can be described as the proportion of false positives among the statistical tests called significant.

Included graphical and statistical methods provide the means for evaluating data analysis choices for HTS assays on a screen-by-screen basis. These graphs can be used to check fundamental assumptions of both raw and normalized data at every step of the analysis process.

Citing Methods

Please cite the `sights` package and specific methods as appropriate.

References for the methods can be found in this vignette, on their specific help pages, and in the manual. They can also be accessed by `help(sights_method_name)` in R, for example `help(normSPAWN)`.

The package citation can be accessed in R by:

```
citation("sights")
##
## Garg E, Murie C, Nadon R (2016). _sights: Statistics and
## dIagnostic Graphs for HTS_. R package version 1.11.0.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {sights: Statistics and dIagnostic Graphs for HTS},
## author = {Elika Garg and Carl Murie and Robert Nadon},
## year = {2016},
## note = {R package version 1.11.0},
## }
```

- Please install the package directly from Bioconductor and load it. Note that SIGHTS requires a minimum R version of 3.3.

```
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("sights")
library("sights")
```

- This should also install and load the packages that SIGHTS imports: ggplot2 (Wickham, 2009), reshape2 (Wickham, 2007), qvalue (Storey, 2015), MASS (Venables and Ripley, 2002), and lattice (Sarkar, 2008).

Otherwise, you can install/update these packages manually.

All SIGHTS normalization functions require that the data be arranged such that each plate is a column and each row is a well. The arrangement within each plate should be by-row first, then by-column. For more details and an example, see `help("ex_dataMatrix")`. This arrangement can be done in Microsoft Excel before importing the data into R, although advanced users may prefer to do it in R as needed.
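The required arrangement can also be produced in R. The following is a minimal sketch using base R only, not a SIGHTS function. The toy plates, the helper `by_row()`, and the well labels are hypothetical, and it assumes that "by-row first" means all wells of plate row A are listed before those of row B; `help("ex_dataMatrix")` remains the authoritative description.

```r
# Two hypothetical 2 x 3 plates (plate rows A-B, plate columns 1-3).
plate1 <- matrix(c(1, 2, 3,
                   4, 5, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A", "B"), c("1", "2", "3")))
plate2 <- plate1 * 10

# R stores matrices column by column, so transpose before flattening
# to obtain the wells in by-row order (A1, A2, A3, B1, B2, B3).
by_row <- function(plate) as.vector(t(plate))

# One column per plate, one row per well.
data_matrix <- cbind(plate1 = by_row(plate1), plate2 = by_row(plate2))

# Label the rows with well names built in the same by-row order.
well_names <- outer(rownames(plate1), colnames(plate1), paste0)
rownames(data_matrix) <- as.vector(t(well_names))

data_matrix
```

For real screens, the same idea applies with the actual plate dimensions (for example 32 rows by 40 columns for a 1280-well plate).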

- The datasets within SIGHTS can be loaded by:

```
data("ex_dataMatrix")
help("ex_dataMatrix")
## Required data arrangement (by-row first) is explained.
data("inglese")
```

- Your own data can be imported by giving the path of your file:

- If it is a .csv or .txt file, run

```
my_data <- read.csv("~/yourfile.csv", header = TRUE, sep = ",")
## '~' stands for your home folder; replace the path with the folder
## location of your file 'yourfile.csv'.
## Use header=TRUE if you have column headers (recommended); otherwise, use
## header=FALSE.
## For a tab-delimited .txt file, use sep = "\t".
## N.B. Be sure to use a forward slash ('/') to separate folder names.
```

- If it is a Microsoft Excel file, you can import it directly by installing another package:

```
install.packages("xlsx")
## This installs the xlsx package which enables import/export of Excel files.
library("xlsx")
read.xlsx("~/yourfile.xlsx", sheetIndex = 1) # or
read.xlsx("~/yourfile.xlsx", sheetName = "one")
## sheetIndex is the sheet number where your data is stored in
## 'yourfile.xlsx'; sheetName is the name of that sheet.
```

- Similarly any object saved in R (e.g. normalized results) can be exported as .csv or .xlsx files:
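As a minimal sketch of such an export, the following uses base R's `write.csv()`. The data frame `results` and the file name are hypothetical stand-ins for your own normalized results, and a temporary folder is used only so that the example runs anywhere.

```r
# Hypothetical stand-in for normalized results (wells in rows, plates in columns).
results <- data.frame(plate1 = c(0.1, -0.4, 1.2),
                      plate2 = c(0.3, 0.0, -0.9))

# Write to a temporary folder; replace with your own path, e.g. "~/yourresults.csv".
out_file <- file.path(tempdir(), "yourresults.csv")
write.csv(results, file = out_file, row.names = FALSE)

# Read the file back to confirm that the export worked.
check <- read.csv(out_file, header = TRUE)

# For an Excel file, the xlsx package loaded earlier offers write.xlsx(), e.g.:
# write.xlsx(results, file = "~/yourresults.xlsx", sheetName = "one")
```

Setting `row.names = FALSE` avoids writing an extra column of row numbers; omit it if your row names carry well identifiers.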

- There are two datasets provided within SIGHTS:

- CMBA data (Murie *et al.*, 2015), see `help("ex_dataMatrix")`

- Inglese *et al.* data (Inglese *et al.*, 2006), see `help("inglese")`

- Some basic information about data (including your own data after importing) can be accessed by various functions. For example, information about the Inglese *et al.* data set can be obtained as follows:

```
View(inglese)
## View the entire dataset
edit(inglese)
## Edit the dataset
head(inglese)
## View the top few rows of the dataset
str(inglese)
## Get information on the structure of the dataset
summary(inglese)
## Get a summary of variables in the dataset
names(inglese)
## Get the variable names of the dataset
```

- There are several methods provided within SIGHTS:

- Normalization:
  - Z, Robust Z (see Malo *et al.*, 2006),
  - Loess (Baryshnikova *et al.*, 2010),
  - Median Filter (Bushway *et al.*, 2011),
  - R (Wu *et al.*, 2008), and
  - SPAWN (Murie *et al.*, 2015).

- Statistical testing:
  - one-sample t-test,
  - one-sample RVM t-test (Malo *et al.*, 2006; Wright and Simon, 2003), and
  - FDR correction (Storey, 2002).

- Plotting:
  - 3d plot,
  - heatmap,
  - auto-correlation plot,
  - scatter plot,
  - boxplot,
  - inverse-gamma fit plot, and
  - histograms.

See `help("normSights")`, `help("statSights")`, `help("plotSights")`, and the help pages of individual methods for more information.
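To illustrate the simplest methods in the normalization list, here is a hedged sketch of per-plate Z and Robust Z scores written in base R. It uses the textbook definitions (mean and standard deviation; median and MAD) on hypothetical toy data and is not the package's own code; the SIGHTS implementations may differ in detail, so consult their help pages for the exact behaviour.

```r
# Hypothetical toy data: 6 wells (rows) on 2 plates (columns).
# The last well of plate1 is an outlier.
toy <- cbind(plate1 = c(10, 12, 11, 13, 9, 50),
             plate2 = c(20, 22, 21, 23, 19, 21))

# Z score: centre each plate on its mean and scale by its standard deviation.
z_scores <- apply(toy, 2, function(x) (x - mean(x)) / sd(x))

# Robust Z score: centre on the median and scale by the median absolute
# deviation (mad() applies the usual 1.4826 consistency constant by default).
robust_z <- apply(toy, 2, function(x) (x - median(x)) / mad(x))

round(z_scores, 2)
round(robust_z, 2)
```

Both versions correct only overall plate bias, since every well on a plate is shifted and scaled by the same two numbers; the robust version is simply less affected by outlying wells such as the last well of `plate1`.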

- Information about the package functions can be accessed by:

```
ls("package:sights")
## Lists all the functions and datasets available in the package
lsf.str("package:sights")
## Lists all the functions and their usage
args(plotSights)
## View the usage of a specific function
example(topic = plotSights, package = "sights")
## View examples of a specific function
```

- Normalization: all normalization functions are accessible either via `normSights()` or by their individual function names (e.g. `normSPAWN()`).

- Statistical tests: all statistical testing functions are accessible either via `statSights()` or by their individual function names (e.g. `statRVM()`).

- Plots: all plotting functions are accessible either via `plotSights()` or by their individual function names (e.g. `plotAutoco()`).

The results of these functions can be saved as objects and called by their assigned names. For example:

```
library(sights)
data("inglese")
# Normalize
spawn_results <- normSPAWN(dataMatrix = inglese, plateRows = 32, plateCols = 40,
dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE,
biasMatrix = NULL, biasCols = 1:18)
## Or
spawn_results <- normSights(normMethod = "SPAWN", dataMatrix = inglese, plateRows = 32,
plateCols = 40, dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE,
biasMatrix = NULL, biasCols = 1:18)
## Access
summary(spawn_results)
# Apply statistical test
rvm_results <- statRVM(normMatrix = spawn_results, repIndex = rep(1:3, each = 3),
normRows = NULL, normCols = 1:9, testSide = "two.sided")
## Or
rvm_results <- statSights(statMethod = "RVM", normMatrix = spawn_results, repIndex = c(1,
1, 1, 2, 2, 2, 3, 3, 3), normRows = NULL, normCols = 1:9, ctrlMethod = NULL,
testSide = "two.sided")
## Access
head(rvm_results)
# Plot
autoco_results <- plotAutoco(plotMatrix = spawn_results, plateRows = 32, plateCols = 40,
plotRows = NULL, plotCols = 1:9, plotName = "SPAWN_Inglese", plotSep = TRUE)
## Or
autoco_results <- plotSights(plotMethod = "Autoco", plotMatrix = spawn_results,
plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = c(1, 2, 3, 4,
5, 6, 7, 8, 9), plotName = "SPAWN_Inglese", plotSep = TRUE)
## Access
autoco_results
autoco_results[[1]]
```
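The paired calls above spell the replicate index and the column selection in two ways. The following base R sketch checks that the two spellings hold the same values. The grouping shown with `split()` is only an illustration of how an index of this shape partitions nine columns, assuming that columns sharing an index value are treated as replicates of one another; see `help("statRVM")` for the authoritative meaning of `repIndex`.

```r
# The two spellings of the replicate index used above.
rep_short <- rep(1:3, each = 3)
rep_long <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# Same values, although the storage types differ (integer vs. double),
# which is why identical() reports FALSE here.
same_values <- all(rep_short == rep_long)
same_type <- identical(rep_short, rep_long)

# The two spellings of the column selection also hold the same values.
cols_match <- all(1:9 == c(1, 2, 3, 4, 5, 6, 7, 8, 9))

# Illustration: how such an index partitions columns 1 to 9 into groups.
groups <- split(1:9, rep_short)
groups
```

Either spelling can be used; `rep()` and `:` are simply shorter and less error-prone for larger screens.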