CRISPR package demo Bioc2014 Boston

July 31st 2014

First, load the required packages and specify the input file path. We are going to use a human sequence as input, which is included as a FASTA file in the CRISPRseek package. To perform off-target analysis, we need to load the human BSgenome package. To annotate the targets and off-targets, we need to load the human transcript package. Additionally, we need to specify the file containing all restriction enzyme (RE) cut patterns. You have the option to use the RE pattern file in the CRISPRseek package or to specify your own RE pattern file. Furthermore, you need to specify the output directory, which is where all the output files will be written.

library(CRISPRseek)
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## 
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## 
## The following object is masked from 'package:stats':
## 
##     xtabs
## 
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, as.vector, cbind, colnames, do.call,
##     duplicated, eval, evalq, get, intersect, is.unsorted, lapply,
##     mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rep.int, rownames, sapply, setdiff,
##     sort, table, tapply, union, unique, unlist
## 
## Loading required package: Biostrings
## Loading required package: IRanges
## Loading required package: XVector
## Loading required package: BSgenome
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## 
## Attaching package: 'AnnotationDbi'
## 
## The following object is masked from 'package:BSgenome':
## 
##     species
outputDir <- file.path(getwd(),"CRISPRseekDemo")

inputFilePath <- system.file('extdata', 'inputseq.fa', package = 'CRISPRseek')
REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package = 'CRISPRseek')
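
Because system.file() returns an empty string when a file cannot be found, it can help to check the paths before starting a long analysis. The following is a minimal sketch in base R; check_input_file is a hypothetical helper written for this demo and is not part of CRISPRseek. It is exercised here on a temporary file so that the block runs on its own.

```r
# Hypothetical helper (not part of CRISPRseek): stop early if a path is unusable.
check_input_file <- function(path) {
  # system.file() returns "" when the requested file is not found
  if (!nzchar(path) || !file.exists(path)) {
    stop("Input file not found: '", path, "'")
  }
  invisible(path)
}

# Self-contained demonstration with a temporary FASTA file
demo_file <- tempfile(fileext = ".fa")
writeLines(c(">demo", "ACGT"), demo_file)
check_input_file(demo_file)

# An empty path, as returned by a failed system.file() lookup, raises an error
missing_detected <- tryCatch(
  {
    check_input_file("")
    FALSE
  },
  error = function(e) TRUE
)
```

In the demo session, the same helper could be applied to inputFilePath and REpatternFile.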

Here are the commands to learn more about the offTargetAnalysis function and the different use cases.

?offTargetAnalysis
?compare2Sequences
?CRISPRseek
browseVignettes('CRISPRseek')

Scenario 1: Finding paired gRNAs with off-target analysis

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,  
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,  
    BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20, 
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 0, 
    outputDir = outputDir,  overwrite = TRUE) 
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/

The maximum number of mismatches allowed (max.mismatch) can be altered. The larger the value, the slower the analysis runs.

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,  
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,  
    BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20, 
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 2, 
    outputDir = outputDir,  overwrite = TRUE) 
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/
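
One way to get a feel for why a larger max.mismatch is slower is to count how many distinct sequences lie within a given number of mismatches of a single guide. The sketch below is only a back-of-the-envelope illustration in base R, assuming a 20-base guide; it does not describe how CRISPRseek actually searches, and n_within is a hypothetical helper name.

```r
# Number of distinct sequences within k mismatches of one guide of length n:
# choose i positions to change, with 3 alternative bases at each changed position.
guide_length <- 20  # assumed guide length for this illustration
n_within <- function(k, n = guide_length) {
  sum(choose(n, 0:k) * 3^(0:k))
}

counts <- sapply(0:3, n_within)
names(counts) <- paste0("max.mismatch=", 0:3)
print(counts)
```

The count grows from 1 sequence at zero mismatches to 32551 at three mismatches, so each additional allowed mismatch greatly enlarges the set of potential off-target sites.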

Scenario 2: Finding paired gRNAs with restriction enzyme cut site(s) and off-target analysis

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,  
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,  
    BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20, 
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 0, 
    outputDir = outputDir,  overwrite = TRUE) 
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/

Scenario 3: Finding all gRNAs with off-target analysis, which will be the slowest

Please note that max.mismatch is set to 3 so that we can view the off-targets

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,  
    REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,  
    BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20, 
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 3, 
    outputDir = outputDir,  overwrite = TRUE) 
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/

Scenario 4: Finding gRNAs with restriction enzyme cut site(s) and off-target analysis

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,  
    REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,  
    BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20, 
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 0, 
    outputDir = outputDir,  overwrite = TRUE) 
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/

Scenario 5: Target and off-target analysis for user specified gRNAs

Calling the function offTargetAnalysis with findgRNAs = FALSE results in target and off-target searching, scoring and annotating for the input gRNAs. The gRNAs will be annotated with restriction enzyme cut sites for users to review later. However, paired information will not be available.

gRNAFilePath <- system.file('extdata', 'testHsap_GATA1_ex2_gRNA1.fa', 
    package = 'CRISPRseek')
offTargetAnalysis(inputFilePath = gRNAFilePath,
    findgRNAsWithREcutOnly = TRUE, REpatternFile = REpatternFile,
    findPairedgRNAOnly = FALSE, findgRNAs = FALSE,
    BSgenomeName = Hsapiens, chromToSearch = 'chrX',
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 2, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/

Scenario 6: Quick gRNA finding without off-target analysis

Calling the function offTargetAnalysis with chromToSearch = "" results in a quick gRNA search without on-target or off-target analysis. The parameters findgRNAsWithREcutOnly and findPairedgRNAOnly can be set to indicate whether to search only for gRNAs that overlap restriction enzyme cut sites, and whether to search only for gRNAs in a paired configuration.

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
    REpatternFile = REpatternFile,findPairedgRNAOnly = TRUE,
    chromToSearch = "", outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/
##   A DNAStringSet instance of length 2
##     width seq                                          names               
## [1]    23 TGTCCTCCACACCAGAATCAGGG                      gRNAf1_Hsap_GATA1...
## [2]    23 CCAGAGCAGGATCCACAAACTGG                      gRNAr1_Hsap_GATA1...

Scenario 7: Find potential gRNAs preferentially targeting one of two alleles without running time-consuming off-target analysis on all possible gRNAs

Below is an example that searches for all gRNAs that target at least one of the alleles. Two files are provided containing sequences that differ by a single nucleotide polymorphism (SNP). The results are saved in the file scoresFor2InputSequences.xls in the outputDir directory.

inputFile1Path <- system.file("extdata", "rs362331C.fa", package = "CRISPRseek")
inputFile2Path <- system.file("extdata", "rs362331T.fa", package = "CRISPRseek")
seqs <- compare2Sequences(inputFile1Path, inputFile2Path,
    outputDir = outputDir , REpatternFile = REpatternFile,
    overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/rs362331C.fa/ 
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory  /home/ubuntu/CRISPRseekDemo/rs362331T.fa/ 
## [1] "Scoring ..."
## [1] "Done!"

Exercise 1

To preferentially target one allele, select gRNA sequences that have the lowest score for the other allele. Selected gRNAs can then be examined for off-target sequences as described in Scenario 5.
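
The selection described above can be sketched in base R on a small toy table. The scores below are made up for illustration, and the column names scoreForSeq1 and scoreForSeq2 are assumptions about the layout of the results; check names(seqs) or the header of scoresFor2InputSequences.xls for the actual column names before adapting this.

```r
# Toy scores for illustration only; not real CRISPRseek output.
# Column names are assumed and may differ from the actual result table.
scores <- data.frame(
  name = c("gRNA_a", "gRNA_b", "gRNA_c"),
  scoreForSeq1 = c(100, 100, 46),
  scoreForSeq2 = c(100, 12.5, 100),
  stringsAsFactors = FALSE
)

# Preference for allele 1: high score for sequence 1, low score for sequence 2
scores$preferSeq1 <- scores$scoreForSeq1 - scores$scoreForSeq2

# Rank gRNAs from most to least allele-1-specific
ranked <- scores[order(-scores$preferSeq1), ]
print(ranked)
```

With these toy values, gRNA_b ranks first because it scores high for the first sequence and lowest for the second.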

Exercise 2

Identify gRNAs that target the following two input sequences equally well with minimized off-target cleavage:

MfSerpAEx2 GACGATGGCATCCTCCGTTCCCTGGGGCCTCCTGCTGCTGGCGGGGCTGTGCTGCCTGGCCCCCCGCTCCCTGGCCTCGAGTCCCCTGGGAGCCGCTGTCCAGGACACAGGTGCACCCCACCACGACCATGAGCACCATGAGGAGCCAGCCTGCCACAAGATTGCCCCGAACCTGGCCGACTTCGCCTTCAGCATGTACCGCCAGGTGGCGCATGGGTCCAACACCACCAACATCTTCTTCTCCCCCGTGAGCATCGCGACCGCCTTTGCGTTGCTTTCTCTGGGGGCCAAGGGTGACACTCACTCCGAGATCATGAAGGGCCTTAGGTTCAACCTCACTGAGAGAGCCGAGGGTGAGGTCCACCAAGGCTTCCAGCAACTTCTCCGCACCCTCAACCACCCAGACAACCAGCTGCAGCTGACCACTGGCAATGGTCTCTTCATCGCTGAGGGCATGAAGCTACTGGATAAGTTTTTGGAGGATGTCAAGAACCTGTACCACTCAGAAGCCTTCTCCACCAATTTCGGGGACACCGAAGCAGCCAAGAAACAGATCAACGATTATGTTGAGAAGGGAACCCAAGGGAAAATTGTGGATTTGGTCAAAGACCTTGACAAAGACACAGCTTTCGCTCTGGTGAATTACATTTTCTTTAAAG

HsSerpAEx2 GACAATGCCGTCTTCTGTCTCGTGGGGCATCCTCCTGCTGGCAGGCCTGTGCTGCCTGGTCCCTGTCTCCCTGGCTGAGGATCCCCAGGGAGATGCTGCCCAGAAGACAGATACATCCCACCATGATCAGGATCACCCAACCTTCAACAAGATCACCCCCAACCTGGCTGAGTTCGCCTTCAGCCTATACCGCCAGCTGGCACACCAGTCCAACAGCACCAATATCTTCTTCTCCCCAGTGAGCATCGCTACAGCCTTTGCAATGCTCTCCCTGGGGACCAAGGCTGACACTCACGATGAAATCCTGGAGGGCCTGAATTTCAACCTCACGGAGATTCCGGAGGCTCAGATCCATGAAGGCTTCCAGGAACTCCTCCGTACCCTCAACCAGCCAGACAGCCAGCTCCAGCTGACCACCGGCAATGGCCTGTTCCTCAGCGAGGGCCTGAAGCTAGTGGATAAGTTTTTGGAGGATGTTAAAAAGTTGTACCACTCAGAAGCCTTCACTGTCAACTTCGGGGACACCGAAGAGGCCAAGAAACAGATCAACGATTACGTGGAGAAGGGTACTCAAGGGAAAATTGTGGATTTGGTCAAGGAGCTTGACAGAGACACAGTTTTTGCTCTGGTGAATTACATCTTCTTTAAAG
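
As a possible starting point, the two sequences need to be saved as FASTA files before they can be used as input. The sketch below uses base R only; write_fasta is a hypothetical helper, and only the first 30 bases of each sequence are used to keep the example short, so substitute the full sequences when working through the exercise.

```r
# Hypothetical helper (not part of CRISPRseek): write one named sequence as FASTA.
write_fasta <- function(name, sequence, path) {
  writeLines(c(paste0(">", name), sequence), path)
  invisible(path)
}

# First 30 bases only, for brevity; replace with the full sequences above
mf_path <- write_fasta(
  "MfSerpAEx2", "GACGATGGCATCCTCCGTTCCCTGGGGCCT",
  file.path(tempdir(), "MfSerpAEx2.fa")
)
hs_path <- write_fasta(
  "HsSerpAEx2", "GACAATGCCGTCTTCTGTCTCGTGGGGCAT",
  file.path(tempdir(), "HsSerpAEx2.fa")
)

print(readLines(mf_path))
```

The resulting paths could then be passed as the first two arguments to compare2Sequences, in the same way as in Scenario 7.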

Exercise 3

Constrain the gRNA sequence by setting gRNA.pattern to require or exclude specific features within the target site.

3a. Synthesis of gRNAs in vivo from host U6 promoters is more efficient if the first base is guanine. To maximize the efficiency, what can we set gRNA.pattern to?

3b. Synthesis of gRNAs in vitro using T7 promoters is most efficient when the first two bases are GG. To maximize the efficiency, what can we set gRNA.pattern to?

3c. Five consecutive uracils in any position of a gRNA will affect transcription elongation by RNA polymerase III. To avoid premature termination during gRNA synthesis using the U6 promoter, what can we set gRNA.pattern to?

3d. Some studies have identified sequence features that broadly correlate with lower nuclease cleavage activity, such as uracil in the last 4 positions of the guide sequence. To avoid uracil in these positions, what can we set gRNA.pattern to?
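
Before choosing a value for gRNA.pattern, it may help to see how regular expressions pick out the four features in plain base R. The sketch below uses made-up 20-base guides written in the DNA alphabet, so uracil appears as T. It only illustrates the matching logic with grepl; whether gRNA.pattern accepts exactly the same syntax should be checked in ?offTargetAnalysis.

```r
# Toy 20-base guide sequences for illustration; not real gRNAs
guides <- c(
  g1 = "GGTCCTCCACACCAGAATCA",
  g2 = "ATTTTTGGATCCACAAACTG",
  g3 = "GACCTGAGGATCCACATTTT",
  g4 = "CAGAGCAGGATCCACAAACG"
)

# 3a: first base is G
starts_with_G <- grepl("^G", guides)

# 3b: first two bases are GG
starts_with_GG <- grepl("^GG", guides)

# 3c: contains five consecutive T (U in the RNA), which we would want to exclude
has_TTTTT <- grepl("T{5}", guides)

# 3d: T (U in the RNA) anywhere in the last 4 positions, which we would want to exclude
last4 <- substring(guides, nchar(guides) - 3)
T_in_last4 <- grepl("T", last4)

print(data.frame(starts_with_G, starts_with_GG, has_TTTTT, T_in_last4))
```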

Exercise 4

In the examples we went through, we deliberately restricted the off-target search to chromosome X. If we are interested in a genome-wide search, what should we set chromToSearch to?

Exercise 5

Find gRNAs in a paired configuration, with the two gRNAs separated by a distance between 5 and 15, without performing off-target analysis.

Exercise 6

Create a TranscriptDb object.

Exercise 7

Different CRISPR-Cas systems use different PAM sequences. Which parameter needs to be reset?

Exercise 8

Different CRISPR-Cas systems use gRNAs of different lengths. Which parameter needs to be reset?

Exercise 9

Which parameter needs to be reset to 8 if we are interested in finding gRNAs with a restriction enzyme pattern of size 8 or above?

Exercise 10

A new penalty matrix has recently been derived. Which parameter needs to be set accordingly?

Exercise 11

It has been shown that although the PAM sequence NGG is preferred, the variant NAG is also recognized, with lower efficiency. A researcher is interested in an off-target search that includes both the NGG and NAG variants, but requires that gRNAs precede NGG. What parameter(s) need to be set to carry out such a search?

Exercise 12

Could you think of any other use cases?