Welcome to the
amplican package. This vignette walks you through the main
package usage with an example MiSeq dataset. You will learn how to interpret
the results, create summary reports, and plot deletions, insertions and mutations
with our functions.
amplican is built for fast and precise analysis of CRISPR experiments.
amplican creates reports of deletions, insertions, frameshifts, cut rates and other metrics as R Markdown documents that can be knit to HTML.
amplican uses many
CRAN packages. It is a high-level package whose purpose is to align your fastq samples and automate analysis across different experiments.
amplican stays flexible through a configuration file, which, together with your fastq samples, is the only input it requires.
For the impatient among you who want to see an example of the whole pipeline analysis on the attached example data, look here. Below you will find the conceptual map of amplican.
Below you will find the
amplicanConsensus rules, which explain how ampliCan treats unambiguous forward and reverse reads. Green indicates events that will be accepted. When the forward and reverse reads agree, meaning their events are in the same place and span the same length, we take the forward read event as representative. When the events from the forward and reverse reads do not agree, we select the event from the strand with the higher alignment score. When one of the reads does not span the event in question, we consider the event real (as we have no other information). If both strands cover the event in question but one strand has no indel,
amplicanConsensus will change behavior according to the
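The accepted cases above can be sketched as a small decision function. This is a hypothetical illustration, not the amplicanConsensus implementation: the name `consensus_event`, the list fields (`start`, `end`, `score`, `covers`) and breaking score ties in favour of the forward read are our assumptions. The case where both reads cover the event but only one has the indel returns `NA`, because its behavior depends on a setting not described here.

```r
# Hypothetical sketch of the consensus rules; not amplican's implementation.
# Each read is a list: start/end of its event (NA if the read has no event),
# score = alignment score, covers = whether the read spans the event region.
consensus_event <- function(fwd, rvs) {
  fwd_has <- !is.na(fwd$start)
  rvs_has <- !is.na(rvs$start)
  # Neither read reports an event: nothing to decide
  if (!fwd_has && !rvs_has) return("none")
  # Only one read spans the region: accept its event, there is no other evidence
  if (fwd_has && !rvs_has && !rvs$covers) return("forward")
  if (rvs_has && !fwd_has && !fwd$covers) return("reverse")
  if (fwd_has && rvs_has) {
    # Same position and same length: keep the forward read event
    if (fwd$start == rvs$start && fwd$end == rvs$end) return("forward")
    # Disagreement: keep the event from the strand with the higher score
    # (ties go to the forward read, an assumption of this sketch)
    return(if (fwd$score >= rvs$score) "forward" else "reverse")
  }
  # Both reads cover the region but only one has the indel: depends on an
  # amplicanConsensus setting that is not modelled here
  NA_character_
}

fwd <- list(start = 10, end = 15, score = 50, covers = TRUE)
rvs <- list(start = 12, end = 15, score = 60, covers = TRUE)
consensus_event(fwd, rvs)  # "reverse": events disagree, reverse scores higher
```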
To run the analysis you must have a configuration file. Take a look at our example:
The configuration file should be a comma-delimited (“,”) csv file with information about your experiments. You can find the path to the example config file by running:
system.file("extdata", "config.csv", package = "amplican")
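To inspect the example config before writing your own, you can load it with base R. This is a minimal sketch that only relies on the file being a comma-delimited csv with a header row, as described above; it assumes amplican is installed so that `system.file()` can locate the file.

```r
# Locate the example config shipped with amplican ("" if the package is missing)
config_path <- system.file("extdata", "config.csv", package = "amplican")
stopifnot(nzchar(config_path))

# Comma-delimited with a header row; keep sequences as plain character strings
config <- read.csv(config_path, stringsAsFactors = FALSE)

str(config)   # column names and types
nrow(config)  # number of experiments described in the config
```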
Columns of the config file:
amplican will estimate HDR efficiency based on the events from three alignments: donor against amplicon, donor against reads, and reads against amplicon.
If you have only forward primers, leave both the Reverse_Primer and the Reverse_Reads columns empty. You can still use amplican as usual.
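If you want to check which rows of a config will be treated as forward-only, a quick base R filter is enough. The two-row data frame below is hypothetical and contains only the columns relevant here; note that `read.csv()` reads a column that is entirely empty as `NA`, so the sketch treats both `NA` and `""` as empty.

```r
# Hypothetical two-row config: "exp_fwd_only" has no reverse primer or reads
config <- data.frame(
  ID = c("exp_paired", "exp_fwd_only"),
  Reverse_Primer = c("GGCATTCAGT", ""),
  Reverse_Reads = c("R2_001.fastq.gz", ""),
  stringsAsFactors = FALSE
)

# Treat NA (entirely empty column) and blank strings as "left empty"
is_empty <- function(x) is.na(x) | trimws(x) == ""

forward_only <- is_empty(config$Reverse_Primer) & is_empty(config$Reverse_Reads)
config$ID[forward_only]
```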
To run amplican with default options, along with generating all possible reports, you can use the
amplicanPipeline function. We have already attached the results of the default amplican analysis of the example dataset (see the other vignettes), but below you can see how to do it yourself. Be prepared to grab a coffee when running with
knit_files = TRUE, as this will take some time. Once the reports are ready, you will see that it was worth the wait.
library(amplican)
# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
# output folder, a full path
results_folder <- tempdir()
# run amplican
amplicanPipeline(config, fastq_folder, results_folder)
# results of the analysis can be found at
message(results_folder)
Take a look inside the “results_folder” folder. Here you can find the
.Rmd files that make up the reports for the example dataset. We have already knitted
.html versions, which you can find in the “reports” folder. Open one of the reports in your favourite browser now. To zoom in on an image, open it in a new tab (right click -> open image in new tab).
amplicanPipeline has just produced very detailed reports for you, but that is not all: if you need something different, e.g. different plot colours, just change the
.Rmd file and
knit it again. This way you have full control over plotting.
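Re-knitting an edited report can be done from R. The sketch below assumes the .Rmd files live somewhere under `results_folder` (it searches recursively) and uses `rmarkdown::render()`, which is our choice of tool rather than something the pipeline prescribes; it only renders when a report exists and rmarkdown is installed.

```r
# Same output folder that was passed to amplicanPipeline()
results_folder <- tempdir()

# Find the report sources written by the pipeline (searched recursively)
rmd_files <- list.files(results_folder, pattern = "\\.Rmd$",
                        recursive = TRUE, full.names = TRUE)

# After editing a report (e.g. plot colours), render it back to HTML
if (length(rmd_files) > 0 && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(rmd_files[1])
}
```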
The first step of our analysis is to filter out reads that do not comply with our default restrictions:
This table is also summarized in one of the reports. As you can see, our example dataset has two barcodes, with 21 and 20 reads respectively. Six reads are rejected for barcode_1 due to a bad alphabet and bad average quality. We also count unique reads for each barcode: forward and reverse read pairs that are identical are collapsed into one. There are 8 and 9 unique reads for the two barcodes. One read from barcode_1 could not be assigned; you can see it in human-readable form in the top unassigned reads section of the barcode report. Normally you will probably see only about half of your reads assigned to the barcodes. A read pair is assigned when the forward primer is found in the forward read and the reverse primer is found in the reverse read. Primers have to match perfectly. Nevertheless, there is the option
fastqreads = 0.5, which changes how reads are assigned to each ID. With this option, only one of the reads (forward or reverse) needs a perfectly matched primer. If you don't have reverse reads, or you don't want to use them, you can use the option
fastqreads = 1. This option should be detected automatically if you leave the Reverse_Primer field empty in the config file.
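The assignment rule can be sketched as follows, assuming primers are searched as exact substrings of each read. The function `read_assigned` and its `mode` argument are hypothetical: `mode` only mimics the `fastqreads` values described above and is not amplican's API, and the real implementation may also restrict where in the read a primer is allowed to occur.

```r
# Hypothetical sketch of primer-based read assignment; not amplican's code.
# mode = 0:   both reads must contain their primer (exact match)
# mode = 0.5: one read containing its primer is enough
# mode = 1:   only forward reads are used
read_assigned <- function(fwd_read, rev_read, fwd_primer, rev_primer, mode = 0) {
  fwd_ok <- grepl(fwd_primer, fwd_read, fixed = TRUE)
  if (mode == 1) return(fwd_ok)
  rev_ok <- grepl(rev_primer, rev_read, fixed = TRUE)
  if (mode == 0.5) fwd_ok | rev_ok else fwd_ok & rev_ok
}

# The reverse read lacks its primer: rejected by default, accepted with 0.5
read_assigned("ACGTAAAA", "CCCCCCCC", "ACGT", "TTGG", mode = 0)
read_assigned("ACGTAAAA", "CCCCCCCC", "ACGT", "TTGG", mode = 0.5)
```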
config_summary.csv contains an extended version of the config file. It gives you an additional look at the raw numbers we use for various plots in the reports. Take a look at the example extension:
When running amplicanPipeline, these columns are added to the config file:
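You can list the added columns yourself by comparing the headers of the two files. This sketch assumes the example results folder shipped with amplican contains config_summary.csv next to RunParameters.txt; if your copy differs, point `summary_path` at the file in your own results folder.

```r
# Example config and the (assumed) example config_summary.csv
config_path  <- system.file("extdata", "config.csv", package = "amplican")
summary_path <- system.file("extdata", "results", "config_summary.csv",
                            package = "amplican")
stopifnot(nzchar(config_path), nzchar(summary_path))

config      <- read.csv(config_path, stringsAsFactors = FALSE)
config_summ <- read.csv(summary_path, stringsAsFactors = FALSE)

# Columns present in the summary but not in the original config
added_cols <- setdiff(names(config_summ), names(config))
added_cols
```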
The file RunParameters.txt lists all options used for the analysis. You might find it useful when reviewing a past analysis whose options you have forgotten; other than that, it has no purpose.
# path to example RunParameters.txt
run_params <- system.file("extdata", "results", "RunParameters.txt", package = "amplican")
# show contents of the file
readLines(run_params)
## "Config file: full/path/to/config/file/that/has/been/used.csv"
## "Average Quality: 30"
## "Minimum Quality: 0"
## "Write Alignments: txt"
## "Fastq files Mode: 0"
## "Gap Opening: 25"
## "Gap Extension: 0"
## "Consensus: TRUE"
## "Normalize: guideRNA, Group"
## "PRIMER DIMER buffer: 30"
## "Cut buffer: 5"
## "Scoring Matrix:"
## ",A,C,G,T"
## "A,5,-4,-4,-4"
## "C,-4,5,-4,-4"
## "G,-4,-4,5,-4"
## "T,-4,-4,-4,5"
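Since most lines follow a `Key: value` pattern, the file can be turned into a named vector for quick lookups. The helper `parse_run_params` is hypothetical; it simply skips lines without a value after the colon, such as the scoring matrix rows.

```r
# Hypothetical helper: "Key: value" lines -> named character vector.
# Lines without ": <value>" (e.g. "Scoring Matrix:" and matrix rows) are skipped.
parse_run_params <- function(lines) {
  kv <- grepl("^[^:]+: .+$", lines)
  keys <- sub(":.*$", "", lines[kv])        # text before the first colon
  values <- sub("^[^:]+: ", "", lines[kv])  # text after the first ": "
  setNames(values, keys)
}

run_params <- system.file("extdata", "results", "RunParameters.txt",
                          package = "amplican")
if (nzchar(run_params)) {
  params <- parse_run_params(readLines(run_params))
  print(params[["Gap Opening"]])
}
```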
As the name indicates, the alignments folder contains all alignments.
# path to the example alignments folder
system.file("extdata", "results", "alignments", package = "amplican")
In unassigned_reads.csv you can find detailed information about unassigned reads. In the example dataset there is one unassigned read.
Take a look at the alignment events files, which contain all the insertions, deletions, cuts and mutations. These files can be used in various ways; you can find examples in the
.Rmd files we prepare using
amplicanReport. Events can easily be converted into
GRanges and used for further analysis. Events are saved at three points of the analysis.
The first file, “raw_events.csv”, contains all events extracted directly from the aligned reads.
After filtering PRIMER DIMER reads, removing events that overlap primers (alignment artifacts),
and shifting events so that they are relative to the expected cut sites, “events_filtered_shifted.csv” is saved. After normalization,
“events_filtered_shifted_normalized.csv” is saved; this is probably the file you should use
for further analysis.
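Because the events files are plain csv tables, converting them to GRanges takes one call to `GenomicRanges::makeGRangesFromDataFrame()`. The sketch below uses a small hypothetical data frame instead of a real events file and assumes the columns seqnames, start, end, strand, type and counts; check the header of your own file before relying on these names.

```r
library(GenomicRanges)

# Hypothetical toy rows shaped like an events table (assumed column names).
# In practice you would read your file, e.g.
# events <- read.csv("events_filtered_shifted_normalized.csv")
events <- data.frame(
  seqnames = c("ID_1", "ID_1", "ID_2"),
  start = c(40, 52, 38),
  end = c(45, 52, 38),
  strand = c("+", "+", "-"),
  type = c("deletion", "mismatch", "insertion"),
  counts = c(5, 2, 3),
  stringsAsFactors = FALSE
)

# seqnames/start/end/strand become ranges; other columns become metadata
gr <- makeGRangesFromDataFrame(events, keep.extra.columns = TRUE)

# Example downstream step: keep only deletions
deletions <- gr[mcols(gr)$type == "deletion"]
deletions
```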
Human-readable alignments can be accessed using the
lookupAlignment function of