rprimer provides tools for designing degenerate oligos (primers and probes) for sequence variable targets, such as RNA viruses. A multiple DNA sequence alignment is used as input data, and the outputs are presented in data frame format and as dashboard-like plots. The aim of the package is to provide a visual and efficient approach to degenerate oligo design, where sequence conservation analysis forms the basis of the design process.
In this vignette, I describe and demonstrate how to use the package by designing broadly reactive primers and probes for reverse transcription (RT) polymerase chain reaction (PCR)-based amplification and detection of hepatitis E virus (HEV), a sequence variable RNA virus and a common foodborne pathogen.
To install rprimer, start R (version 4.2.0 or higher), and enter the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("rprimer")
For best functionality, it is also recommended to install the Biostrings package :
The oligo and assay design workflow consists of five functions:
The package can be run through a Shiny application (a graphical user interface). To start the application, type
runRprimerApp() from within R upon installing and attaching the package.
The first step is to identify and align target sequences of interest. It is important that the alignment does not contain any poorly aligned sequences and accurately represents the generic variation of the target population.
The file “example_alignment.txt” is provided with the package and contains an alignment of 50 HEV sequences. The file path can be retrieved by:
system.file("extdata", "example_alignment.txt", package = "rprimer")
The alignment must be imported to R using the
readDNAMultipleAlignment() function from Biostrings . The input file can contain from one to several thousand sequences.
Below, I show how to import the file “example_alignment.txt”:
filepath <- system.file("extdata", "example_alignment.txt", package = "rprimer") myAlignment <- readDNAMultipleAlignment(filepath, format = "fasta")
For some applications, there is often an interest in amplifying a specific region of the genome. In such cases, the alignment can be masked so that only the desired oligo binding regions are available for the subsequent design process. This type of masking can be achieved by using
colmask() from Biostrings , as illustrated in the example below:
## Mask everything but position 3000 to 4000 and 5000 to 6000 myMaskedAlignment <- myAlignment colmask(myMaskedAlignment, invert = TRUE) <- c(3000:4000, 5000:6000)
It is also possible to mask specific sequences by using
rowmask(). Gap-rich positions can be hidden with
Once the alignment is imported, a consensus profile can be generated. The consensus profile contains all the information needed for the subsequent design process:
The majority consensus base is the most frequently occurring letter among the letters A, C, G, T and -.
Identity represents the proportion of sequences, among all sequences with a DNA base (A, C, G, T) or gap (-) that has the majority consensus base.
The IUPAC consensus character includes all DNA bases that occur at a relative frequency higher than a user-specified ambiguity threshold. The threshold can range from 0 to 0.2: a value of 0 (default) will catch all the variation, but with the potential downside of generating oligos with high degeneracy. A higher value will capture most of the variation, but with the potential downside of missing less common sequence variants.
Coverage represents the proportion of sequences in the target alignment (among all sequences with a DNA base) that are covered by the IUPAC consensus character.
consensusProfile() has two arguments:
Biostrings::MultipleDNAAlignmentobject (see Data import)
ambiguityThreshold: position wise “detection level” for ambiguous bases. All DNA bases that occur with a proportion higher than the specified value will be included in the IUPAC consensus character. Defaults to
0, can range from
Type the following code to generate a consensus profile from the example alignment below:
myConsensusProfile <- consensusProfile(myAlignment, ambiguityThreshold = 0.05)
Results (row 100-105):
The results can be visualized by using