Rqc is an optimized tool designed for quality control and assessment of high-throughput sequencing data. It performs parallel processing of entire files and produces a report which contains a set of high-resolution graphics that can be used for quality assessment.
This version of Rqc produces high-quality images for the following statistics:
The main goal of Rqc is to provide graphical tools for quality
assessment of reads contained in FASTQ files. The package is
designed for simplicity of use: the user calls a single function,
rqc, which processes a set of input files and generates an HTML report
containing several plots that can be used for quality assessment.
To access this functionality, the user needs to load the Rqc package.
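Loading the package can be sketched as follows. The requireNamespace guard is added here for illustration and is an assumption about the reader's setup; it only avoids an error when Rqc is not installed.

```r
# Check whether Rqc is installed before attaching it (guard added for illustration)
has_rqc <- requireNamespace("Rqc", quietly = TRUE)

if (has_rqc) {
  library(Rqc)
} else {
  message("Rqc is not installed; it is distributed through Bioconductor.")
}
```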
The next step is to determine the location of the FASTQ files to be analyzed. The example below uses sample files provided by the ShortRead package; the user must change this location to reflect where the files that need QA are actually stored.
folder <- system.file(package="ShortRead", "extdata/E-MTAB-1147")
The basic usage of the
rqc function requires two arguments. The first,
path, is the location where the files of interest
are saved (defined in the step above). The other argument,
pattern, is a regular expression that identifies all files of
interest. Below, we use
.fastq.gz to specify that all files
whose names match that expression are to be processed.
rqc(path = folder, pattern = ".fastq.gz")
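Because pattern is a regular expression, each "." in .fastq.gz matches any single character rather than a literal dot. The sketch below uses only base R and hypothetical file names to show the difference between that pattern and a stricter one. It assumes the pattern is interpreted with base R regular-expression rules, as in list.files().

```r
# Hypothetical file names, for illustration only
files <- c("sample1.fastq.gz", "sample2.fastq.gz",
           "notes.txt", "sample3_fastq_gz.bak")

# The pattern used in the text: "." matches any single character
loose <- grepl(".fastq.gz", files)

# A stricter alternative: literal dots, anchored at the end of the name
strict <- grepl("\\.fastq\\.gz$", files)

files[loose]   # includes "sample3_fastq_gz.bak"
files[strict]  # only the two .fastq.gz files
```

For the sample files used in this example the looser pattern is sufficient; the stricter form may be worth using when a directory holds files with similar names.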
At this point, the user’s default Internet browser will open an HTML file. This file is the report generated by Rqc, which, by default, is stored in a temporary directory. A sample report is shown below:
This table describes the input files. The
reads column shows either the total number of reads (
sample=FALSE) or the sample size.
This plot gives an overview of the per-read mean quality distribution across all files.
This plot describes the average quality pattern by showing on the X-axis quality thresholds and on the Y-axis the percentage of reads that exceed that quality level.
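The quantity on the Y-axis can be illustrated with a small base R sketch. The quality values and thresholds are hypothetical, and the code only mirrors the description above; it is not the code Rqc uses internally.

```r
# Hypothetical per-read mean quality scores (illustrative values only)
mean_quality <- c(12, 25, 30, 33, 35, 36, 38, 38, 39, 40)

# Quality thresholds, as placed on the X-axis
thresholds <- c(10, 20, 30, 35)

# Percentage of reads whose mean quality exceeds each threshold (Y-axis)
pct_exceeding <- sapply(thresholds, function(q) 100 * mean(mean_quality > q))
names(pct_exceeding) <- thresholds

pct_exceeding
```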
This plot describes the average quality score for each sequencing cycle.
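As a minimal sketch of what a per-cycle average means, assume a hypothetical matrix with one row per read and one column per cycle. The values are made up for illustration and do not come from the sample files.

```r
# Hypothetical quality matrix: one row per read, one column per cycle
quality <- matrix(
  c(38, 37, 35, 30,
    40, 38, 34, 28,
    36, 36, 33, 26),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("read", 1:3), paste0("cycle", 1:4))
)

# Average quality score at each cycle, taken across all reads
cycle_mean <- colMeans(quality)

cycle_mean
```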