Note: if you use MAGeCKFlute in published research, please cite: Binbin Wang, Mei Wang, Wubing Zhang. “Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute.” Nature Protocols (2019), doi: 10.1038/s41596-018-0113-7.

How to get help for MAGeCKFlute

Any and all MAGeCKFlute questions should be posted to the Bioconductor support site, which serves as a searchable knowledge base of questions and answers:

https://support.bioconductor.org

Posting a question and tagging it with “MAGeCKFlute” automatically alerts the package authors, who will respond on the support site. See the first question in the list of Frequently Asked Questions (FAQ) for information about how to construct an informative post.

You can also email your question to the package authors.

Input data

MAGeCK results

MAGeCK (Li et al. 2014) and MAGeCK-VISPR (Li et al. 2015) were previously developed by our lab to analyze CRISPR/Cas9 screen data in different scenarios (Wang et al. 2014; Koike-Yusa et al. 2014; Shalem et al. 2014; Gilbert et al. 2014; Konermann et al. 2015). Both algorithms use negative binomial models to model the variance of sgRNA read counts. For robust identification of selected genes, MAGeCK uses Robust Rank Aggregation and MAGeCK-VISPR uses a maximum likelihood framework.
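A negative binomial distribution lets the variance of sgRNA read counts exceed their mean. The standalone Python sketch below is not code from MAGeCK or MAGeCKFlute. It shows how a mean and an overdispersed variance define such a distribution, and how a lower-tail probability for a depleted sgRNA could be computed from it. According to the MAGeCK paper (Li et al. 2014), the variance is estimated by fitting a mean-variance model across sgRNAs; that fitting step is not reproduced here. The helper names are hypothetical.

```python
import math


def nb_params(mean: float, var: float) -> tuple[float, float]:
    """Convert (mean, variance) into negative binomial (r, p) parameters.

    Uses P(k) = C(k+r-1, k) * p**r * (1-p)**k, whose mean is r(1-p)/p
    and variance is mean + mean**2 / r. Requires var > mean (overdispersion).
    """
    if var <= mean:
        raise ValueError("negative binomial requires variance > mean")
    r = mean * mean / (var - mean)
    p = r / (r + mean)
    return r, p


def nb_logpmf(k: int, mean: float, var: float) -> float:
    """Log-probability of observing k reads under NB(mean, var)."""
    r, p = nb_params(mean, var)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_cdf(k: int, mean: float, var: float) -> float:
    """P(X <= k): a lower-tail probability, e.g. for a depleted sgRNA."""
    total = sum(math.exp(nb_logpmf(i, mean, var)) for i in range(k + 1))
    return min(1.0, total)


# Hypothetical example: expected 100 reads with variance 400; how unusual is 50?
print(nb_cdf(50, mean=100.0, var=400.0))
```

Parameterizing by mean and variance rather than (r, p) mirrors how a mean-variance model would supply the inputs.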

The command mageck mle computes beta scores and the associated statistics for all genes across multiple conditions. The beta score describes how a gene is selected: a positive beta score indicates positive selection, and a negative beta score indicates negative selection.
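The sign rule for beta scores can be written as a small Python function. This is a sketch, not part of MAGeCK or MAGeCKFlute. The optional `cutoff`, a minimum magnitude below which a gene is treated as unselected, is a hypothetical addition: the text above specifies only the sign.

```python
def selection_direction(beta: float, cutoff: float = 0.0) -> str:
    """Label a MAGeCK MLE beta score by its sign.

    cutoff is a hypothetical magnitude threshold; with the default of 0,
    any positive beta is 'positive' and any negative beta is 'negative'.
    """
    if beta > cutoff:
        return "positive"
    if beta < -cutoff:
        return "negative"
    return "none"


# Hypothetical beta scores for three genes in one condition.
betas = {"GENE_A": 0.85, "GENE_B": -1.20, "GENE_C": 0.05}
labels = {g: selection_direction(b, cutoff=0.3) for g, b in betas.items()}
print(labels)  # {'GENE_A': 'positive', 'GENE_B': 'negative', 'GENE_C': 'none'}
```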

The command mageck test uses Robust Rank Aggregation (RRA) for robust identification of CRISPR screen hits, and outputs summary results at both the sgRNA and gene levels.
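The core of Robust Rank Aggregation can be sketched in Python as follows. Each gene's sgRNA ranks are normalized to (0, 1] and sorted. For each position k, we ask how likely the k-th smallest of n uniform random values would be at least as small as the observed one. The rho score is the smallest of these probabilities.

This is the basic RRA rho score only. MAGeCK uses a modified α-RRA that considers only sgRNAs passing a significance cutoff and assigns p-values by permutation; neither step is shown. The function names are hypothetical.

```python
from math import comb


def order_stat_prob(r: float, k: int, n: int) -> float:
    """P(k-th smallest of n i.i.d. Uniform(0,1) values <= r), a beta CDF."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks: list[float]) -> float:
    """Rho score of basic Robust Rank Aggregation for one gene.

    normalized_ranks are sgRNA ranks divided by the total number of sgRNAs.
    A smaller rho means the gene's sgRNAs rank consistently near the top.
    """
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    return min(order_stat_prob(r, k, n) for k, r in enumerate(ranks, start=1))


# Hypothetical gene whose three sgRNAs rank 5th, 12th and 40th of 1000.
print(rra_rho([5 / 1000, 12 / 1000, 40 / 1000]))
```

Taking the minimum over k is what makes the score robust: a gene is rewarded if any subset of its top-ranked sgRNAs is unusually concentrated, even if other sgRNAs are ineffective.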

Customized matrix input

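FluteMLE: a matrix containing a ‘Gene’ column plus two beta-score columns, one for the control condition and one for the treatment condition. Each beta column is named with that condition's sample name followed by ‘.beta’, matching the control and treatment names given to FluteMLE.

FluteRRA: a matrix containing the columns “id”, “neg.goodsgrna”, “neg.lfc”, “neg.fdr”, “pos.goodsgrna”, and “pos.fdr”.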
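Before passing a customized matrix to MAGeCKFlute, it can help to check its header against the columns listed above. The Python sketch below does only that. The function names and the sample names "control" and "treated" are hypothetical, and the check does not validate column contents.

```python
REQUIRED_RRA_COLUMNS = ["id", "neg.goodsgrna", "neg.lfc", "neg.fdr",
                        "pos.goodsgrna", "pos.fdr"]


def missing_rra_columns(header: list[str]) -> list[str]:
    """Return the FluteRRA input columns that are absent from a header."""
    present = set(header)
    return [c for c in REQUIRED_RRA_COLUMNS if c not in present]


def missing_mle_columns(header: list[str], ctrl: str, treat: str) -> list[str]:
    """Return the FluteMLE input columns that are absent from a header.

    ctrl and treat are the control and treatment sample names; the expected
    beta columns are '<ctrl>.beta' and '<treat>.beta'.
    """
    expected = ["Gene", f"{ctrl}.beta", f"{treat}.beta"]
    present = set(header)
    return [c for c in expected if c not in present]


# Extra columns (such as "num") are allowed; only missing ones are reported.
header = ["id", "num", "neg.goodsgrna", "neg.lfc", "neg.fdr",
          "pos.goodsgrna", "pos.fdr"]
print(missing_rra_columns(header))  # []
print(missing_mle_columns(["Gene", "control.beta"], "control", "treated"))
# ['treated.beta']
```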

Quick start

Here we show the most basic steps of the integrative analysis pipeline. The MAGeCKFlute package provides several example datasets, including countsummary, rra.gene_summary, rra.sgrna_summary, and mle.gene_summary, all generated by running MAGeCK. We work with them throughout this document.

Downstream analysis pipeline for MAGeCK RRA

All pipeline results are written to the local directory “./RRA_Flute_Results/”, and all figures are combined into the file “RRA_Flute.rra_summary.pdf”.

Downstream analysis pipeline for MAGeCK MLE

All pipeline results are written to the local directory “./MLE_Flute_Results/”, and all figures are combined into the file “MLE_Flute.mle_summary.pdf”.

Section I: Quality control

Count summary

The mageck count command in MAGeCK/MAGeCK-VISPR generates a count summary file that summarizes basic QC metrics at the raw-count level, including the mapping ratio, Gini index, and NegSelQC. Use the function ‘data’ to load the dataset (countsummary), and inspect it to see how it is formatted.

##                                   File    Label    Reads   Mapped
## 1 ../data/GSC_0131_Day23_Rep1.fastq.gz day23_r1 62818064 39992777
## 2  ../data/GSC_0131_Day0_Rep2.fastq.gz  day0_r2 47289074 31709075
## 3  ../data/GSC_0131_Day0_Rep1.fastq.gz  day0_r1 51190401 34729858
## 4 ../data/GSC_0131_Day23_Rep2.fastq.gz day23_r2 58686580 37836392
##   Percentage TotalsgRNAs Zerocounts GiniIndex NegSelQC NegSelQCPval
## 1     0.6366       64076         57   0.08510        0            1
## 2     0.6705       64076         17   0.07496        0            1
## 3     0.6784       64076         14   0.07335        0            1
## 4     0.6447       64076         51   0.08587        0            1
##   NegSelQCPvalPermutation NegSelQCPvalPermutationFDR NegSelQCGene
## 1                       1                          1            0
## 2                       1                          1            0
## 3                       1                          1            0
## 4                       1                          1            0
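The standalone Python sketch below computes three of these QC metrics from raw counts: the mapping ratio (the Percentage column, Mapped divided by Reads), the Gini index, and the number of zero-count sgRNAs.

The Gini function is the standard Gini coefficient of a list of counts. This sketch does not assume whether MAGeCK transforms counts first (for example, by log-scaling), so its values may differ from the GiniIndex column. The helper names are hypothetical.

```python
def mapping_ratio(reads: int, mapped: int) -> float:
    """Fraction of reads that map to the sgRNA library."""
    return mapped / reads


def gini_index(counts: list[float]) -> float:
    """Standard Gini coefficient of a count distribution (0 = perfectly even)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def zero_count_sgrnas(counts: list[float]) -> int:
    """Number of sgRNAs that received no reads."""
    return sum(1 for c in counts if c == 0)


# Reads and Mapped for day23_r1, taken from the count summary above.
print(round(mapping_ratio(62818064, 39992777), 4))  # 0.6366
```

A rising Gini index or zero-count tally between time points indicates that the library representation is becoming less even. Both are expected after strong selection but can also flag sequencing or library problems.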

Section II: Downstream analysis of MAGeCK RRA

For experiments with two conditions, we recommend using MAGeCK-RRA to identify essential genes from CRISPR/Cas9 knockout screens and to test the statistical significance of each observed change between the two states. The gene summary file in the MAGeCK-RRA results summarizes the statistical significance of both positive and negative selection, and the sgRNA summary file reports the corresponding statistics for each sgRNA. Use the function ‘data’ to load the datasets (rra.gene_summary and rra.sgrna_summary), and inspect them to see how they are formatted.

##       id num  neg.score neg.p.value  neg.fdr neg.rank neg.goodsgrna
## 1    NF2   4 4.1770e-12  2.9738e-07 0.000275        1             4
## 2 SRSF10   4 4.4530e-11  2.9738e-07 0.000275        2             4
## 3 EIF2B4   8 2.8994e-10  2.9738e-07 0.000275        3             8
## 4  LAS1L   6 1.4561e-09  2.9738e-07 0.000275        4             6
## 5   RPL3  15 2.3072e-09  2.9738e-07 0.000275        5            12
## 6 ATP6V0   7 3.8195e-09  2.9738e-07 0.000275        6             7
##   neg.lfc pos.score pos.p.value pos.fdr pos.rank pos.goodsgrna pos.lfc
## 1 -1.3580   1.00000     1.00000       1    16645             0 -1.3580
## 2 -1.8544   1.00000     1.00000       1    16647             0 -1.8544
## 3 -1.5325   1.00000     1.00000       1    16646             0 -1.5325
## 4 -2.2402   0.99999     0.99999       1    16570             0 -2.2402
## 5 -1.0663   0.95519     0.99205       1    15359             2 -1.0663
## 6 -1.6380   1.00000     1.00000       1    16644             0 -1.6380

##     sgrna  Gene control_count treatment_count control_mean treat_mean
## 1 s_10963 CDKN2 1175.4/1156.7     4110.7/4046      1166.00    4078.30
## 2 s_10959 CDKN2 651.49/647.25   2188.3/3020.6       649.37    2604.40
## 3 s_36798   NF2    8917/21204   5020.7/5127.9     15061.00    5074.30
## 4 s_45763 RAB6A 3375.8/3667.7   372.88/357.79      3521.80     365.33
## 5 s_23611  GPN1 4043.8/4064.2    767.53/853.7      4054.00     810.61
## 6 s_50164   SF1 3657.8/3352.6   453.62/628.28      3505.20     540.95
##       LFC control_var adj_var  score       p.low p.high  p.twosided
## 1  1.8055  1.7417e+02  4531.0 43.266  1.0000e+00      0  0.0000e+00
## 2  2.0022  8.9814e+00  2365.7 40.195  1.0000e+00      0  0.0000e+00
## 3 -1.5693  7.5491e+07 78871.0 35.559 2.9804e-277      1 5.9609e-277
## 4 -3.2655  4.2617e+04 15519.0 25.338 6.1638e-142      1 1.2328e-141
## 5 -2.3208  2.0966e+02 18159.0 24.069 2.6711e-128      1 5.3423e-128
## 6 -2.6937  4.6575e+04 15438.0 23.857 4.2365e-126      1 8.4731e-126
##           FDR high_in_treatment
## 1  0.0000e+00              True
## 2  0.0000e+00              True
## 3 1.2732e-272             False
## 4 1.9748e-137             False
## 5 6.8462e-124             False
## 6 9.0487e-122             False
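Several columns of the sgRNA summary can be reproduced from other columns in the rows shown, using the relations in the Python sketch below:

- LFC: log2 of (treat_mean + 1) / (control_mean + 1).
- score: the absolute difference of the means divided by the square root of adj_var.
- p.twosided: twice the smaller of p.low and p.high.

We inferred these relations by checking them against the displayed values, so treat them as an assumption rather than MAGeCK's documented definitions. The variance adjustment that produces adj_var and the negative binomial tail probabilities p.low and p.high are not reproduced.

```python
import math


def sgrna_lfc(control_mean: float, treat_mean: float) -> float:
    """log2 fold change with a pseudocount of 1 added to each mean."""
    return math.log2((treat_mean + 1) / (control_mean + 1))


def sgrna_score(control_mean: float, treat_mean: float, adj_var: float) -> float:
    """Absolute difference of means in units of the adjusted standard deviation."""
    return abs(treat_mean - control_mean) / math.sqrt(adj_var)


def two_sided_p(p_low: float, p_high: float) -> float:
    """Two-sided p-value from the lower- and upper-tail p-values."""
    return min(1.0, 2 * min(p_low, p_high))


# Row 4 of the sgRNA summary above (s_45763, RAB6A).
print(round(sgrna_lfc(3521.80, 365.33), 4))  # -3.2655 (matches the LFC column)
```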

Negative selection and positive selection

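Next, extract “neg.fdr” and “pos.fdr”, together with the gene-level log fold change (LFC), from the gene summary table. The result lists each gene's official symbol, LFC, and FDR: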

##   Official     LFC      FDR
## 1      NF2 -1.3580 0.000275
## 2   SRSF10 -1.8544 0.000275
## 3   EIF2B4 -1.5325 0.000275
## 4    LAS1L -2.2402 0.000275
## 5     RPL3 -1.0663 0.000275
## 6   ATP6V0 -1.6380 0.000275
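The extraction step can be sketched in Python with the standard csv module. The input is the first six rows of the gene summary shown earlier, restricted to the needed columns. This is not MAGeCKFlute's implementation. The rule for choosing between the two FDRs is an assumption: use neg.fdr when the LFC is negative and pos.fdr otherwise. It reproduces the rows above, but the package may combine them differently.

```python
import csv
import io

# First rows of the RRA gene summary shown earlier (subset of columns).
GENE_SUMMARY_ROWS = [
    ("id", "neg.lfc", "neg.fdr", "pos.lfc", "pos.fdr"),
    ("NF2", "-1.3580", "0.000275", "-1.3580", "1"),
    ("SRSF10", "-1.8544", "0.000275", "-1.8544", "1"),
    ("EIF2B4", "-1.5325", "0.000275", "-1.5325", "1"),
    ("LAS1L", "-2.2402", "0.000275", "-2.2402", "1"),
    ("RPL3", "-1.0663", "0.000275", "-1.0663", "1"),
    ("ATP6V0", "-1.6380", "0.000275", "-1.6380", "1"),
]
GENE_SUMMARY = "\n".join("\t".join(row) for row in GENE_SUMMARY_ROWS) + "\n"


def extract_lfc_fdr(text: str) -> list[dict]:
    """Return Official/LFC/FDR records from a tab-separated RRA gene summary.

    Assumption: FDR comes from the negative-selection test when LFC < 0 and
    from the positive-selection test otherwise.
    """
    records = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        lfc = float(rec["neg.lfc"])  # neg.lfc equals pos.lfc in the rows shown
        fdr = float(rec["neg.fdr"] if lfc < 0 else rec["pos.fdr"])
        records.append({"Official": rec["id"], "LFC": lfc, "FDR": fdr})
    return records


for row in extract_lfc_fdr(GENE_SUMMARY):
    print(row["Official"], row["LFC"], row["FDR"])
```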

MAGeCKFlute provides the function VolcanoView to visualize the top negatively and positively selected genes.
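A volcano plot places each gene at its LFC on the x axis and -log10(FDR) on the y axis, so strongly and significantly selected genes appear in the upper corners. The Python sketch below computes those coordinates and a group label for each Official/LFC/FDR record. The cutoffs |LFC| ≥ 1 and FDR ≤ 0.05 are hypothetical defaults chosen for illustration, not VolcanoView's parameters. Drawing is left to any plotting library.

```python
import math


def volcano_points(rows, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Map Official/LFC/FDR records to volcano-plot coordinates and groups.

    x is the LFC and y is -log10(FDR). Genes passing the FDR cutoff are
    labeled 'negative' (LFC <= -lfc_cutoff) or 'positive' (LFC >= lfc_cutoff);
    all others are labeled 'none'.
    """
    points = []
    for r in rows:
        y = -math.log10(max(r["FDR"], 1e-300))  # guard against FDR == 0
        significant = r["FDR"] <= fdr_cutoff
        if significant and r["LFC"] <= -lfc_cutoff:
            group = "negative"
        elif significant and r["LFC"] >= lfc_cutoff:
            group = "positive"
        else:
            group = "none"
        points.append((r["Official"], r["LFC"], y, group))
    return points


# NF2 is taken from the table above; GENE_X and GENE_Y are hypothetical.
rows = [
    {"Official": "NF2", "LFC": -1.358, "FDR": 0.000275},
    {"Official": "GENE_X", "LFC": 0.2, "FDR": 0.5},
    {"Official": "GENE_Y", "LFC": 1.5, "FDR": 0.01},
]
for point in volcano_points(rows):
    print(point)
```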