methylscaper is an R package for visualizing data that jointly profile endogenous methylation and chromatin accessibility (MAPit, NOMe-seq, scNMT-seq, nanoNOMe, etc.). The package offers pre-processing for single-molecule data and accepts input from Bismark (or similar alignment programs) for single-cell data. Both data types are visualized through a common interface that generates ordered representational methylation-state matrices. The package provides a Shiny app for interactive and optimal ordering of the individual DNA molecules, which helps reveal methylation patterns and nucleosome positioning.
Note: If you use methylscaper in your research, please cite our manuscript on bioRxiv.
If, after reading this vignette, you have questions, please submit them on GitHub as a question or an issue report. This notifies the package maintainers and benefits other users.
For local use, methylscaper can be installed into R from Bioconductor (requires R version >= 4.1.0):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("methylscaper")
methylscaper can also be installed from GitHub. The R4.0 branch is kept current with the Bioconductor version but requires only R version >= 4.0.0, for use while R 4.1 is still under development or by those who have not yet upgraded.
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("rhondabacher/methylscaper", ref = "R4.0")
After successful installation, load the package into the workspace:
library(methylscaper)
To access the Shiny app, simply run:
runMethylscaper()
For visualizing single-cell data from methods such as scNMT-seq, methylscaper starts from pre-aligned data. Each cell should have two files: one for the GCH sites and one for the HCG sites. methylscaper needs at least three columns: chromosome, position, and methylation status. Files of this kind are generated by the “bismark_methylation_extractor” script of the Bismark software tool, which writes four- or six-column output files (see the bedGraph option described at http://felixkrueger.github.io/Bismark/Docs/). methylscaper accepts these files and converts them to the three-column format internally.
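To make the column reduction concrete, here is a minimal base-R sketch. `to_three_column()` is a hypothetical helper, not methylscaper's internal conversion. It assumes Bismark's bedGraph layout (chromosome, start, end, methylation percentage, with 0-based starts) and coverage layout (the same four columns plus methylated and unmethylated counts, with 1-based positions), and it simply carries the methylation percentage over as the status column.

```r
# Hypothetical helper (not methylscaper's internal code): reduce Bismark
# bedGraph (4 columns) or coverage (6 columns) output to three columns.
to_three_column <- function(df) {
  n <- ncol(df)
  if (!n %in% c(4, 6)) {
    stop("expected 4 (bedGraph) or 6 (coverage) columns, got ", n)
  }
  pos <- as.integer(df[[2]])
  # bedGraph start coordinates are 0-based; coverage positions are 1-based.
  if (n == 4) pos <- pos + 1L
  data.frame(
    chr = as.character(df[[1]]),
    pos = pos,
    status = as.numeric(df[[4]]),  # methylation percentage, carried over as-is
    stringsAsFactors = FALSE
  )
}

# Example: a made-up two-site coverage table (6 columns).
cov <- data.frame(V1 = c("19", "19"), V2 = c(8966850L, 8966900L),
                  V3 = c(8966850L, 8966900L), V4 = c(100, 0),
                  V5 = c(1L, 0L), V6 = c(0L, 1L))
to_three_column(cov)
```

Since methylscaper performs this conversion itself, the sketch is only meant to show which information the three required columns carry.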
Because these files are large, methylscaper subsets the data to a single chromosome before visualization. In the Shiny app, first select all files containing the endogenous methylation, then select all files containing the accessibility. Name the files so that each pair can be inferred (e.g., “Expr1_Sample1_met” pairs with “Expr1_Sample1_acc”). Finally, enter the chromosome to filter to.
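The pairing rule can be illustrated with a small sketch. `pair_files()` is hypothetical and assumes every file name ends in `_met` or `_acc`, optionally followed by an extension; methylscaper's own matching logic may differ.

```r
# Hypothetical helper (methylscaper's own matching may differ): pair
# methylation and accessibility files whose names agree once the trailing
# "_met" / "_acc" suffix (and any file extension) is removed.
pair_files <- function(met_files, acc_files,
                       met_suffix = "_met", acc_suffix = "_acc") {
  sample_key <- function(files, suffix) {
    sub(paste0(suffix, "(\\..*)?$"), "", basename(files))
  }
  met_key <- sample_key(met_files, met_suffix)
  acc_key <- sample_key(acc_files, acc_suffix)
  idx <- match(met_key, acc_key)
  if (anyNA(idx)) {
    stop("no accessibility file for: ",
         paste(met_files[is.na(idx)], collapse = ", "))
  }
  data.frame(sample = met_key, met = met_files, acc = acc_files[idx],
             stringsAsFactors = FALSE)
}

pair_files(c("Expr1_Sample2_met.tsv", "Expr1_Sample1_met.tsv"),
           c("Expr1_Sample1_acc.tsv", "Expr1_Sample2_acc.tsv"))
```

Matching on the shared prefix, rather than on the order in which files were selected, is what makes consistent naming important.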
Below we walk through an example using data from Clark et al., 2018, obtained from GEO (accession GSE109262). For the sake of this example, we assume that the GSE109262_RAW directory has been downloaded to ~/Downloads/.
In the screenshot below, the GSE109262 data on chromosome 19 are ready for processing. When clicking “Browse…”, be sure to select all relevant files for each methylation type.
The preprocessing can also be done directly in the R console, which allows additional start and end positions to be specified. To create a small example to include in the package, we additionally restricted the data to base pairs 8,947,041 through 8,987,041, a window centered on the Eef1g gene. In practice, we advise users to filter only to the chromosome level so that the region stays relatively large. The Visualization tab allows a more refined search along the chromosome and is described in a section below.
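Restricting to a chromosome and an optional start/end window amounts to a row filter on the three-column data. The sketch below uses a hypothetical `filter_region()` whose argument names mirror subsetSC's `chromosome`, `startPos`, and `endPos`; it is not subsetSC itself, and treating both ends as inclusive is an assumption.

```r
# Hypothetical helper (not subsetSC itself): keep rows of a three-column table
# (chr, pos, status) on one chromosome, optionally within [startPos, endPos].
# Inclusive boundaries are an assumption made for this sketch.
filter_region <- function(df, chromosome, startPos = NULL, endPos = NULL) {
  keep <- as.character(df$chr) == as.character(chromosome)
  if (!is.null(startPos)) keep <- keep & df$pos >= startPos
  if (!is.null(endPos)) keep <- keep & df$pos <= endPos
  df[keep, , drop = FALSE]
}

# Made-up sites on chromosomes 19 and 1.
sites <- data.frame(chr = c("19", "19", "19", "1"),
                    pos = c(8937040, 8937041, 8997041, 8950000),
                    status = c(1, 0, 1, 1))
filter_region(sites, chromosome = 19)                        # whole chromosome
filter_region(sites, 19, startPos = 8937041, endPos = 8997041)  # sub-window
```

Leaving `startPos` and `endPos` unset corresponds to the chromosome-level filtering recommended above.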
When using methylscaper within R, rather than specifying all the files individually, simply point to a folder which contains two subfolders with the accessibility and endogenous methylation files. These subfolders must be named “acc” and “met”, respectively.
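If your download is a single flat folder, a sketch like the following can sort it into the required layout. `organize_sc_files()` is hypothetical; its default patterns `CpG-met` and `GpC-acc` are taken from the GSE109262 file names used in this vignette and will need changing for other naming schemes. It moves files with base R's `file.rename()`, so running it on a copy of the data is safer.

```r
# Hypothetical helper: move files from a flat folder into the "met" and "acc"
# subfolders that subsetSC expects. Default patterns match the GSE109262 names.
organize_sc_files <- function(dir, met_pattern = "CpG-met",
                              acc_pattern = "GpC-acc") {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!dir.exists(files)]  # skip existing subfolders
  for (d in c("met", "acc")) {
    dir.create(file.path(dir, d), showWarnings = FALSE)
  }
  met <- files[grepl(met_pattern, basename(files), fixed = TRUE)]
  acc <- files[grepl(acc_pattern, basename(files), fixed = TRUE)]
  moved_met <- file.rename(met, file.path(dir, "met", basename(met)))
  moved_acc <- file.rename(acc, file.path(dir, "acc", basename(acc)))
  # Return how many files were moved into each subfolder.
  invisible(c(met = sum(moved_met), acc = sum(moved_acc)))
}

# Usage (edit the path to your download):
# organize_sc_files("~/Downloads/GSE109262_RAW")
```

Files matching neither pattern are left in place.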
# filepath must contain the subfolders "acc" and "met"
filepath <- "~/Downloads/GSE109262_RAW/"
singlecell_subset <- subsetSC(filepath, chromosome = 19,
                              startPos = 8937041, endPos = 8997041)
# To save for later, save as an rds file and change the folder location as desired:
saveRDS(singlecell_subset, "~/Downloads/singlecell_subset.rds")
For a reproducible example, we provide three cells for download at http://methylscaper.com/content/exampledata.html. Below, we read these data directly from their URLs into R and pass them to the subsetSC function. If you download the files instead, move them into subfolders named “acc” and “met” as described above.
gse_subset_path <- list(c(("http://methylscaper.com/data/GSE109262_SUBSET/GSM2936197_ESC_A08_CpG-met_processed.tsv.gz"), ("http://methylscaper.com/data/GSE109262_SUBSET/GSM2936196_ESC_A07_CpG-met_processed.tsv.gz"), ("http://methylscaper.com/data/GSE109262_SUBSET/GSM2936192_ESC_A03_CpG-met_processed.tsv.gz")), c(("http://methylscaper.com/data/GSE109262_SUBSET/GSM2936197_ESC_A08_GpC-acc_processed.tsv.gz"), ("http://methylscaper.com/data/GSE109262_SUBSET/GSM2936196_ESC_A07_GpC-acc_processed.tsv.gz"), ("http://methylscaper.com/data/GSE109262_SUBSET/GSM2936192_ESC_A03_GpC-acc_processed.tsv.gz")), c(("GSM2936197_ESC_A08_CpG-met_processed"), ("GSM2936196_ESC_A07_CpG-met_processed"), ("GSM2936192_ESC_A03_CpG-met_processed")), c(("GSM2936197_ESC_A08_GpC-acc_processed"), ("GSM2936196_ESC_A07_GpC-acc_processed"), ("GSM2936192_ESC_A03_GpC-acc_processed"))) # This formatting is a list of 4 objects: the met file urls, the acc file urls, the met file names, and the acc file names. singlecell_subset <- subsetSC(gse_subset_path, chromosome=19, startPos = 8937041, endPos = 8997041) # To save for later, save as an rds file and change the folder location as desired: # saveRDS(singlecell_subset, "~/Downloads/singlecell_subset.rds")
To demonstrate the full workflow on the three-cell subset, we skip some explanations of the functions and show the resulting plot. In this region only one of the three cells has coverage, so the plot shows a single row (a cell with no data anywhere in the region is omitted from the plot rather than plotted as missing data). All functions are explained in detail in the following sections.
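# Look up Eef1g in the mouse gene annotation bundled with methylscaper
data("mouse_bm")
gene.select <- subset(mouse_bm, mgi_symbol == "Eef1g")
# Plotting window within the subset region
startPos <- 8966841
endPos <- 8967541
# Prepare the single-cell data for the window, order the cells, and plot
prepSC.out <- prepSC(singlecell_subset, startPos = startPos, endPos = endPos)
orderObj <- initialOrder(prepSC.out)
plotSequence(orderObj, Title = "Eef1g gene", plotFast = TRUE, drawKey = FALSE)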
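The rule that cells without any data in the region are left out of the plot can be expressed as a simple coverage check. `cells_with_coverage()` is a hypothetical helper operating on a named list of three-column data frames. It illustrates the rule and is not how methylscaper implements it; the toy data are made up.

```r
# Hypothetical helper: names of cells with at least one observed site in
# [startPos, endPos]. `cells` is a named list of three-column data frames.
cells_with_coverage <- function(cells, startPos, endPos) {
  covered <- vapply(cells,
                    function(d) any(d$pos >= startPos & d$pos <= endPos),
                    logical(1))
  names(cells)[covered]
}

# Made-up toy data for three cells.
toy <- list(
  cell1 = data.frame(chr = "19", pos = 8967000, status = 1),
  cell2 = data.frame(chr = "19", pos = 8950000, status = 0),
  cell3 = data.frame(chr = character(0), pos = numeric(0), status = numeric(0))
)
cells_with_coverage(toy, startPos = 8966841, endPos = 8967541)
```

Here only `cell1` has a site inside the window, so under this rule it would be the only row drawn.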