The Bioconductor build system does not have the MEME Suite installed, so these vignettes will not contain any R output. To view the full vignette, visit this article's page on the memes website.
TomTom is a tool for comparing motifs to a known set of motifs. It takes a set of input motifs and a database of known motifs, and returns a list of matches between the input and known motifs ranked by significance. TomTom can be run using the runTomTom() function.
runTomTom() can accept a variety of inputs to use as the “known” motif database. The formats are as follows:
- a path to a .meme format file (e.g. "fly_factor_survey.meme")
- a list of universalmotifs
- the output object from runDreme()
- a list() of all the above. If entries are named, runTomTom() will use those names as the database identifier.
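As a rough illustration of the formats listed above, the sketch below passes a named list() of databases to runTomTom(). It assumes that a flyFactorSurvey_cleaned.meme file is available under extdata in the installed memes package and uses universalmotif::create_motif() to build a toy query motif; both are assumptions made for illustration, so substitute your own paths and motifs. The call is skipped when the MEME Suite or the database file is unavailable.

```r
library(memes)
library(universalmotif)

# Toy query motif built from a consensus string (illustrative only).
query_motif <- create_motif("CCRAAAW", name = "example_motif")

# Assumed location of a .meme database shipped with memes.
# system.file() returns "" if the file is not found.
fly_db <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")

# A named list: the names are used as the database identifiers.
databases <- list(fly_factor = fly_db)

# Only run TomTom when the MEME Suite and the database file are both available.
if (meme_is_installed() && file.exists(fly_db)) {
  tomtom_results <- runTomTom(query_motif, database = databases)
  # Inspect which database identifier the best match came from.
  print(unique(tomtom_results$best_db_name))
}
```

Naming the list entries is optional, but it makes it easier to tell which database a hit came from when several are searched at once.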
memes can be configured to use a default .meme format file as the query database, which it will use if the user does not provide a value to the database argument when calling runTomTom(). The following locations will be searched in order:
1. The meme_db option, defined using options(meme_db = "path/to/database.meme"). The meme_db option can also be set to an R object, like a universalmotif list.
2. The MEME_DB environment variable. The MEME_DB variable will only accept a path to a .meme file.
NOTE: if an invalid location is found at one option, runTomTom() will fall back to the next location if it is valid (e.g. if the meme_db option is set to an invalid file, but the MEME_DB environment variable is a valid file, the MEME_DB path will be used).
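The search order and fallback described above can be sketched in base R. The helper resolve_default_db() below is hypothetical and is not part of memes; it only mimics the documented behaviour (option first, then environment variable, skipping invalid file paths) and makes no claim about how memes implements the lookup internally.

```r
# Hypothetical helper, NOT a memes function: mimics the documented lookup order.
resolve_default_db <- function() {
  is_valid_db <- function(x) {
    if (is.null(x)) return(FALSE)
    # A character value must be a single path to an existing file.
    if (is.character(x)) {
      return(length(x) == 1 && nzchar(x) && file.exists(x))
    }
    # Any other R object (e.g. a universalmotif list) is accepted as-is here.
    TRUE
  }

  # 1. The meme_db option.
  opt <- getOption("meme_db")
  if (is_valid_db(opt)) return(opt)

  # 2. The MEME_DB environment variable (path only).
  env <- Sys.getenv("MEME_DB", unset = "")
  if (is_valid_db(env)) return(env)

  stop("No valid default database found in the meme_db option or MEME_DB variable.")
}

# Demonstrate the fallback: an invalid option, but a valid environment variable.
valid_db <- tempfile(fileext = ".meme")
file.create(valid_db)
old_opts <- options(meme_db = "does/not/exist.meme")
Sys.setenv(MEME_DB = valid_db)

chosen <- resolve_default_db()  # falls back to the MEME_DB path

# Restore the previous state so the demo leaves no side effects.
options(old_opts)
Sys.unsetenv("MEME_DB")
```

In real use, only the options() call or the environment variable needs to be set; the helper is here purely to make the precedence explicit.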
To use TomTom on existing motifs, runTomTom() will accept any motifs in universalmotif format. The universalmotif package provides several utilities for importing data from various sources.
runTomTom() can also take the output of runDreme() as input. This allows users to easily discover de novo motifs, then match them to a set of known motifs. When run on the output of runDreme(), the runTomTom() output columns will be appended to the runDreme() output data.frame, so no information will be lost.
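A sketch of that workflow follows. It assumes the example_dreme dataset bundled with memes (a saved runDreme() result) and, as a stand-in database, a flyFactorSurvey_cleaned.meme file under extdata in the installed package; both are assumptions for illustration. The TomTom step is skipped when the MEME Suite or the database file is unavailable.

```r
library(memes)

# Saved runDreme() output shipped with memes (assumed to be available).
data("example_dreme", package = "memes")

# Assumed database location; system.file() returns "" if the file is absent.
fly_db <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")

if (meme_is_installed() && file.exists(fly_db)) {
  dreme_tomtom <- runTomTom(example_dreme, database = fly_db)

  # The original runDreme() columns should still be present in the result.
  print(all(names(example_dreme) %in% names(dreme_tomtom)))
}
```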
When run using a universalmotif object as input, runTomTom() returns the following columns:
names(example_tomtom)
#>  "motif"              "name"               "altname"
#>  "family"             "organism"           "consensus"
#>  "alphabet"           "strand"             "icscore"
#>  "nsites"             "bkgsites"           "pval"
#>  "qval"               "eval"               "type"
#>  "bkg"                "best_match_name"    "best_match_altname"
#>  "best_db_name"       "best_match_offset"  "best_match_pval"
#>  "best_match_eval"    "best_match_qval"    "best_match_strand"
#>  "best_match_motif"   "tomtom"
Columns prefixed with best_ hold the data for the best match to each input motif.
The tomtom column is a special column which contains a nested data.frame of the rank-ordered list of TomTom hits for that input motif.
The best_match_motif column contains the universalmotif representation of the best match motif.
The match_motif column of tomtom contains the universalmotif format motif from the database corresponding to each match, in descending order by rank.
The drop_best_match() function drops all the best_match_* columns from the runTomTom() output.
To unnest the tomtom data.frame column, use tidyr::unnest(). The drop_best_match() function can be useful when doing this to clean up the unnested data.frame.
unnested <- example_tomtom %>%
  drop_best_match() %>%
  tidyr::unnest(tomtom)
names(unnested)
#>  "motif"         "name"          "altname"       "family"
#>  "organism"      "consensus"     "alphabet"      "strand"
#>  "icscore"       "nsites"        "bkgsites"      "pval"
#>  "qval"          "eval"          "type"          "bkg"
#>  "match_name"    "match_altname" "match_motif"   "db_name"
#>  "match_offset"  "match_pval"    "match_eval"    "match_qval"
#>  "match_strand"
To re-nest the tomtom results, use nest_tomtom(). Note that the best_match_ columns will be automatically updated based on the rank order of the tomtom data.frame.
unnested %>%
  nest_tomtom() %>%
  names
#>  "name"               "altname"            "family"
#>  "organism"           "consensus"          "alphabet"
#>  "strand"             "icscore"            "nsites"
#>  "bkgsites"           "pval"               "qval"
#>  "eval"               "type"               "bkg"
#>  "best_match_name"    "best_match_altname" "best_match_motif"
#>  "best_db_name"       "best_match_offset"  "best_match_pval"
#>  "best_match_eval"    "best_match_qval"    "best_match_strand"
#>  "motif"              "tomtom"
While TomTom can be useful for limiting the search space for potential true motif matches, the default “best match” is often not the correct assignment. Users should use their domain-specific knowledge in conjunction with the data returned by TomTom to make this judgement (see below for more details). memes provides a few convenience functions for reassigning these values.
The update_best_match() function will update the values of the best_match_* columns to reflect the values stored in the first row of the tomtom data.frame entry. This means that the rank of the tomtom data is meaningful, and users should only manipulate it if intending to create side effects.
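To make the side effect concrete, the sketch below reorders the nested hits for the first motif and then calls update_best_match(). It assumes the example_tomtom dataset bundled with memes and that dplyr is installed; the choice to sort by descending match_eval is arbitrary and only serves to change which hit sits in the first row.

```r
library(memes)

# Saved runTomTom() output shipped with memes (assumed to be available).
data("example_tomtom", package = "memes")

reranked <- example_tomtom

# Reorder the nested hits so that a different hit occupies the first row.
reranked$tomtom[[1]] <- dplyr::arrange(
  reranked$tomtom[[1]],
  dplyr::desc(match_eval)
)

# Recompute the best_match_* columns from the new first row.
reranked <- update_best_match(reranked)

# The best match now mirrors the first row of the nested data.
reranked$best_match_name[1]
reranked$tomtom[[1]]$match_name[1]
```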
The user can force motifs to contain a certain motif as their best match using the force_best_match() function. force_best_match() takes a named vector as input, where the name corresponds to the input motif name, and the value corresponds to a match_name found in the tomtom list data (NOTE: this means that users cannot force the best match to be a motif that TomTom did not return as a potential match).
For example, below the example motif could match either “Eip93F_SANGER_10”, or “Lag1_Cell”.
example_tomtom$tomtom[[1]] %>% head(3)
#>         match_name match_altname
#> 1 Eip93F_SANGER_10        Eip93F
#> 2        Lag1_Cell       schlank
#> 3     pho_SOLEXA_5           pho
#>                                                            match_motif
#> 1 <S4 class 'universalmotif' [package "universalmotif"] with 20 slots>
#> 2 <S4 class 'universalmotif' [package "universalmotif"] with 20 slots>
#> 3 <S4 class 'universalmotif' [package "universalmotif"] with 20 slots>
#>                   db_name match_offset match_pval match_eval match_qval
#> 1 flyFactorSurvey_cleaned            4   8.26e-07   0.000459   0.000919
#> 2 flyFactorSurvey_cleaned            3   1.85e-03   1.030000   0.781000
#> 3 flyFactorSurvey_cleaned            1   2.54e-03   1.410000   0.781000
#>   match_strand
#> 1            +
#> 2            +
#> 3            +
The current best match is listed as “Eip93F_SANGER_10”.
To force “example_motif” to have the best match as “Lag1_Cell”, do the following:
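One way to write this call, assuming the example_tomtom dataset bundled with memes and that its single input motif is named example_motif as described above:

```r
library(memes)

# Saved runTomTom() output shipped with memes (assumed to be available).
data("example_tomtom", package = "memes")

# Names are input motif names; values are match_name entries that TomTom
# already returned for that motif.
new_tomtom <- force_best_match(
  example_tomtom,
  c("example_motif" = "Lag1_Cell")
)

# Inspect the reassigned best match.
new_tomtom$best_match_name
```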
The best_match_* columns will be updated to reflect the modifications.
view_tomtom_hits() can be used to compare the hits from TomTom to each input motif. Hits are shown in descending order by rank. By default, all hits are shown, or the user can pass an integer to top_n to view only that many of the top-ranked hits. This can be a useful plot for determining which of the matches appear to be the “best” hit.
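A minimal sketch of such a call, assuming the example_tomtom dataset bundled with memes and that the plotting dependencies of view_tomtom_hits() are installed:

```r
library(memes)

# Saved runTomTom() output shipped with memes (assumed to be available).
data("example_tomtom", package = "memes")

# Plot the input motif against its top 3 hits.
# The result is a list of plots, one per input motif.
hit_plots <- view_tomtom_hits(example_tomtom, top_n = 3)

# Display the plot for the first input motif.
hit_plots[[1]]
```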
For example, it appears that indeed “Eip93F_SANGER_10” is the best of the top 3 hits, as most of the matching sequences in the “Lag1_Cell” and “pho_SOLEXA_5” motifs correspond to low information-content regions of the matched motifs.
importTomTomXML() can be used to import a tomtom.xml file from a previous run on the MEME server or on the command line. Details on how to save data from the TomTom web server are below.
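For instance, a saved result might be loaded as sketched below. The file path is a hypothetical placeholder, so the import is only attempted if the file exists.

```r
library(memes)

# Hypothetical path: replace with the location of your saved TomTom XML file.
tomtom_xml <- "path/to/tomtom.xml"

if (file.exists(tomtom_xml)) {
  imported_tomtom <- importTomTomXML(tomtom_xml)
  # Check which columns were imported.
  print(names(imported_tomtom))
}
```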
To download XML data from the MEME Server, right-click the TomTom XML output link and choose “Save Target As” or “Save Link As” (see example image below), then save the file as <filename>.xml. This file can be read using importTomTomXML().
memes is a wrapper for a select few tools from the MEME Suite, which were developed by another group. In addition to citing memes, please cite the MEME Suite tools corresponding to the tools you use.
If you use
runTomTom() in your analysis, please cite:
Shobhit Gupta, JA Stamatoyannopoulos, Timothy Bailey and William Stafford Noble, “Quantifying similarity between motifs”, Genome Biology, 8(2):R24, 2007.
The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.
sessionInfo()
#> R version 4.2.0 RC (2022-04-21 r82226)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#>
#> locale:
#>  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#>  LC_TIME=en_GB LC_COLLATE=C
#>  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#>  LC_PAPER=en_US.UTF-8 LC_NAME=C
#>  LC_ADDRESS=C LC_TELEPHONE=C
#>  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#>  stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#>  universalmotif_1.15.0 magrittr_2.0.3 memes_1.5.2
#>
#> loaded via a namespace (and not attached):
#>  Rcpp_126.96.36.199 tidyr_1.2.0 Biostrings_2.65.0
#>  ggseqlogo_0.1 assertthat_0.2.1 rprojroot_2.0.3
#>  digest_0.6.29 utf8_1.2.2 R6_2.5.1
#>  GenomeInfoDb_1.33.2 stats4_4.2.0 evaluate_0.15
#>  highr_0.9 ggplot2_3.3.6 pillar_1.7.0
#>  zlibbioc_1.43.0 rlang_1.0.2 jquerylib_0.1.4
#>  S4Vectors_0.35.0 R.utils_2.11.0 R.oo_1.24.0
#>  rmarkdown_2.14 desc_1.4.1 readr_2.1.2
#>  stringr_1.4.0 cmdfun_1.0.2 RCurl_1.98-1.6
#>  munsell_0.5.0 compiler_4.2.0 xfun_0.30
#>  pkgconfig_2.0.3 BiocGenerics_0.43.0 htmltools_0.5.2
#>  tidyselect_1.1.2 tibble_3.1.7 GenomeInfoDbData_1.2.8
#>  IRanges_2.31.0 matrixStats_0.62.0 fansi_1.0.3
#>  crayon_1.5.1 dplyr_1.0.9 tzdb_0.3.0
#>  withr_2.5.0 MASS_7.3-57 bitops_1.0-7
#>  brio_1.1.3 R.methodsS3_1.8.1 waldo_0.4.0
#>  grid_4.2.0 jsonlite_1.8.0 gtable_0.3.0
#>  lifecycle_1.0.1 DBI_1.1.2 scales_1.2.0
#>  cli_3.3.0 stringi_1.7.6 farver_2.1.0
#>  XVector_0.37.0 testthat_3.1.4 bslib_0.3.1
#>  ellipsis_0.3.2 generics_0.1.2 vctrs_0.4.1
#>  tools_4.2.0 glue_1.6.2 purrr_0.3.4
#>  hms_1.1.1 pkgload_1.2.4 fastmap_1.1.0
#>  yaml_2.3.5 colorspace_2.0-3 GenomicRanges_1.49.0
#>  knitr_1.39 sass_0.4.1