epiregulon.extra 1.1.4
epiregulon.extra is a companion package to epiregulon and provides functions to visualize transcription factor activity and network. It also provides statistical tests to identify transcription factors with differential activity and network topology. This tutorial continues from the reprogram-seq example included in epiregulon. We will use the gene regulatory networks constructed by epiregulon and continue with visualization and network analysis.
For full documentation of the epiregulon
package, please refer to the epiregulon book.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("epiregulon.extra")
To continue with the visualization functions, we will first need the gene expression matrix. We can download the data from scMultiome.
# load the MAE object
library(scMultiome)
## Loading required package: AnnotationHub
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## Loading required package: ExperimentHub
## Loading required package: MultiAssayExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
## The following object is masked from 'package:ExperimentHub':
##
## cache
## The following object is masked from 'package:AnnotationHub':
##
## cache
## Loading required package: SingleCellExperiment
mae <- scMultiome::reprogramSeq()
## see ?scMultiome and browseVignettes('scMultiome') for documentation
## loading from cache
# expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name
reducedDim(GeneExpressionMatrix, "UMAP_Combined") <-
reducedDim(mae[["TileMatrix500"]], "UMAP_Combined")
# arrange hash_assignment
GeneExpressionMatrix$hash_assignment <- factor(as.character(
GeneExpressionMatrix$hash_assignment),
levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
"HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", "HTO5_NeonG_v2"
)
)
Next we load the regulon object previously calculated by epiregulon. Here we just use the pruned version of regulon object in which only relevant columns are kept. Using epiregulon, we can calculate activity of transcription factors included in the regulon object.
library(epiregulon)
library(epiregulon.extra)
data(regulon)
score.combine <- calculateActivity(
expMatrix = GeneExpressionMatrix,
regulon = regulon,
mode = "weight",
method = "weightedMean",
exp_assay = "normalizedCounts",
normalize = FALSE
)
## calculating TF activity from regulon using weightedmean
## Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
## regulon, : The weight column contains multiple subcolumns but no cluster
## information was provided. Using first column to compute activity...
## aggregating regulons...
## creating weight matrix...
## calculating activity scores...
## normalize by the number of targets...
We perform a differential analysis of transcription factor activity across groups of cells. This function is a wrapper around findMarkers
from scran.
library(epiregulon.extra)
markers <- findDifferentialActivity(
activity_matrix = score.combine,
clusters = GeneExpressionMatrix$hash_assignment,
pval.type = "some",
direction = "up",
test.type = "t",
logvalue = FALSE
)
We can specify the top transcription factors of interest
markers.sig <- getSigGenes(markers, topgenes = 5)
## Using a logFC cutoff of 1 for class HTO10_GATA6_UTR for direction equal to any
## Using a logFC cutoff of 1 for class HTO2_GATA6_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO8_NKX2.1_UTR for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO3_NKX2.1_v2 for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO1_FOXA2_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO4_mFOXA1_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO6_hFOXA1_UTR for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO5_NeonG_v2 for direction equal to any
We visualize the known differential transcription factors by bubble plot
plotBubble(
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2"),
clusters = GeneExpressionMatrix$hash_assignment
)
Then visualize the most differential transcription factors by clusters
plotBubble(
activity_matrix = score.combine,
tf = markers.sig$tf,
clusters = GeneExpressionMatrix$hash_assignment
)
Visualize the known differential transcription factors by violin plot.
plotActivityViolin(
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
clusters = GeneExpressionMatrix$hash_assignment
)