1 Introduction

1.1 What is MIRit

MIRit (miRNA integration tool) is an open source R package that aims to facilitate the comprehension of microRNA (miRNA) biology through the integrative analysis of gene and miRNA expression data deriving from different platforms, including microarrays, RNA-Seq, miRNA-Seq, proteomics and single-cell transcriptomics. Given their regulatory importance, a complete characterization of miRNA dysregulations results crucial to explore the molecular networks that may lead to the insurgence of complex diseases. Unfortunately, there are no currently available options for thoroughly interpreting the biological consequences of miRNA dysregulations, thus limiting the ability to identify the affected pathways and reconstruct the compromised molecular networks. To tackle this limitation, we developed MIRit, an all-in-one framework that provides flexible and powerful methods for performing integrative miRNA-mRNA multi-omic analyses from start to finish. In particular, MIRit includes multiple modules that allow to perform:

  1. Differential expression analysis, which identifies miRNAs and genes that vary across biological conditions for both RNA-Seq and microarray experiments (even though other technologies can also be used);
  2. Functional enrichment analysis, which allows to understand the consequences of gene dysregulations through different strategies, including over-representation analysis (ORA), gene-set enrichment analysis (GSEA) and Correlation Adjusted MEan RAnk gene set test (CAMERA);
  3. SNP association, that links miRNA dysregulations to disease-associated SNPs occurring at miRNA gene loci;
  4. Target identification, which retrieves predicted and validated miRNA-target interactions through innovative state-of-the-art approaches that limit false discoveries;
  5. Expression levels integration, which integrates the expression levels of miRNAs and mRNAs for both paired data, through correlation analyses, and unpaired data, through association tests and rotation gene-set tests;
  6. Topological analysis, which implements a novel approach called Topology-Aware Integrative Pathway Analysis (TAIPA) for identifying the impaired molecular networks that affect biological pathways retrieved from KEGG, Reactome and WikiPathways.

1.2 How to cite MIRit

If you use MIRit in published research, please cite the following paper:

Ronchi J and Foti M. ‘MIRit: an integrative R framework for the identification of impaired miRNA-mRNA regulatory networks in complex diseases’. bioRxiv (2023). doi:10.1101/2023.11.24.568528

This package internally uses different R/Bioconductor packages, remember to cite the appropriate publications.

1.3 Installation

Before starting, MIRit must be installed on your system. You can do this through Bioconductor.

## install MIRit from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("MIRit")

If needed, you could also install the development version of MIRit directly from GitHub:

## install the development version from GitHub
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("jacopo-ronchi/MIRit")

When MIRit is installed, we can load it through:

## load MIRit
library(MIRit)

2 Data preparation

2.1 Load example data

To demonstrate the capabilities of MIRit we will use RNA-Seq data from Riesco-Eizaguirre et al. (2015). This experiment collected samples from 8 papillary thyroid carcinoma tumors and contralateral normal thyroid tissue from the same patients. These samples were profiled for gene expression through RNA-Seq, and for miRNA expression through miRNA-Seq. To provide easy access to the user, raw count matrices have been retrieved from GEO and included in this package.

To load example data, we can simply use the data() function:

## load count matrix for genes
data(geneCounts, package = "MIRit")

## load count matrix for miRNAs
data(mirnaCounts, package = "MIRit")

2.2 Paired vs unpaired data

When using MIRit, we must specify whether miRNA and gene expression values derive from the same individuals or not. As already mentioned, paired data are those where individuals used to measure gene expression are the same subjects used to measure miRNA expression. On the other hand, unpaired data are those where gene expression and miRNA expression derive from different cohorts of donors. Importantly, MIRit considers as paired samples those data sets where paired measurements are available for at least some samples.

In our case, miRNA and gene expression data originate from the same subjects, and therefore we will conduct a paired samples analysis.

2.3 Set up expression matrices

As input data, MIRit requires miRNA and gene expression measurements as matrices with samples as columns, and genes/miRNAs as rows. Further, the row names of miRNA expression matrix should contain miRNA names according to miRBase nomenclature (e.g. hsa-miR-151a-5p, hsa-miR-21-5p), whereas for gene expression matrix, row names must contain gene symbols according to HGNC (e.g. TYK2, BDNF, NTRK2).

These matrices may handle different types of values deriving from multiple technologies, including microarrays, RNA-Seq and proteomics. The only requirement is that, for microarray studies, expression matrices must be normalized and log2 transformed. This can be achieved by applying the RMA algorithm implemented in the oligo (Carvalho and Irizarry 2010) package, or by applying other quantile normalization strategies. On the contrary, for RNA-Seq and miRNA-Seq experiments, the simple count matrix must be supplied.

Eventually, expression matrices required by MIRit should appear as those in mirnaCounts and geneCounts, which are displayed in Tables 1 and 2.

Table 1: Gene expression matrix
An expression matrix containing normalized values, with row names according to hgnc.
PTC 1 PTC 2 PTC 3 PTC 4 PTC 5
A1BG 7 6 9 12 7
A1BG-AS1 20 8 22 6 16
A1CF 0 0 0 1 1
A2M 9 11 9 37 18
A2M-AS1 1486 722 801 968 1787
Table 2: MiRNA expression matrix
An expression matrix containing normalized values, with row names according to miRBase.
PTC 1 PTC 2 PTC 3 PTC 4 PTC 5
hsa-let-7a-2-3p 3 0 9 1 4
hsa-let-7a-3p 472 82 228 122 313
hsa-let-7a-5p 141101 45543 105598 45503 159598
hsa-let-7b-3p 412 81 120 147 164
hsa-let-7b-5p 16337 6586 8121 7993 16516

2.4 Define sample metadata

Once we have expression matrices in the proper format, we need to inform MIRit about the samples in study and the biological conditions of interest. To do so, we need to create a data.frame that must contain:

  • a column named primary, specifying a unique identifier for each different subject;
  • a column named mirnaCol, matching the column name used for each sample in the miRNA expression matrix;
  • a column named geneCol, matching the column name used for each sample in the gene expression matrix;
  • a column that defines the biological condition of interest;
  • other optional columns that store specific sample metadata, such as age, sex and so on…

Firstly, let’s take a look at the column names used for miRNA and gene expression matrices.

## print sample names in geneCounts
colnames(geneCounts)
#>  [1] "PTC 1" "PTC 2" "PTC 3" "PTC 4" "PTC 5" "PTC 6" "PTC 7" "PTC 8" "NTH 1"
#> [10] "NTH 2" "NTH 3" "NTH 4" "NTH 5" "NTH 6" "NTH 7" "NTH 8"

## print sample names in mirnaCounts
colnames(mirnaCounts)
#>  [1] "PTC 1" "PTC 2" "PTC 3" "PTC 4" "PTC 5" "PTC 6" "PTC 7" "PTC 8" "NTH 1"
#> [10] "NTH 2" "NTH 3" "NTH 4" "NTH 5" "NTH 6" "NTH 7" "NTH 8"

## check if samples in geneCounts are equal to those in mirnaCounts
identical(colnames(geneCounts), colnames(mirnaCounts))
#> [1] TRUE

In our case, we see that both expression matrices have the same column names, and therefore mirnaCol and geneCol will be identical. However, note that is not always the case, especially for unpaired data, where miRNA and gene expression values derive from different subjects. In these cases, mirnaCol and geneCol must map each column of miRNA and gene expression matrices to the relative subjects indicated in the primary column. Notably, for unpaired data, NAs can be used for missing entries in mirnaCol/geneCol.

That said, we can proceed to create the data.frame with sample metadata as follows.

## create a data.frame containing sample metadata
meta <- data.frame(primary = colnames(mirnaCounts),
                   mirnaCol = colnames(mirnaCounts),
                   geneCol = colnames(geneCounts),
                   disease = c(rep("PTC", 8), rep("NTH", 8)),
                   patient = c(rep(paste("Sample_", seq(8), sep = ""), 2)))

2.5 Create a MirnaExperiment object

At this point, we need to create an object of class MirnaExperiment, which is the main class used in MIRit for integrative miRNA-mRNA analyses. In particular, this class extends the MultiAssayExperiment class from the homonym package (Ramos et al. 2017) to store expression levels of both miRNAs and genes, differential expression results, miRNA-target pairs and integrative miRNA-gene analysis.

The easiest way to create a valid MirnaExperiment object is to use the MirnaExperiment() function, which automatically handles the formatting of input data and the creation of the object.

## create the MirnaExperiment object
experiment <- MirnaExperiment(mirnaExpr = mirnaCounts,
                              geneExpr = geneCounts,
                              samplesMetadata = meta,
                              pairedSamples = TRUE)

3 Differential expression analysis

Now that the MirnaExperiment object has been created, we can move to differential expression analysis, which aims to define differentially expressed features across biological conditions.

3.1 Visualize expression variability

Firstly, before doing anything else, it is useful to explore expression variability through dimensionality reduction techniques. This is useful because it allows us to visualize the main drivers of expression differences. In this regard, MIRit offers the plotDimensions() function, which enables to visualize both miRNA and gene expression in the multidimensional space (MDS plots). Moreover, it is possible to color samples based on specific variables, hence allowing to explore specific patterns between distinct biological groups.

In our example, let’s examine expression variability for both miRNAs and genes, and let’s color the samples based on “disease”, a variable included in the previously defined metadata.

geneMDS <- plotDimensions(experiment,
                          assay = "genes",
                          condition = "disease",
                          title = "MDS plot for genes")

mirnaMDS <- plotDimensions(experiment,
                           assay = "microRNA",
                           condition = "disease",
                           title = "MDS plot for miRNAs")

ggpubr::ggarrange(geneMDS, mirnaMDS,
                  nrow = 1, labels = "AUTO", common.legend = TRUE)