0. Installation

Please note that ‘rifi’ is only available for Unix-based systems. To install this package, start R (version 4.2 or later) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("rifi")


I. Introduction

The stability, or half-life, of bacterial transcripts is often estimated from Rifampicin time-series data. Rifampicin prevents the initiation of transcription, but RNA polymerases that are already elongating are unaffected (Campbell et al. 2001). As a consequence, the RNA concentration at a position downstream of the transcriptional start site appears unchanged until the last polymerase has passed this position. The result is a delayed exponential decay (Chen et al. 2015), which can be fitted by the following model:

\(c(t,n) = \begin{cases} \frac{\alpha}{\lambda} & \quad \text{if } t < \frac{n}{v}\\ \frac{\alpha}{\lambda} \times e^{-\lambda (t - \frac{n}{v})} & \quad \text{if } t \geq \frac{n}{v} \end{cases}\)

The model (Chen et al. 2015) consists of two phases. The first phase describes the delay, during which the transcript concentration remains at its steady state, defined by the ratio of the synthesis rate \(\alpha\) and the decay constant \(\lambda\) (\(\text{steady state} = \frac{\alpha}{\lambda}\)). The delay depends on the distance from the transcriptional start site \(n\) and the transcription velocity \(v\). Once the time after Rifampicin addition exceeds the delay (\(\text{delay} = \frac{n}{v}\)), the exponential decay phase starts.
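The two phases can be written as a small R function. This is only an illustrative sketch, not code taken from ‘rifi’: the function name `standard_model` and the example values are hypothetical, and it assumes that the decay is measured from the end of the delay, i.e. the exponent is \(-\lambda (t - \frac{n}{v})\), so that the curve is continuous at \(t = \frac{n}{v}\).

```r
# Delayed exponential decay (standard model), as a sketch.
# alpha:  synthesis rate
# lambda: decay constant
# n:      distance from the transcriptional start site
# v:      transcription velocity
# Assumption: decay time is counted from the end of the delay, (t - n / v).
standard_model <- function(t, n, alpha, lambda, v) {
  delay <- n / v
  steady_state <- alpha / lambda
  ifelse(t < delay,
         steady_state,
         steady_state * exp(-lambda * (t - delay)))
}

# Hypothetical values: a position 1200 nt downstream and a velocity of
# 300 nt/min give a delay of 4 min; the steady state is 10 / 0.5 = 20.
times <- c(0, 2, 4, 6, 8)
standard_model(times, n = 1200, alpha = 10, lambda = 0.5, v = 300)
```

With these values the concentration stays at the steady state up to the delay and only then starts to decay.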

In addition to the standard model, we use a second model that describes the behaviour at positions where the concentration increases after Rifampicin addition (Figure 1, right panel). This phenomenon can be explained by Rifampicin-relievable transcription termination, e.g. through the transcriptional interference (TI) collision mechanism (Shearwin, Callen, and Egan 2005) or termination by short-lived factors such as sRNAs (Wang et al. 2015). In the following we call this model the ‘TI model’; it consists of three phases:

\(c(t,n) = \begin{cases} \frac{\alpha - \alpha \times \beta}{\lambda} & \quad \text{if } t < \frac{n - n_{term}}{v}\\ \frac{\alpha}{\lambda} - \frac{\alpha \times \beta}{\lambda} \times e ^{-\lambda (t -\frac{n - n_{term}}{v})} & \quad \text{if } \frac{n - n_{term}}{v} \leq t < \frac{n}{v}\\ \left(\frac{\alpha}{\lambda} - \frac{\alpha \times \beta}{\lambda} \times e ^{-\lambda \frac{n_{term}}{v}}\right) \times e^{-\lambda (t-\frac{n}{v})} & \quad \text{if } \frac{n}{v} \leq t \end{cases}\)

The first phase again describes the steady-state concentration at a given transcript position, but here the synthesis rate \(\alpha\) is reduced by the TI termination factor \(\beta\). We assume that a short-lived factor is responsible for the termination and that its synthesis stops after Rifampicin addition. Thus, once termination is relieved, all polymerases that start at the transcriptional start site can reach positions downstream of the former termination site (\(n_{term}\)). The time polymerases need to travel from the termination site to position \(n\) is the delay for the increase (\(\text{delay}_{increase}= \frac{n - n_{term}}{v}\)). After the last polymerase has passed the respective position, the exponential decay phase starts.
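The three phases can be sketched in R as follows. The function name `ti_model` and the example values are hypothetical, and this is not the implementation used inside ‘rifi’. One assumption is made explicit: the level from which the final decay starts is taken as the value the second phase reaches at \(t = \frac{n}{v}\), which keeps the curve continuous at both phase boundaries.

```r
# TI model as a sketch: steady state with termination, increase after the
# termination is relieved, then exponential decay.
# beta:   TI termination factor (fraction of synthesis lost to termination)
# n_term: position of the former termination site
ti_model <- function(t, n, n_term, alpha, beta, lambda, v) {
  delay_increase <- (n - n_term) / v   # start of the increase
  delay <- n / v                       # start of the decay
  full <- alpha / lambda               # steady state without termination
  blocked <- alpha * beta / lambda     # part removed by termination

  phase1 <- full - blocked
  phase2 <- full - blocked * exp(-lambda * (t - delay_increase))
  # Assumption: the decay starts from the level reached at t = delay.
  level_at_delay <- full - blocked * exp(-lambda * n_term / v)
  phase3 <- level_at_delay * exp(-lambda * (t - delay))

  ifelse(t < delay_increase, phase1,
         ifelse(t < delay, phase2, phase3))
}

# Hypothetical values: the increase starts at 2 min, the decay at 4 min.
times <- c(0, 1, 2, 3, 4, 6, 8)
ti_model(times, n = 1200, n_term = 600, alpha = 10, beta = 0.5,
         lambda = 0.5, v = 300)
```

Setting `beta = 0` removes the termination term, and the curve reduces to a delayed exponential decay with delay \(\frac{n}{v}\).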

‘rifi’ performs stability analysis on high-throughput Rifampicin data. RNA sequencing and microarray data derived from Rifampicin-treated bacteria with sufficiently high time resolution can reveal many insights into the mechanics of transcription, RNAP velocity and RNA stability, and ‘rifi’ is a tool for the holistic identification of these transcription processes. The core of the analysis is the fit of one of the two nonlinear regression models to the time-series data of each probe (or bin), which yields the probe/bin-specific delay, decay constant \(\lambda\) and half-life (\(t_\frac{1}{2} = \frac{\ln(2)}{\lambda}\)) (Figure 1, left panel).
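To illustrate how delay, decay constant and half-life can be obtained from one time series, the following base R sketch fits the standard model to simulated, noise-free data. It is a stand-in and not the fitting routine of ‘rifi’: the names `decay_shape` and `fit_standard`, the grid of candidate delays and the search range for \(\lambda\) are all assumptions made for this example. For each candidate delay, \(\lambda\) is found with `optimize()` and the steady-state level is solved in closed form by least squares.

```r
# Delayed-decay shape of the standard model, scaled to a steady state of 1.
# Assumption: decay time is counted from the end of the delay.
decay_shape <- function(t, delay, lambda) {
  ifelse(t < delay, 1, exp(-lambda * (t - delay)))
}

# Profile least squares over a grid of candidate delays (hypothetical helper).
fit_standard <- function(t, y, delay_grid, lambda_range = c(0.01, 2)) {
  # Residual sum of squares for given lambda and delay; the steady-state
  # level is the least-squares scale factor for the shape g.
  sse_for <- function(lambda, delay) {
    g <- decay_shape(t, delay, lambda)
    level <- sum(y * g) / sum(g^2)
    sum((y - level * g)^2)
  }
  fits <- lapply(delay_grid, function(delay) {
    opt <- optimize(sse_for, interval = lambda_range, delay = delay)
    data.frame(delay = delay, lambda = opt$minimum, sse = opt$objective)
  })
  best <- do.call(rbind, fits)
  best <- best[which.min(best$sse), ]
  g <- decay_shape(t, best$delay, best$lambda)
  best$steady_state <- sum(y * g) / sum(g^2)
  best$half_life <- log(2) / best$lambda
  best
}

# Simulated probe: steady state 20, delay 4 min, lambda 0.3 per min.
t <- c(0, 1, 2, 3, 4, 5, 6, 8, 10, 15, 20)
y <- 20 * decay_shape(t, delay = 4, lambda = 0.3)
fit_standard(t, y, delay_grid = seq(0, 10, by = 0.5))
```

On real data with noise and replicates, a dedicated nonlinear least-squares routine with sensible starting values is usually preferable to a grid search; the grid is used here only because it is easy to follow.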

Figure 1: Fit models
Fits from both models. Left: the two-phase standard fit. Right: the TI model fits the increase in intensity. Black dots represent the average intensity for each timepoint; colored circles indicate the respective replicate.

After fitting the individual probes/bins, common workflows usually combine the individual half-life values according to the given genome annotation to obtain an average, gene-based stability. This procedure cannot account for differences within a gene, e.g. those caused by processing sites. ‘rifi’ uses an annotation-agnostic approach to obtain an unbiased estimate of individual transcripts as they actually appear in vivo. Probes/bins with equal properties in the extracted values delay, half-life, TI_termination_factor and the given intensity values are combined into segments by dynamic programming (called fragmentation in ‘rifi’), independently of an existing genome annotation (Figure 2). The fragmentation is performed hierarchically.
Initially, bins are grouped into position_segments, which are separated by regions without significant sequencing depth. Within each position_segment, bins are grouped into delay_fragments by common velocity. Subsequently, each delay_fragment is divided by similar half-life into half_life_fragments, within which the bins are finally grouped into intensity_fragments by similar intensity. From the fragmentation, many events can be extracted: iTSS (internal transcription start sites), transcription pausing_sites, velocity_changes, processing_sites, partial terminations, as well as instances of Rifampicin-relievable transcription termination, e.g. by TI (transcriptional interference). All data are integrated to give an estimate of continuous transcriptional units, i.e. operons. Comprehensive output tables and visualizations of the full-genome result and of the individual fits for all probes/bins are produced.
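The idea of combining neighbouring bins with similar values by dynamic programming can be illustrated with a generic, penalised least-squares segmentation. The sketch below is not the scoring scheme of ‘rifi’: the function `segment_bins`, the squared-error cost and the `penalty` parameter are assumptions chosen for illustration. It splits an ordered vector of at least one value (for example half-lives along the genome) into contiguous segments so that the within-segment squared deviation plus a fixed cost per segment is minimal.

```r
# Penalised least-squares segmentation by dynamic programming (sketch).
# values:  one value per bin, ordered by genome position (length >= 1)
# penalty: cost added for every segment; larger values give fewer segments
segment_bins <- function(values, penalty) {
  n <- length(values)
  cs  <- c(0, cumsum(values))
  cs2 <- c(0, cumsum(values^2))
  # Sum of squared deviations from the segment mean for bins i..j.
  seg_cost <- function(i, j) {
    s <- cs[j + 1] - cs[i]
    (cs2[j + 1] - cs2[i]) - s^2 / (j - i + 1)
  }
  best <- c(0, rep(Inf, n))   # best[j + 1]: optimal score for bins 1..j
  prev <- integer(n)          # prev[j]: start of the last segment ending at j
  for (j in seq_len(n)) {
    for (i in seq_len(j)) {
      score <- best[i] + seg_cost(i, j) + penalty
      if (score < best[j + 1]) {
        best[j + 1] <- score
        prev[j] <- i
      }
    }
  }
  # Trace back from the last bin to recover the segment starts.
  starts <- integer(0)
  j <- n
  while (j > 0) {
    starts <- c(prev[j], starts)
    j <- prev[j] - 1
  }
  data.frame(start = starts, end = c(starts[-1] - 1, n))
}

# Hypothetical half-lives for nine consecutive bins.
half_life <- c(2.1, 2.0, 2.2, 6.0, 6.1, 5.9, 6.0, 3.0, 3.1)
segment_bins(half_life, penalty = 0.5)
```

A hierarchical scheme like the one described above could apply such a segmentation repeatedly, first on one property and then on another within each resulting segment.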